The Complex Word (CW) Corpus contains 731 sentences, each with one annotated CW. These simplifications were mined from Simple Wikipedia edit histories. Each entry gives an example of a sentence requiring simplification by means of a single lexical edit. This resource is primarily designed for t...
108 WAV files of spoken Maltese newspaper texts, subdivided into 12 directories with a variable number of sentences (sometimes: clauses) each. They come together with transcriptions and tables of phoneme durations.
The full editions of ILLUM from 12/11/2006 to 30/05/2010 (185 issues).
The DeepBankPT (Branco et al., 2010) is a corpus of semantic dependencies of translated texts, composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The DeepBankPT is composed of MRS and AVM representations, derivation tree, and syntactic tree with grammatical and se...
Description
The texts are sentences from the Europarl parallel corpus (Koehn, 2005). The texts contain the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus is composed of the English side of th...
The HESITA database is a corpus of daily television news collected over a month and annotated for hesitation events, acoustic environments, speaking styles, speaker characteristics and respiratory events, among other characteristic sounds.
Hesita-POS is an annotated corpus of TV news.
A collection of language resources for the evaluation of distributional semantic models of Portuguese: LX-SimLex-999: http://metashare.metanet4u.eu/go2/lx-simlex-999 LX-Rare Word Similarity Data set: http://metashare.metanet4u.eu/go2/lx-rare-word-similarity-dataset LX-WordSim-353: h...
CINTIL-UPos is a corpus of Portuguese annotated with the Universal Part-of-Speech tagset (UPOS) of the Universal Dependencies framework. It contains around 1 million annotated tokens. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and Jo...