The CINTIL-PropBank (Branco et al., 2012) is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...
CINTIL-QATreebank is a treebank composed of Portuguese sentences that can be used to support the development of Question Answering systems. This Treebank includes 111 declarative sentences from the pre-existing CINTIL-Treebank (see Branco et al. 2011) whose syntactic structure was manually transf...
The LX-Abbreviations resource is a collection of European Portuguese abbreviations of different types, comprising 208 words. The abbreviations are manually divided by type and annotated with grammatical category, gender and number, and, finally, with the respective abbreviations.
The PTPARL Corpus contains approximately 975,806 running words of European Portuguese. It includes 1,076 texts consisting of adapted transcriptions of Portuguese parliament sessions, which were made available in 2004.
The SIMPLE Portuguese Lexicon comprises 10,438 semantically encoded entries, in accordance with the PAROLE common encoding standards.
The CINTIL-TreeBank (Branco et al., 2011) is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are ...
The EUROPARL Corpus (the Portuguese-English subpart of the parallel corpora), available at http://www.statmt.org/europarl/, was extracted from the proceedings of the European Parliament (Koehn, 2005). It contains transcriptions of sessions dating from 1996 to 2011, for a total of approximately 58...