CINTIL-Corpus Internacional do Português

CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present it is composed of 1 million annotated tokens, verified by human expert annotators. The annotation comprises information on part-of-speech, open classes lemma and inflection, multi-word expres...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PAROLE Portuguese Annotated Corpus

The PAROLE Portuguese Corpus – tagged subset contains 250,000 tokens and is a subset of the PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core PAROLE encoding standard. The tagged subset reproduces appro...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Porlex

Porlex v2 (see Gomes & Castro, 2003) is a computerized lexical database of European Portuguese containing psycholinguistic and cognitive information that is useful for selecting stimulus materials for experiments and/or training vocabularies. It was built on the basis of a middle-sized adult lexicon,...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LX-Stopwords

The LX-Stopwords resource is a manual list of Portuguese words, composed of 2,631 words of 51 types. The words are grouped in three big classes, arranged according to their morpho-syntactic category and inflectional feature value (closed classes, open classes, and multi-word units). This list was ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
LT Corpus

The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 from Portugal and 9 from Brazil) published before 1940.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Geo-Net-PT 02

Geo-Net-PT 02 is a public Geospatial Ontology of Portugal (see Chaves et al., 2007), a computational resource (see Rodrigues et al., 2006 and Rodrigues, 2009) for applications demanding geographic information about Portugal, and contains 701,209 concepts stored in a GKB system, most of them admin...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
PropBankPT

The PropBankPT (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed of 3,406 sentences and 44,598 tokens taken from translated Wall Street Journal text. For the creation of this PropBank we adopted a semi-automatic analysis with...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PAROLE Portuguese Lexicon

The resource consists of 20 thousand entries, morpho-syntactically and syntactically encoded according to the PAROLE common encoding standards.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
TreeBankPT

The TreeBankPT (Branco et al., 2011) is a corpus of syntactic constituency trees of translated news, composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. For the creation of this TreeBank we adopted a semi-automatic analysis with a double-blind annotation followed...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-PropBank

The CINTIL-PropBank (Branco et al., 2012) is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...

Resource Type:Corpus
Media Type:Text
Language:Portuguese