PAROLE Portuguese Lexicon

The resource comprises 20,000 entries with morpho-syntactic and syntactic encoding, in accordance with the PAROLE common encoding standards.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Fundamental Portuguese

This resource includes a spoken Portuguese corpus, with aligned sound and orthographic transcription, collected from sociolinguistically diverse speakers. It consists of recordings of informal conversations.

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
Spoken Portuguese - Geographical and Social Varieties

This resource includes a spoken Portuguese corpus exemplifying the Portuguese spoken in Portugal, Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, Sao Tome and Principe, Macao, Goa and East Timor, with aligned sound and orthographic transcription, collected among sociolinguistically diver...

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
PAROLE Portuguese Annotated Corpus

The tagged subset of the PAROLE Portuguese Corpus contains 250,000 tokens, drawn from the full PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core PAROLE encoding standard. The tagged subset reproduces appro...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Ontology for the area of Nanoscience and Nanotechnology

The Ontology for the area of Nanoscience and Nanotechnology (Ontologia para a área de Nanociência e Nanotecnologia) comprises 511 terms from this field of knowledge. It was extracted from a corpus collected from the Web, with a total of 2,570,792 words.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
LX-Stopwords

The LX-Stopwords resource is a manually compiled list of Portuguese words, comprising 2,631 words of 51 types. The words are grouped into three broad classes (closed classes, open classes, and multi-word units) and arranged according to their morpho-syntactic category and inflectional feature value. This list was ...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
CINTIL-DependencyBank

The CINTIL-DepBank (Branco et al., 2011a) is a corpus of grammatical dependencies for Portuguese texts, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, the...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
SENTER

SENTER is a SENtence splitTER for Portuguese.

Resource Type:Tool / Service
Language:Portuguese
Lemmatizer for Portuguese

Based on the MXPOST part-of-speech tagger and the UNITEX dictionaries for Portuguese, this tool produces the lemmas of the words in a text stored in a plain text file. The source code is also provided.

Resource Type:Tool / Service
Language:Portuguese
CSTParser

CSTParser is a multi-document discourse parser. Based on machine learning techniques and hand-crafted rules, the system identifies a set of relations predicted by CST (Cross-document Structure Theory) among sentences of different texts on the same topic.

Resource Type:Tool / Service
Language:Portuguese
