CINTIL-Corpus Internacional do Português

CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present, it comprises 1 million annotated tokens, verified by human expert annotators. The annotation comprises information on part-of-speech, open classes lemma and inflection, multi-word expres...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects – TreeBank

The CORDIAL-SIN–TreeBank is a collection of 177,596 syntactic parse trees of the Syntax-oriented Corpus of Portuguese Dialects. CORDIAL-SIN is a corpus of spoken dialectal European Portuguese, developed at the Centro de Linguística da Universidade de Lisboa, that compiles excerpts of spontaneous and s...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P. (Part I)

BDCamões Corpus is a collection of literary documents written in Portuguese, in plain text .txt format, with close to 4 million words from over 200 complete documents from 83 authors in 14 genres, covering a time span from the 15th to the 21st century, and adhering to different orthographic conve...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Hesita-POS

Hesita-POS is an annotated corpus of TV news.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
FEUP news corpus

News articles collected from Portuguese newspapers.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Spoken Corpus Mozambique

The Spoken Corpus Mozambique contains approximately 121,958 running words of spoken Portuguese from Mozambique. It includes 40 transcriptions of spoken recordings, totaling 40 hours, made between 1986 and 1987.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ViPER verb lexical database

ViPER is a verb lexical database with more than 7,000 verb senses, along with their structural, distributional, and transformational properties. The verb senses are classified based on the main syntactic properties of their construction. Around 70 formal classes have been devised. For each verb sense, its...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Lexicon of discourse markers for European Portuguese

The lexicon of discourse markers for European Portuguese contains 252 pairs of discourse marker/rhetorical sense. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Portulex

Portulex is a lexical database in European Portuguese that contains words from reading texts in children’s schoolbooks for reading and language instruction in Grades 1 to 4. It comprises a wordform and a lemma database. The wordform database consists of 17,062 inflected wordforms, and the lemma d...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Thesaurus for Portuguese - version 2.0

TeP 2.0 is a wordnet-like semantic resource for Brazilian Portuguese. It includes the words of the language and the synonym and antonym relations that hold among them.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
