LX-Tagger

LX-Tagger is a freely available online service for the part-of-speech tagging of Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. The service is composed of a set of shallow processing tools: A se...

Resource Type: Tool / Service
Language: Portuguese
CINTIL-UDep

CINTIL-UDep is a dependency bank of Portuguese with 38,400 sentences (nearly 476,000 tokens), annotated with Universal Dependencies (UD). This version of CINTIL-UDep supersedes the one included in the v2.11 (2022-11-15) release of Universal Dependencies (https://universaldepende...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
MARv4

MARv-POS is a part-of-speech tagger tool (probabilistic POS annotation module). MARv4's architecture comprises two submodules: a module of linguistically oriented disambiguation rules and a probabilistic disambiguation module. The linguistically oriented module is no longer used in the STRING chain be...

Resource Type: Tool / Service
Language: Portuguese
Biographies of Portuguese People

This is a set of 11,361 biographies of Portuguese people. The data were compiled by collecting biographies from Wikipedia and converting them. Several filters were applied to remove entries that were mostly empty or contained non-applicable content. Format: JSON (conversion from HTML) ...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
FEUP news corpus

News articles collected from Portuguese newspapers.

Resource Type: Corpus
Media Type: Text
Language: Portuguese
BDCamões DependencyBank (Part II)

BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P., is a collection of literary documents written in Portuguese, in plain text .txt format, with close to 4 million words from over 200 complete documents from 83 authors in 14 genres, covering a ti...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
BDCamões DependencyBank (Part I)

BDCamões Corpus - Collection of Portuguese Literary Documents from the Digital Library of Camões I.P., is a collection of literary documents written in Portuguese, in plain text .txt format, with close to 4 million words from over 200 complete documents from 83 authors in 14 genres, covering a ti...

Resource Type: Corpus
Media Type: Text
Language: Portuguese
Corpus Desenvolvimento da Escrita no Ensino Básico

The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th, 6th and 9th year Portuguese Language exams in the public school system in Portugal.

Resource Type: Corpus
Media Type: Text
Language: Portuguese
CIPM-POS

CIPM-POS is a set of historical, religious, notarial and literary texts, in prose and verse, written in medieval Portuguese. It contains around 88,000 words.

Resource Type: Corpus
Media Type: Text
Language: Portuguese
LexMan-POSTagger

LexMan-POSTagger is a morphological analyser that assigns morphological tags to all words. Size: verb lemmas: 12,995; noun and adjective lemmas: 38,180; adverb lemmas: 7,250; compound words: 35,201. Language: Portuguese.

Resource Type: Tool / Service
Language: Portuguese
