PS corpus (Post-Scriptum) - treebank

PS corpus (Post-Scriptum) - treebank is a treebank corpus of 586 informal letters written in Portuguese and Spanish during the Modern Age (from the 16th century to the beginning of the 19th century). This treebank is a syntactically annotated subset of the Portuguese "PS corpus (Post-S...

Resource Type:Corpus
Media Type:Text
Languages:Portuguese, Spanish; Castilian
CRPC Discourse Bank v1.0

The CRPC Discourse Bank is labeled for discourse relations (also referred to as rhetorical relations or coherence relations), such as cause and condition, that hold between two spans of text and help ensure the overall cohesion and coherence of the text. The scheme follows the principl...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects

CORDIAL-SIN is a corpus of spoken dialectal European Portuguese developed at Centro de Linguística da Universidade de Lisboa (CLUL). The materials for this corpus were drawn from the recordings of dialect speech collected by the CLUL ATLAS team as fieldwork interviews for linguistic atlases betwe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ArgMine Corpus

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
BERTimbau - Portuguese BERT-Large language model

This resource contains a pre-trained BERT language model for Portuguese. A BERT-Large cased variant was trained on the BrWaC (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps, using whole-word masking. The model is available as artifacts for TensorFlow an...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
LX-Tagger

LX-Tagger is a freely available online service for the part-of-speech tagging of Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. The service is composed of a set of shallow processing tools: A se...

Resource Type:Tool / Service
Language:Portuguese
Priberam Compressive Summarization Corpus

This is a corpus for multi-document summarization for European Portuguese. It contains 80 topics, each of which has 10 documents, for a total of 800 documents. Each topic contains two human summaries. The summaries are compressive: they are the result of a compression of the sentences in the orig...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Grafone-LEX

Grafone-LEX is a lexical database for grapheme-to-phoneme conversion.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
