CIPM

CIPM is a set of historical, religious, notarial and literary texts in prose and verse, written in medieval Portuguese. It contains around 3.5 million words.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ArgMine Corpus

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
FEUP news corpus

News articles collected from Portuguese newspapers.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Porttinari – PORTuguese Treebank

Porttinari-base (Duran et al., 2023) is the journalistic portion of Porttinari (which stands for "PORTuguese Treebank"), which is intended to become a large multigenre treebank for Portuguese (Pardo et al., 2021), following the "Universal Dependencies" international grammar framework (de Marneffe et al., 2021...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Brands.Br – a Portuguese Reviews Corpus

The Brands.Br corpus was built from a subset of the B2W-Reviews01 corpus. We use a set of 252 samples selected by B2W for enrichment. With the Brands.Br corpus we address two main challenges found in product review corpora. The first: it is very common to find customer reviews referring to distinct thing...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ExtraGLUE

ExtraGLUE is a Portuguese dataset obtained by automatically translating some of the tasks in the GLUE and SuperGLUE benchmarks. Two variants of Portuguese are considered, namely European Portuguese and American Portuguese. The 14 tasks in ExtraGLUE cover different aspects of language unders...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ExtraGLUE-instruct

ExtraGLUE-instruct is a dataset of task examples, instructions, and prompts that combine instructions with examples, for both the European variant of Portuguese, spoken in Portugal, and the American variant of Portuguese, spoken in Brazil. For each variant, it contains over 170...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
NPChunks

The NPChunks training corpus contains approximately 1,000 sentences, for a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus (Barreto et al., 2006). The CINTIL corpus is a linguistically interpreted corpus of Portuguese composed of 1 million annotated tokens from ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Corpus de Produções Escritas de Aprendentes de PL2 (PEAPL2)

A corpus of written texts produced by learners of Portuguese as a non-native language, with three independent subcorpora:
- Portuguese as a Foreign Language: Subcorpus Português Língua Estrangeira (PEAPL2_PLE) http://teitok2.iltec.pt/peapl2-ple/index.php?action=home
- East Timorese Portuguese: Subcorpus T...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL DependencyBank PREMIUM

CINTIL DependencyBank PREMIUM is a corpus of Portuguese utterances manually annotated with grammatical dependency relations and with part-of-speech, inflection and lemma information. It is being developed and maintained at the University of Lisbon. The current version is ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
