Corpus Desenvolvimento da Escrita no Ensino Básico

The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th, 6th and 9th year Portuguese Language exams in the public school system in Portugal.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
XGLUE benchmark dataset

XGLUE is a new benchmark dataset for evaluating the performance of cross-lingual pre-trained models on cross-lingual natural language understanding and generation. XGLUE comprises 11 tasks spanning 19 languages. For each task, the training data is only available in English. This me...

Resource Type:Corpus
Media Type:Text
Languages:Arabic
Bulgarian
Chinese
Dutch; Flemish
English
French
German
Greek, Modern (1453-)
Hindi
Italian
Polish
Portuguese
Russian
Spanish; Castilian
Swahili
Thai
Turkish
Urdu
Vietnamese
CRPC Discourse Bank v1.0

The CRPC Discourse Bank is labeled for discourse relations (also referred to as rhetorical relations or coherence relations), such as cause and condition, that hold between two spans of text and help ensure the overall cohesion and coherence of the text. The scheme follows the principl...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
BERTimbau - Portuguese BERT-Base language model

This resource contains a BERT language model pre-trained on Portuguese. A BERT-Base cased variant was trained on BrWaC (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps using whole-word masking. The model is available as artifacts for TensorFlow and...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
BERTimbau - Portuguese BERT-Large language model

This resource contains a BERT language model pre-trained on Portuguese. A BERT-Large cased variant was trained on BrWaC (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps using whole-word masking. The model is available as artifacts for TensorFlow an...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
TimeBankPT

TimeBankPT, a TimeML-annotated corpus of Portuguese, is the first corpus of Portuguese with rich temporal annotations (i.e. it includes annotations not only of temporal expressions but also of events and temporal relations). The annotation scheme used is similar to TimeML. TimeBankPT is the...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-QATreeBank

CINTIL-QATreeBank is a treebank of Portuguese sentences that can be used to support the development of Question Answering systems. This treebank includes 111 declarative sentences from the pre-existing CINTIL-Treebank (see Branco et al. 2011) whose syntactic structure was manually transf...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-DeepBank

CINTIL-DeepBank (Branco et al., 2010) is a corpus of Portuguese texts annotated with deep grammatical information. This document refers to version 1.4 of the corpus, from January 2016, which adds over 15,400 annotated sentences to the previous version from September 2015. The current version i...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
COVID-19 ANTIBIOTIC dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website https://antibiotic.ecdc.europa.eu/

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 ANTIBIOTIC dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/. It contains a total of 20,981 translation units (TUs) for EN-X language pairs, where X is a CEF language.

Resource Type:Corpus
Media Type:Text
Languages:Bokmål, Norwegian; Norwegian Bokmål
Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Icelandic
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
