Hesita-POS is an annotated corpus of TV news.
CINTIL-USuite is a corpus of Portuguese with around 1 million annotated tokens. It is annotated with lemmas, Universal Part-of-Speech (UPOS) tags and Universal feature bundles, as used in the Universal Dependencies framework. It is described in this article: António Branc...
The DEEB Corpus contains the transcriptions of 1200 narrative texts written by pupils in their 4th-, 6th- and 9th-year Portuguese Language exams in the public school system in Portugal.
A corpus with transcriptions of syllogistic reasoning protocols. Written transcriptions: verbal data (30 hours) elicited during an experiment on syllogistic reasoning (each of 27 participants × the 64 syllogistic problems): think-aloud task; reflexive conversation. Performance data: La...
News articles collected from Portuguese newspapers.
A set of 277,780 sentence pairs (across 23 EN-X language pairs in total) extracted from the Publications Office of the EU in the medical domain. They are sourced from laws, studies, EC announcements, etc., and labelled with concepts like epidemiology, epidemic, disease surveillance, health control, public hygiene,...
ExtraGLUE-instruct is a dataset of task examples, instructions, and prompts that integrate instructions and examples, for both the European variant of Portuguese, spoken in Portugal, and the American variant of Portuguese, spoken in Brazil. For each variant, it contains over 170...