This research proposes a corpus of popular Brazilian Portuguese, called CorPop, with texts selected based on the average level of literacy of the country's readers. CorPop’s theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related discip...
Database of 2,253 citations extracted from the Corpus de Referência do Português Contemporâneo (CRPC; Reference Corpus of Contemporary Portuguese) and manually revised. Format: tab-separated file with three fields: context number, source file ID, and citation.
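A minimal sketch of how a file in this tab-separated layout might be read. It assumes one record per line, no header row, and no quoting conventions, none of which the description confirms. The names `Citation` and `parse_citations` and the sample rows are hypothetical and are not taken from the database itself.

```python
import csv
from typing import Iterable, List, NamedTuple


class Citation(NamedTuple):
    """One record; field names mirror the three fields listed in the description."""
    context_number: str  # kept as text, since its exact form is not specified
    source_file_id: str
    citation: str


def parse_citations(lines: Iterable[str]) -> List[Citation]:
    """Parse tab-separated lines into Citation records.

    Assumption: quotation marks inside a citation are ordinary characters,
    so CSV-style quoting is disabled.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row!r}")
        records.append(Citation(*row))
    return records


# Invented sample rows, for illustration only.
sample = [
    "1\tfile_a\tFirst example citation.\n",
    "2\tfile_b\tSecond example citation.\n",
]
records = parse_citations(sample)
```

For a file on disk, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` and the UTF-8 encoding are likewise assumptions.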
Porttinari-base (Duran et al., 2023) is the journalistic portion of Porttinari (which stands for “PORTuguese Treebank”), which shall be a large multigenre treebank for Portuguese (Pardo et al., 2021), following the "Universal Dependencies" international grammar framework (de Marneffe et al., 2021...
CORAA NURC-SP Minimal Corpus is a manually annotated corpus of spontaneous Brazilian Portuguese speech (São Paulo variety). The corpus is a subset of the NURC (‘Cultured Linguistic Urban Norm’) project collection, one of the most influential in Brazilian Linguistics. The corpus was brought to digital...
The Brands.Br corpus was built from a fraction of the B2W-Reviews01 corpus. We use a set of 252 samples selected by B2W to be enriched. With the Brands.Br corpus we aim to address two main challenges in product-review corpora. The first: it is very common to find customer reviews referring to distinct thing...
Royal inquiries of 1258 (primarily published in the Portugaliae Monumenta Historica).
Port-AoA Words (Cameirão & Vicente, 2010) is a lexical database containing 7 psycholinguistic characteristics (e.g., neighborhood density, written-word frequency, familiarity, and imageability). It covers standard adult vocabulary.
Grafone-LEX is a lexical database for grapheme-to-phoneme conversion.
PicName (see Castro et al., 1997, 1999; Gomes et al., 2006; Neves et al., 1995) is a picture-naming task that can be used to collect spontaneous speech samples and to measure articulation abilities in Portuguese-speaking children. It is an updated version of the Sounds-in-Words task included in t...
Dicionário de Gentílicos e Topónimos is a list of toponym-demonym pairs. Each toponym and its demonym stand in a morphologically compositional relation to each other. The list contains around 1,500 such pairs and additionally provides information on the toponym referent (upper unit...