MARv-POS is a part-of-speech tagger (a probabilistic POS annotation module). MARv4's architecture comprises two submodules: a module of linguistically oriented disambiguation rules and a probabilistic disambiguation module. The linguistically oriented module is no longer used in the STRING chain be...
Corpus of transcriptions of syllogistic reasoning protocols. Written transcriptions: verbal data (30 hours) elicited during an experiment on syllogistic reasoning (each of 27 participants x the 64 syllogistic problems): thinking-aloud task; reflexive conversation. Performance data: La...
RudriCo-POS is a part-of-speech disambiguation tool that applies 188 morphological disambiguation rules.
RudriCo-TOK is a tokenizer tool that splits contractions. De-contraction rules: 178.
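To illustrate what rule-based de-contraction involves, here is a minimal Python sketch. The rule table and the function name `split_contractions` are hypothetical: the table holds a handful of well-known Portuguese preposition + article contractions and is not taken from RudriCo-TOK's actual 178 rules, whose format is not described here.

```python
# Hypothetical rule table: a tiny illustrative subset of Portuguese
# contractions, NOT RudriCo-TOK's actual rule set.
DECONTRACTION_RULES = {
    "do": ("de", "o"),
    "da": ("de", "a"),
    "dos": ("de", "os"),
    "das": ("de", "as"),
    "no": ("em", "o"),
    "na": ("em", "a"),
    "ao": ("a", "o"),
    "pelo": ("por", "o"),
    "pela": ("por", "a"),
}


def split_contractions(tokens):
    """Replace each contracted token with its component tokens.

    Lookup is case-insensitive; the expanded parts are emitted in
    lower case, so the capitalization of a contraction is not preserved.
    """
    out = []
    for token in tokens:
        parts = DECONTRACTION_RULES.get(token.lower())
        if parts is None:
            out.append(token)   # not a known contraction: keep as-is
        else:
            out.extend(parts)   # contraction: emit both components
    return out


if __name__ == "__main__":
    print(split_contractions(["Ele", "gosta", "da", "casa"]))
```

A real tokenizer would also need context to handle ambiguous forms, which a plain dictionary lookup like this one cannot do.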
Royal inquiries of 1258 (primarily published in the Portugaliae Monumenta Historica).
A corpus of opinion articles annotated with arguments, following a claim-premise model.
LX-AP was created from a translation of the Almuhareb-Poesio (AP) benchmark (Almuhareb and Poesio, 2005). The original data set was created considering three aspects: POS, frequency and ambiguity. It contains 402 nouns from 21 categories of WordNet, with 13 to 21 nouns from each one of those categ...
A wordnet is a lexical database. It groups synonymous words into sets, the synsets, which represent distinct concepts. These synsets are the nodes of a network, interlinked by edges that correspond to semantic relations between them. For instance, the hypernym relation, also ...
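The structure described above, synsets as nodes and semantic relations as edges, can be sketched in a few lines of Python. This is a toy illustration only: the class `MiniWordnet`, its methods, and the synset identifiers are hypothetical and do not reflect the storage format or API of any particular wordnet. It assumes the usual reading of hypernymy as an edge from a more specific concept to a more general one.

```python
from collections import defaultdict


class MiniWordnet:
    """Toy wordnet: synsets are nodes, hypernym links are directed edges."""

    def __init__(self):
        self.synsets = {}                    # synset id -> frozenset of lemmas
        self.hypernyms = defaultdict(set)    # synset id -> direct hypernym ids

    def add_synset(self, synset_id, lemmas):
        self.synsets[synset_id] = frozenset(lemmas)

    def add_hypernym(self, specific, general):
        # Edge from the more specific concept to the more general one.
        self.hypernyms[specific].add(general)

    def hypernym_closure(self, synset_id):
        """Return every synset reachable by following hypernym edges."""
        seen = set()
        stack = [synset_id]
        while stack:
            current = stack.pop()
            for parent in self.hypernyms.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


wn = MiniWordnet()
wn.add_synset("car", {"car", "automobile"})          # synonyms share a synset
wn.add_synset("motor_vehicle", {"motor vehicle"})
wn.add_synset("vehicle", {"vehicle"})
wn.add_hypernym("car", "motor_vehicle")
wn.add_hypernym("motor_vehicle", "vehicle")
```

With this data, `wn.hypernym_closure("car")` collects both `motor_vehicle` and `vehicle`, since the traversal follows the edges transitively.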
Porlex (Gomes & Castro, 2003) is a lexical database that includes written and phonetic transcriptions of standard adult vocabulary, with 44 psycholinguistic characteristics (e.g. orthographic, phonological, phonetic, part-of-speech, and neighborhood characteristics). For each word it contains psychol...
This resource contains a BERT language model pre-trained on Portuguese. A BERT-Base cased variant was trained on BrWaC (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps, using whole-word masking. The model is available as artifacts for TensorFlow and...
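As a rough illustration of whole-word masking, the Python sketch below masks all WordPiece pieces of a word together instead of masking pieces independently. It is a simplification under stated assumptions, not the training code used for this model: the function name `whole_word_mask` and its parameters are hypothetical, continuation pieces are assumed to carry the conventional `##` prefix, and BERT's additional details (random or unchanged replacements, a cap on masked positions) are left out.

```python
import random


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Mask whole words in a list of WordPiece tokens.

    Simplified sketch: every selected word has all of its pieces
    replaced by mask_token.
    """
    if rng is None:
        rng = random.Random()

    # Group token indices into words: a piece that starts with "##"
    # continues the word begun by the preceding piece.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    masked = list(tokens)
    for word in words:
        # One random draw per word, so its pieces are masked together.
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked


if __name__ == "__main__":
    pieces = ["o", "token", "##iza", "##dor", "funciona"]
    print(whole_word_mask(pieces, mask_prob=0.5, rng=random.Random(0)))
```

Because the draw is made once per word, the pieces `token`, `##iza` and `##dor` in the example are always either all masked or all left intact.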