Albertina PT-BR base is a foundation, large language model for American Portuguese from Brazil. It is an encoder of the BERT family, based on the neural architecture Transformer and developed over the DeBERTa model, with most competitive performance for this language. It is distributed free of...
Embeddings used in: Branco, António, João Rodrigues, Małgorzata Salawa, Ruben Branco and Chakaveh Saedi, 2020. Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness. In Proceedings of the International Conference on Computational Linguistics (C...
TeP 2.0 is a wordnet-like semantic resource for the Brazilian Portuguese language. It includes the words of the language and the synonym and antonym relations that happen among them.
DVPM-SynSem is a lexical database with syntactic and semantic information in Medieval Portuguese. It contains around 3000 verbs.
This set of materials pertains to a study on the production of explicit pronouns, null pronouns, and repeated-NP anaphors, in European Portuguese. A spreadsheet containing data from 73 participants (young adults), namely, count data for instances of the different types of anaphor that occurred in...
PicName (see Castro et al., 1997, 1999; Gomes et al., 2006; Neves et al., 1995) is a picture-naming task that can be used to collect spontaneous speech samples and to measure articulation abilities in Portuguese-speaking children. It is an updated version of the Sounds-in-Words task included in t...
This is a data set of Portuguese tweets labeled with the emotion conveyed in the tweet. It was gathered using a methodology similar to the one used for building the Affect in Tweets data set used in the SemEval-2018 Task 1. The data set contains 11219 tweets, each labeled with an emotion (anger,...
CINTIL DependencyBank PREMIUM is a corpus of Portuguese utterances manually annotated with the representation of grammatical dependency relations and the information of part-of-speech, inflection and lemmas. It is being developed and maintained at the University of Lisbon. The current version is ...
Lince is a multi-platform stand-alone application that updates the textual contents of documents in a range of popular formats to the spelling prescribed by the 1990 Portuguese language reform. It works with both previously existing Portuguese language orthographic standards (1943, previously val...
EmoVoicePort, Emotional Vocalization Corpus (see Lima, Castro, & Scott, 2013) is a validated set of nonverbal vocalizations that portray four positive emotions (achievement/triumph, amusement, sensual pleasure, relief) and four negative ones (anger, disgust, fear, sadness). The vocalizations (n =...