EMOTAIX.PT (Costa, 2012) is a database of 3,983 emotional words (nouns, verbs, adjectives and adverbs) in European Portuguese based on the original EMOTAIX in French (Piolat & Bannour, 2009). Each word is classified into three hierarchical levels: Supra Category, Super Category and Basic Category...
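The three-level classification can be pictured as a small record per word. The Python sketch below stores each word with its Supra, Super and Basic Category and groups words at any chosen level. The record layout, the field names (`supra_cat`, `super_cat`, `basic_cat`) and the category labels (`SUPRA_1`, etc.) are hypothetical placeholders. They are not the actual EMOTAIX.PT categories or file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionEntry:
    word: str
    supra_cat: str  # Supra Category (top level)
    super_cat: str  # Super Category (middle level)
    basic_cat: str  # Basic Category (bottom level)


def group_by_level(entries, level):
    """Map each category label at the given level to the sorted words it covers."""
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, level)].append(entry.word)
    return {label: sorted(words) for label, words in groups.items()}


# Hypothetical entries: the category labels are illustrative placeholders,
# not taken from EMOTAIX.PT.
entries = [
    EmotionEntry("alegria", "SUPRA_1", "SUPER_1a", "BASIC_1a_i"),
    EmotionEntry("contente", "SUPRA_1", "SUPER_1a", "BASIC_1a_ii"),
    EmotionEntry("medo", "SUPRA_2", "SUPER_2a", "BASIC_2a_i"),
]

print(group_by_level(entries, "supra_cat"))
# {'SUPRA_1': ['alegria', 'contente'], 'SUPRA_2': ['medo']}
```

Because every word carries all three labels, the same function answers queries at the coarse or fine level by changing only the `level` argument.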
EmoVoicePort, Emotional Vocalization Corpus (see Lima, Castro, & Scott, 2013) is a validated set of nonverbal vocalizations that portray four positive emotions (achievement/triumph, amusement, sensual pleasure, relief) and four negative ones (anger, disgust, fear, sadness). The vocalizations (n =...
The lexicon of discourse markers for European Portuguese contains 252 discourse marker/rhetorical sense pairs. The lexicon covers conjunctions, prepositions, adverbs, adverbial phrases and alternative lexicalizations with a connective function, as in the PDTB (Prasad et al., 2008; Prasad et al...
DVPM-EtyMor is a lexical database of around 3,000 Medieval Portuguese verbs with etymological and morphological information and textual exemplification.
DVPM-SynSem is a lexical database of around 3,000 Medieval Portuguese verbs with syntactic and semantic information.
Arquivo Dialetal CLUP - POS is a speech corpus of approximately 40,000 tokens of spontaneous speech (utterances), mainly from Northern Portugal, with orthographic transcription and part-of-speech (POS) annotation.
CINTIL-UDep is a dependency bank of Portuguese with 38,400 sentences (nearly 476,000 tokens) annotated with Universal Dependencies (UD). This version of CINTIL-UDep supersedes the one included in the v2.11 (2022-11-15) release of the Universal Dependencies (https://universaldepende...
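UD treebanks are distributed in the CoNLL-U format. Each token takes one line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), and a blank line separates sentences. The sketch below assumes CINTIL-UDep files follow this format. It reads the dependency tree and skips multiword-token ranges, which matter for Portuguese contractions such as "do" = "de" + "o". The sample sentence and its analysis are hypothetical and were not taken from the corpus. The functions `parse_conllu` and `dependency_triples` are illustrative helpers, not part of any released tool.

```python
# Illustrative sentence in CoNLL-U (hypothetical, not taken from CINTIL-UDep).
# The "2-3" line is a multiword token: the contraction "do" = "de" + "o".
SAMPLE = "\n".join([
    "# text = Gosto do gato.",
    "1\tGosto\tgostar\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tdo\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tde\tde\tADP\t_\t_\t4\tcase\t_\t_",
    "3\to\to\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tgato\tgato\tNOUN\t_\t_\t1\tobl\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
]) + "\n"


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of syntactic-word dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges ("2-3") and empty nodes ("8.1");
            # the basic dependency tree links syntactic words only.
            continue
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


def dependency_triples(sentence):
    """Return (dependent, relation, head) triples, using 'ROOT' for head 0."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    return [(tok["form"], tok["deprel"], forms.get(tok["head"], "ROOT"))
            for tok in sentence]


for triple in dependency_triples(parse_conllu(SAMPLE)[0]):
    print(triple)
```

For real corpus files, a maintained CoNLL-U library is preferable to this hand-rolled reader, because it also handles enhanced dependencies (DEPS) and feature strings.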
The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set reads: ‘Berlin Germany Lisbon Portugal’. With four-word relations such as this one, one can test semantic analogies by using any of the possible combinations o...
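The source does not say how the analogies were scored. A common choice is the vector-offset method (3CosAdd). For an entry "a b c d", it predicts the word whose vector is most cosine-similar to b − a + c, excluding a, b and c, and counts a hit when the prediction equals d. The Python sketch below applies that method. The 4-dimensional vectors are invented toy values, not real embeddings. The extra entries are reorderings of the example quadruple, standing in for "other combinations".

```python
import math

# Toy vectors invented for illustration (not real word embeddings).
# Dimensions loosely encode: [capital city, German, Portuguese, Spanish].
EMB = {
    "Berlin":   [1.0, 1.0, 0.0, 0.0],
    "Germany":  [0.0, 1.0, 0.0, 0.0],
    "Lisbon":   [1.0, 0.0, 1.0, 0.0],
    "Portugal": [0.0, 0.0, 1.0, 0.0],
    "Madrid":   [1.0, 0.0, 0.0, 1.0],
    "Spain":    [0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word maximizing cos(w, b - a + c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, entries):
    """Fraction of 'a b c d' entries whose d is recovered by solve_analogy."""
    quadruples = [entry.split() for entry in entries]
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in quadruples)
    return hits / len(quadruples)


entries = [
    "Berlin Germany Lisbon Portugal",
    "Germany Berlin Portugal Lisbon",
    "Lisbon Portugal Berlin Germany",
]
print(solve_analogy(EMB, "Berlin", "Germany", "Lisbon"))  # Portugal
print(analogy_accuracy(EMB, entries))                      # 1.0
```

Excluding the three query words matters in practice. In real embedding spaces, b − a + c often lies closest to b or c themselves.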
Porlex (Gomes & Castro, 2003) is a lexical database of standard adult vocabulary that includes written and phonetic transcriptions and 44 psycholinguistic characteristics (e.g., orthographic, phonological, phonetic, part-of-speech, and neighborhood characteristics). For each word it contains psychol...
The CINTIL-WordSenses corpus, built upon the CINTIL International Corpus of Portuguese (Barreto et al., 2006), is composed of 23,825 sentences of written Portuguese with open-class terms manually disambiguated and annotated with synset identifiers from the Portuguese MultiWordNet (MWNPT) (Pianti ...