EmoProsodyPort (see Castro & Lima, 2010) is a speech database with 368 short sentences and pseudosentences with neutral emotional content. Acoustic measurements and behavioral data.
This lexicon includes multiword expressions (MWE) of European Portuguese extracted from a balanced 50,8M word written corpus – a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). This corpus covers different genres, being mainly constituted by journalistic texts (59%), but it a...
The DepBankPT (Branco et al., 2011a) is a corpus of grammatical dependencies of the translated news composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The DepBankPT is aligned to a constituency bank, the TreeBankPT (see Branco et al., 2011b). The key bridging eleme...
The LogicalFormBankPT (Branco, 2009, and Branco et al., 2011) is a corpus of semantic dependencies of translated texts composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. The LogicalFormBankPT is composed of MRS representations of each sentence’s semantic relation...
PS Corpus (Post-Scriptum)-PT is a corpus of 2215 informal mail letters written in Portuguese during the Modern Ages (from the XVIth century to the beginning of the XIXth century). Each letter is available as a semi-palaeographic transcription, a modernized transcription, and with part-of-speec...
CIPM is a set of historical, religious, notarial, literary texts in prose and verse, written in medieval portuguese. It has around 3.5 million words.
This is a data set of Portuguese tweets labeled with the emotion conveyed in the tweet. It was gathered using a methodology similar to the one used for building the Affect in Tweets data set used in the SemEval-2018 Task 1. The data set contains 11219 tweets, each labeled with an emotion (anger,...
This set of materials resulted from a study on the processing of explicit pronouns in European Portuguese. A spreadsheet containing data from 75 participants (young adults), namely, per-word reading times and accuracy data on comprehension questions, is provided. Complementary materials (Read Fir...
The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology with 603 annotated definitions from Portuguese. The texts were collected from the Web at the beginning of the 2006 and they are organised in 32 files of three different sub-...
A publicação Arquivo dos Açores, consagrada como obra de referência para a investigação histórica sobre o arquipélago dos Açores, conta com duas séries, num total de 20 volumes. A primeira série do Arquivo dos Açores, composta por 15 volumes, decorreu entre 1878 e 1959, com grandes interrupções r...