Arquivo Dialetal CLUP - POS is a speech corpus of approximately 40,000 tokens (utterances of spontaneous speech, mainly from Northern Portugal), with orthographic transcription and part-of-speech (POS) annotation.
The documents in Portuguese from the Chancelaria de D. Afonso III constitute the first significant set of texts in Portuguese (34 documents spanning 24 years: 1255 - 1279); only from 1279 onward, with D. Dinis (1261-1325), does the systematic use of Portuguese begin ...
Based on the MXPOST part-of-speech tagger and the UNITEX dictionaries for Portuguese, this tool produces the lemmas of the words of a text stored in a plain text file. The source code is also provided.
DVPM-EtyMor is a lexical database of around 3000 Medieval Portuguese verbs, with etymological and morphological information and textual exemplification.
DeepBankPT (Branco et al., 2010) is a corpus of semantic dependencies built from translated texts, comprising 3,406 sentences and 44,598 tokens taken from the Wall Street Journal. It is composed of MRS and AVM representations, derivation trees, and syntactic trees with grammatical and se...
DVPM-SynSem is a lexical database with syntactic and semantic information in Medieval Portuguese. It contains around 3000 verbs.
Porlex (Gomes & Castro, 2003) is a lexical database that includes written and phonetic transcriptions of standard adult vocabulary, along with 44 psycholinguistic characteristics (e.g., orthographic, phonological, phonetic, part-of-speech, and neighborhood characteristics). For each word it contains psychol...
The HESITA database is a corpus of daily television news collected over one month and annotated for hesitation events, acoustic environments, speaking styles, speaker characteristics, and respiratory events, among other characteristic sounds.
Hesita-POS is an annotated corpus of TV news.
A collection of language resources for the evaluation of distributional semantic models of Portuguese: LX-SimLex-999: http://metashare.metanet4u.eu/go2/lx-simlex-999 LX-Rare Word Similarity Data set: http://metashare.metanet4u.eu/go2/lx-rare-word-similarity-dataset LX-WordSim-353: h...