This resource includes a spoken Portuguese corpus - with aligned sound and orthographic transcription -, collected among sociolinguistically diverse speakers. It consists of recordings from informal conversations.
This resource includes a spoken corpus with approximately 300.000 words, covering both formal (152.755 words) and informal (165.838 words) speech, with aligned sound and orthographic transcription and POS-tag information.
The resource is constituted by 20 thousand entries morpho-syntactically and syntactically encoded, accordingly to the parole common encoding standards.
The CINTIL-PropBank (Branco et al., 2012) is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...
The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 Portugal and 9 from Brazil) published before 1940.
This lexicon includes multiword expressions (MWE) of European Portuguese extracted from a balanced 50,8M word written corpus – a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). This corpus covers different genres, being mainly constituted by journalistic texts (59%), but it a...
The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology with 603 annotated definitions from Portuguese. The texts were collected from the Web at the beginning of the 2006 and they are organised in 32 files of three different sub-...
Geo-Net-PT 02 is a public Geospatial Ontology of Portugal (see Chaves et al., 2007), a computational resource (see Rodrigues et al., 2006 and Rodrigues, 2009) for applications demanding geographic information about Portugal, and contains 701,209 concepts stored in a GKB system, most of them admin...
LX-Abbreviations resource is a collection of abbreviations of different types from European Portuguese composed by 208 words. Each type of abbreviation is manually divided and annotated with grammatical categories, gender and number, and, finally, with the respective abbreviations.
LX-Stopwords resource is a manual list of words from Portuguese composed by 2631 words of 51 types. The words are grouped in three big classes, arranged according to their morpho-syntactic category and inflectional feature value (closed classes, open classes, and multi-word units). This list was ...