CINTIL-UPos

CINTIL-UPos is a corpus of Portuguese annotated with the Universal Part-of-Speech tagset (UPOS) from the Universal Dependencies framework. It contains around 1 million annotated tokens. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and Jo...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Grafone-Tool

Grafone-Tool is a grapheme-to-phoneme conversion tool for European Portuguese. The converter handles Portuguese spelling from both before and after the Orthographic Agreement of 1990.

Resource Type:Tool / Service
Language:Portuguese
CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects

CORDIAL-SIN is a corpus of spoken dialectal European Portuguese developed at Centro de Linguística da Universidade de Lisboa (CLUL). The materials for this corpus were drawn from the recordings of dialect speech collected by the CLUL ATLAS team as fieldwork interviews for linguistic atlases betwe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Biographies of Portuguese People

This is a set of 11,361 biographies of Portuguese people. The data were compiled by collecting biographies from Wikipedia and converting them. Several filters were applied to remove entries that were mostly empty or contained non-applicable content. Format: JSON (conversion from HTML) ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Embeddings for Comparative Probing of Lexical Semantics Theories

Embeddings used in: Branco, António, João Rodrigues, Małgorzata Salawa, Ruben Branco and Chakaveh Saedi, 2020. Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness. In Proceedings of the International Conference on Computational Linguistics (C...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
LX-SimLex-999

LX-SimLex-999 was created from SimLex-999 (Hill et al., 2015), which in turn was based on the University of South Florida Free Association Database (USF) (Nelson et al., 2014). SimLex-999 was created under strict guidelines. Both words in each pair have the same morphosyntactic category ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LX-AP

LX-AP was created by translating the Almuhareb-Poesio (AP) benchmark (Almuhareb and Poesio, 2005). The original data set was created considering three aspects: POS, frequency and ambiguity. It contains 402 nouns from 21 categories of WordNet, with 13 to 21 nouns from each one of those categ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-NamedEntities

The CINTIL-NamedEntities corpus, built upon the CINTIL International Corpus of Portuguese (Barreto et al., 2006), is composed of 30,493 sentences of written Portuguese with named entities manually disambiguated and annotated with links to appropriate pages in the Portuguese DBpedia (Lehmann et al...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LX-LR4DistSemEval

A collection of language resources for the evaluation of distributional semantic models of Portuguese: LX-SimLex-999: http://metashare.metanet4u.eu/go2/lx-simlex-999 LX-Rare Word Similarity Data set: http://metashare.metanet4u.eu/go2/lx-rare-word-similarity-dataset LX-WordSim-353: h...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PsychAnaphora - Types of anaphora produced in a sentence completion task

This set of materials pertains to a study on the production of explicit pronouns, null pronouns, and repeated-NP anaphors, in European Portuguese. A spreadsheet containing data from 73 participants (young adults), namely, count data for instances of the different types of anaphor that occurred in...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
