CINTIL-TreeBank

The CINTIL-TreeBank (Branco et al., 2011) is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-PropBank

The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 to...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-Definitions

The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology, with 603 annotated definitions in Portuguese. The texts were collected from the Web at the beginning of 2006 and are organised in 32 files of three different sub-...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
C-ORAL-ROM_EXM

This resource includes a spoken corpus of approximately 300,000 words, covering both formal (152,755 words) and informal (165,838 words) speech, with aligned sound and orthographic transcription and POS-tag information.

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
Fundamental Portuguese

This resource includes a spoken Portuguese corpus, with aligned sound and orthographic transcription, collected among sociolinguistically diverse speakers. It consists of recordings of informal conversations.

Resource Type:Corpus
Media Types:Text, Audio
Language:Portuguese
PAROLE Portuguese Annotated Corpus

The tagged subset of the PAROLE Portuguese Corpus contains 250,000 tokens drawn from the full PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core PAROLE encoding standard. The tagged subset reproduces appro...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Porlex

Porlex v2 (see Gomes & Castro, 2003) is a computerized lexical database of European Portuguese containing psycholinguistic and cognitive information that is useful for selecting stimulus materials for experiments and/or training vocabularies. It was built on the basis of a middle-sized adult lexicon,...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-QATreeBank

CINTIL-QATreeBank is a treebank composed of Portuguese sentences that can be used to support the development of Question Answering systems. This treebank includes 111 declarative sentences from the pre-existing CINTIL-TreeBank (see Branco et al., 2011) whose syntactic structure was manually transf...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-LogicalFormBank

The CINTIL-LogicalFormBank (Branco, 2009, and Branco et al., 2011) is a corpus of semantic dependencies of sentences from Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-Corpus Internacional do Português

CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present it is composed of 1 million annotated tokens, verified by human expert annotators. The annotation comprises information on part-of-speech, open classes lemma and inflection, multi-word expres...

Resource Type:Corpus
Media Type:Text
Language:Portuguese