EUROPARL Corpus Parallel Corpora: Portuguese-English

The EUROPARL Corpus (subpart Portuguese-English of the parallel corpora), available at http://www.statmt.org/europarl/, was extracted from the proceedings of the European Parliament (Koehn, 2005). It contains transcriptions of sessions dating back from 1996 to 2011, in a total of approximately 58...

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
LEX-MWE-PT: Word Combination in Portuguese Language

This lexicon includes multiword expressions (MWE) of European Portuguese extracted from a balanced 50,8M word written corpus – a subcorpus of the Reference Corpus of Contemporary Portuguese (CRPC). This corpus covers different genres, being mainly constituted by journalistic texts (59%), but it a...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
C-ORAL-ROM_EXM

This resource includes a spoken corpus with approximately 300.000 words, covering both formal (152.755 words) and informal (165.838 words) speech, with aligned sound and orthographic transcription and POS-tag information.

Resource Type:Corpus
Media Types:Text
Audio
Language:Portuguese
Simple Portuguese Lexicon

The SIMPLE Portuguese Lexicon is constituted by 10.438 entries semantically encoded, accordingly to the parole common encoding standards.

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
Fundamental Portuguese

This resource includes a spoken Portuguese corpus - with aligned sound and orthographic transcription -, collected among sociolinguistically diverse speakers. It consists of recordings from informal conversations.

Resource Type:Corpus
Media Types:Text
Audio
Language:Portuguese
Spoken Portuguese - Geographical and Social Varieties

This resource includes a spoken Portuguese corpus exemplifying the Portuguese spoken in Portugal, Brazil, Angola, Cape Verde, Guinea-Bissau, Mozambique, Sao Tome and Principe, Macao, Goa and East-Timor - with aligned sound and orthographic transcription - collected among sociolinguistically diver...

Resource Type:Corpus
Media Types:Text
Audio
Language:Portuguese
PAROLE Portuguese Annotated Corpus

The PAROLE Portuguese Corpus – tagged subset contains 250.000 tokens and is a subset of the PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the common core parole encoding standard. The tagged subset reproduces appro...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Ontology for the area of Nanoscience and Nanotechnology

The Ontology for the area of Nanoscience and Nanotechnology (Ontologia para a área de Nanociência e Nanotecnologia) is constituted by 511 terms of this field of knowledge. It was extracted from a corpus collected from the Web, with a total of 2.570.792 words

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese
CINTIL-Corpus Internacional do Português

CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present it is composed of 1 Million annotated tokens, verified by human expert annotators. The annotation comprises information on part-of-speech, open classes lemma and inflection, multi-word expres...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
PTPARL Corpus

The PTPARL Corpus contains approximately 975,806 running words of European Portuguese. It includes 1076 texts consisting of adapted transcriptions of the Portuguese parliament sessions, which were made available in 2004.

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Order by:

Filter by: