NPChunks

The NPChunks training corpus contains approximately 1,000 sentences, totaling 24,243 tokens, selected randomly from the written part of the CINTIL corpus (Barreto et al., 2006). The CINTIL corpus is a linguistically interpreted corpus of Portuguese composed of 1 million annotated tokens from ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CORDIAL-SIN – Syntax-oriented Corpus of Portuguese Dialects

CORDIAL-SIN is a corpus of spoken dialectal European Portuguese developed at Centro de Linguística da Universidade de Lisboa (CLUL). The materials for this corpus were drawn from the recordings of dialect speech collected by the CLUL ATLAS team as fieldwork interviews for linguistic atlases betwe...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Legal texts from Estonian Ministry of Justice (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Estonian-English translations of the Acts of Estonian la...

Resource Type:Corpus
Media Type:Text
Languages:English
Estonian
COVID-19 EUROPARL dataset v1. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://www.europarl.europa.eu/) of the European Parliament (25th April 2020)

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 Parallel Global Voices dataset. Bilingual (EN-PT)

EN-PT Bilingual COVID-19-related corpus acquired from the website (https://globalvoices.org/) of GlobalVoices (28th April 2020)

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 EU presscorner v2 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020).

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 EC-EUROPA v1 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020).

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 ANTIBIOTIC dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website https://antibiotic.ecdc.europa.eu/

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
English-Norwegian parallel corpus from Forbruker Europa, 2017 release (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Forbruker Europa is the Norwegian office of the European...

Resource Type:Corpus
Media Type:Text
Languages:Bokmål, Norwegian; Norwegian Bokmål
English
Bilingual collection of documents about the Cyprus Problem (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A parallel corpus (Greek-English) regarding the Cyprus Pr...

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
