Maltese Acquis Communautaire

This is the Maltese version of the Acquis Communautaire (AC), which is the total body of European Union (EU) law applicable in the EU Member States. It consists of selected texts from the 1950s to the present, translated into Maltese.

Resource Type:Corpus
Media Type:Text
Language:Maltese
COVID-19 EU presscorner v1 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020).

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
Europarl-QTLeap WSD/NED corpus

The texts are sentences from the Europarl parallel corpus (Koehn, 2005). The texts contain the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus is composed of the English side of th...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
News corpus categorised

The News corpus developed by LIACC in JSON format was complemented with POS and keyword topics annotation. POS-tagging: the POS-tagging used the tagger described in Généreux et al. (2012). The title and text body were extracted, tokenized and POS-tagged. Two new fields were added...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CINTIL-WordSenses

The CINTIL-WordSenses corpus, built upon the CINTIL International Corpus of Portuguese (Barreto et al., 2006), is composed of 23,825 sentences of written Portuguese with open-class terms manually disambiguated and annotated with synset identifiers from the Portuguese MultiWordNet (MWNPT) (Pianti ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
HiEve

A corpus of manually annotated event hierarchies in news stories.

Resource Type:Corpus
Media Type:Text
Language:English
Polish Ministry of Foreign Affairs Youth 2011 Report (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A parallel Polish-English version of the Youth 2011 repo...

Resource Type:Corpus
Media Type:Text
Languages:English
Polish
PKN Orlen Dataset (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Dataset of the Polish public sector company PKN Orlen, a...

Resource Type:Corpus
Media Type:Text
Languages:English
Polish
EUIPO - list of goods and services German and English (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO list of goods and services. Format: TMX

Resource Type:Corpus
Media Type:Text
Languages:English
German
Bilingual collection of reports of the Greek Public Power Corporation (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A bilingual collection of translation units extracted fr...

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
