Europarl-QTLeap WSD/NED corpus

The texts are sentences from the Europarl parallel corpus (Koehn, 2005). The textscontain the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish- English. The English corpus is comprised by the English side of th...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
News-QTLeap WSD/NED corpus

The texts are sentences from the News parallel corpus. The texts contain monolingual sentences from parallel corpora for the following pairs: Basque-English, Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus is comprised by the English side of the Spanis...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
COVID-19 ANTIBIOTIC dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/ . It contains 20981 TUs (in total) for EN-X language pairs, where X is a CEF language.

Resource Type:Corpus
Media Type:Text
Languages:Bokmål, Norwegian; Norwegian Bokmål
Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Icelandic
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
Parallel Global Voices (Bulgarian - English) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel Global Voices BG-EN is a parallel corpus genera...

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
English
Bulgarian-English Wikipedia WSD/NED corpus

Bulgarian-English Wikipedia WSD/NED corpus is composed of articles from the Bulgarian version of Wikipedia and their English counterparts.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
English
Radio Bulgaria WSD/NED corpus

Radio Bulgaria WSD/NED corpus is composed of texts from Bulgarian and English articles from the website of Radio Bulgaria.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
English
Bilingual documents Bulgarian-English in the field of transport (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Bulgarian-English collection of documents; 549...

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
English
COVID-19 EUR-LEX dataset . Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). It contains 23 TMX files (EN-X, X is a CEF language) with 475,931 translation units pairs in total.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
QTLeap specialized lexicons

This resource is part of Deliverable 5.7 of the European Comission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). This gazetteer comprises multilingual lexicon entries used for the translation of specific IT domain expressions for Basque, Bulgarian, Czech, Dutch, Engli...

Resource Type:Lexical / Conceptual
Media Type:Text
Languages:Basque
Bulgarian
Czech
Dutch; Flemish
English
Portuguese
Spanish; Castilian