QTLeap specialized lexicons

This resource is part of Deliverable 5.7 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). This gazetteer comprises multilingual lexicon entries used for the translation of specific IT domain expressions for Basque, Bulgarian, Czech, Dutch, Engli...

Resource Type: Lexical / Conceptual
Media Type: Text
Languages: Basque
Bulgarian
Czech
Dutch; Flemish
English
Portuguese
Spanish; Castilian
COVID-19 EU presscorner v2 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151895 translation units (TUs) in total.

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
COVID-19 EU presscorner v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 83217 translation units (TUs) in total.

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
Multilingual corpus from the Publications Office of the EU on the medical domain v.2

277780 sentence pairs (in 23 EN-X language pairs in total) extracted from the Publications Office of the EU on the medical domain. These are sourced from laws, studies, EC announcements, etc. labelled with concepts like epidemiology, epidemic, disease surveillance, health control, public hygiene,...

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
COVID-19 EC-EUROPA v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 53311 translation units (TUs) in total.

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
Parallel texts from Swedish Work environment Authority (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts from the Swedish Work Environment authori...

Resource Type: Corpus
Media Type: Text
Languages: Bulgarian
Czech
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Italian
Latvian
Lithuanian
Polish
Romanian
Spanish; Castilian
Swedish
QTLeap Corpus V1.2

The QTLeap corpus is composed of 4000 question-and-answer pairs in the domain of computer and IT troubleshooting for both hardware and software. This material was collected using a support service via chat, which implies that the corpus is composed of naturally occurring utterances produced by use...

Resource Type: Corpus
Media Type: Text
Languages: Basque
Bulgarian
Czech
Dutch; Flemish
English
German
Portuguese
Spanish; Castilian
QTLeap News Corpus

This corpus is a sample extracted from the corpus made available by the annual workshops/conferences on Statistical Machine Translation (WMT, see http://www.statmt.org/) from the News domain. To this end, 1104 English sentences and their corresponding human translations into Czech, German a...

Resource Type: Corpus
Media Type: Text
Languages: Basque
Bulgarian
Czech
Dutch; Flemish
English
German
Portuguese
Spanish; Castilian
Europarl QTLeap WSD/NED corpus

This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from parallel corpora ...

Resource Type: Corpus
Media Type: Text
Languages: Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
QTLeap WSD/NED corpus

This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgar...

Resource Type: Corpus
Media Type: Text
Languages: Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
