Text corpus for bilingual concordancing, single- and multi-word translation extraction, and machine translation. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domain: Law and Health.
277780 sentence pairs (across 23 EN-X language pairs in total) in the medical domain, extracted from the Publications Office of the EU. They are sourced from laws, studies, EC announcements, etc., labelled with concepts like epidemiology, epidemic, disease surveillance, health control, public hygiene,...
Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151895 TUs in total.
Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 83217 TUs in total.
Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 53311 TUs in total.
Multilingual (CEF languages) corpus acquired from the website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 475,931 translation unit pairs in total.
A language identifier for closely related languages.
Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/. It contains 20981 TUs in total for EN-X language pairs, where X is a CEF language.
This resource comprises multilingual lexicon entries used for the translation of IT-domain expressions. The gazetteer has been collected from four sources: the VLC, LibreOffice, and KDE localization projects, and IT-domain Wikipedia articles.
QTLeap WSD/NED corpus. This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are Q&A interactions from the real-user scenario (batches 1 and 2). The interactions in this corpus are available in Basque, Bulgar...