This package contains data sets for developing and testing machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and from medical experts.
French-Khmer pivot lexical database
Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 53311 translation units (TUs) in total.
XGLUE is a new benchmark dataset for evaluating the performance of cross-lingual pre-trained models on cross-lingual natural language understanding and generation. XGLUE is composed of 11 tasks spanning 19 languages. For each task, the training data is only available in English. This me...
Terms of Research Thesaurus
Terms for Fairs and Congresses
Terms of Exotic Wood
Industry terms
Terms from different sciences and industries: ecology, economy, law, sociology, medicine, tourism and computation.
Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/. It contains 20981 TUs in total for EN-X language pairs, where X is a CEF language.
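
The TU totals quoted for the TMX corpora above can be checked with a few lines of Python. The following is a minimal sketch, assuming the files follow the usual TMX layout (each `<tu>` holds one `<tuv>` per language, with the text in `<seg>`); the helper names `read_tus` and `count_tus` are hypothetical and not part of any of the listed packages.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tus(tmx_source):
    """Yield one dict per <tu>, mapping language code to segment text.

    tmx_source may be a file path or a binary file object.
    Assumes a standard TMX layout: <tu> -> <tuv xml:lang="..."> -> <seg>.
    """
    tree = ET.parse(tmx_source)
    for tu in tree.getroot().iter("tu"):
        pair = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested in inline markup.
                pair[lang.lower()] = "".join(seg.itertext()).strip()
        yield pair


def count_tus(tmx_source):
    """Return the number of translation units in one TMX file."""
    return sum(1 for _ in read_tus(tmx_source))
```

Summing `count_tus(path)` over all files of a corpus should give a figure comparable to the totals listed, provided the counts in the descriptions were produced the same way (one count per `<tu>` element), which is an assumption here.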