Terms that have (more or less) recently been accepted and normalised by Termcat, mixed fields
Parallel corpora is a set of parallel texts in the domain of Law and Health, with 1 G per language. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk.
Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/ . It contains 20981 TUs (in total) for EN-X language pairs, where X is a CEF language.
Multilingual (CEF languages) corpus acquired from website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). It contains 23 TMX files (EN-X, X is a CEF language) with 475,931 translation units pairs in total.
Industry terms
Memoria de traducción Portal oficial de turismo de España www.spain.info
XGLUE is a new benchmark dataset to evaluate the performance of cross-lingual pre-trained models with respect to cross-lingual natural language understanding and generation. XGLUE is composed of 11 tasks spans 19 languages. For each task, the training data is only available in English. This me...
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts in pdf file format. Original in Swedish, ...
Transcriptions in the CIEMPIESS-PNPD are based on a phonetic alphabet called Mexbet. Mexbet was design for the Spanish of Central Mexico and it has several levels of granularity. The CIEMPIESS-PNPD comes in two versions: Mexbet T29 and Mexbet T66. Level T29 of Mexbet means that transcriptions ...
Wixarika is an indigenous language spoken in central west Mexico by approximately fifty thousand people. For indigenous languages like Wixarika, there is a lack of digital resources in general since native speakers do not necessarily generate a digital fingerprint on public forums. The lack of...