Search and Browse – PORTULAN CLARIN

XGLUE benchmark dataset

XGLUE is a new benchmark dataset to evaluate the performance of cross-lingual pre-trained models with respect to cross-lingual natural language understanding and generation. XGLUE is composed of 11 tasks spanning 19 languages. For each task, the training data is only available in English. This me...

Resource Type:	Corpus
Media Type:	Text
Languages:	Arabic
	Bulgarian
	Chinese
	Dutch; Flemish
	English
	French
	German
	Greek, Modern (1453-)
	Hindi
	Italian
	Polish
	Portuguese
	Russian
	Spanish; Castilian
	Swahili
	Thai
	Turkish
	Urdu
	Vietnamese

Parallel corpora

Parallel corpora is a set of parallel texts in the domains of Law and Health, with 1 G per language. Language pairs: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk.

Resource Type:	Corpus
Media Type:	Text
Languages:	Arabic
	Chinese
	Czech
	English
	French
	German
	Portuguese
	Spanish; Castilian

QTLeap Corpus V1.2

The QTLeap corpus is composed of 4000 question and answer pairs in the domain of computer and IT troubleshooting for both hardware and software. This material was collected using a support service via chat, which implies that the corpus is composed of naturally occurring utterances produced by use...

Resource Type:	Corpus
Media Type:	Text
Languages:	Basque
	Bulgarian
	Czech
	Dutch; Flemish
	English
	German
	Portuguese
	Spanish; Castilian

QTLeap News Corpus

This corpus is a sample extracted from the corpus made available by the annual workshops/conferences on Statistical Machine Translation (WMT, see http://www.statmt.org/) from the News domain. To this end, 1104 English sentences and their corresponding human translations into Czech, German a...

Resource Type:	Corpus
Media Type:	Text
Languages:	Basque
	Bulgarian
	Czech
	Dutch; Flemish
	English
	German
	Portuguese
	Spanish; Castilian

QTLeap LRT-M31-WP4

Treebanks and semantic lexicons for Basque, Bulgarian, Dutch, German and Portuguese. Created within the European project QTLeap.

Resource Type:	Corpus
Media Type:	Text
Languages:	Basque
	Bulgarian
	Dutch; Flemish
	German

Termcat Neoloteca

Terms that have (more or less) recently been accepted and normalised by Termcat, mixed fields

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Basque
	Catalan; Valencian
	English
	French
	Galician
	German
	Italian
	Latin
	Portuguese
	Spanish; Castilian

Termcat Industry

Industry terms

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Basque
	Catalan; Valencian
	English
	French
	German
	Italian
	Portuguese
	Spanish; Castilian

QTLeap Specialized lexicons

This resource comprises multilingual lexicon entries used for the translation of specific IT domain expressions. This gazetteer has been collected from four different sources: VLC, LibreOffice and KDE localization projects and IT domain Wikipedia articles.

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Basque
	Czech
	English
	German
	Portuguese
	Spanish; Castilian

COVID-19 ANTIBIOTIC dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/. It contains 20981 TUs (in total) for EN-X language pairs, where X is a CEF language.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bokmål, Norwegian; Norwegian Bokmål
	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Icelandic
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

COVID-19 EU presscorner v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 83217 TUs in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish
