Search and Browse – PORTULAN CLARIN

XGLUE benchmark dataset

XGLUE is a new benchmark dataset for evaluating the performance of cross-lingual pre-trained models on cross-lingual natural language understanding and generation. XGLUE is composed of 11 tasks spanning 19 languages. For each task, the training data is only available in English. This me...

Resource Type:	Corpus
Media Type:	Text
Languages:	Arabic
	Bulgarian
	Chinese
	Dutch; Flemish
	English
	French
	German
	Greek, Modern (1453-)
	Hindi
	Italian
	Polish
	Portuguese
	Russian
	Spanish; Castilian
	Swahili
	Thai
	Turkish
	Urdu
	Vietnamese

Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Trilingual (Greek-English-French) documents - standard f...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	French
	Greek, Modern (1453-)

SIP Publications (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Publications from the Luxembourgish government edited by...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	French
	German

Parallel texts from Swedish Work environment Authority (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts from the Swedish Work Environment authori...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Czech
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Italian
	Latvian
	Lithuanian
	Polish
	Romanian
	Spanish; Castilian
	Swedish

Parallel texts from Swedish Social Security Authority (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts, email templates and forms in pdf file fo...

Resource Type:	Corpus
Media Type:	Text
Languages:	Croatian
	English
	Finnish
	French
	German
	Italian
	Polish
	Romanian
	Spanish; Castilian
	Swedish

Parallel texts from Swedish National Food Agency (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts in pdf file format. Original in Swedish, ...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Finnish
	French
	Polish
	Spanish; Castilian
	Swedish

Parallel texts from Swedish Labour market agency (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts, all in pdf files, have been gathered fro...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Finnish
	French
	German
	Romanian
	Spanish; Castilian
	Swedish

Parallel texts from Swedish Labour market agency. Part 2 (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Same as part 1, but with the Readme-file. (Processed)

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Finnish
	French
	German
	Polish
	Romanian
	Spanish; Castilian
	Swedish

Parallel corpora finely aligned (subsentencial granularity)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, and machine translation. Language pairs: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domains: Law and Health.

Resource Type:	Corpus
Media Type:	Text
Languages:	Czech
	English
	French
	German
	Italian
	Portuguese
	Slovak
	Spanish; Castilian

Parallel corpora

A set of parallel texts in the domains of Law and Health, with 1 G per language. Language pairs: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk.

Resource Type:	Corpus
Media Type:	Text
Languages:	Arabic
	Chinese
	Czech
	English
	French
	German
	Portuguese
	Spanish; Castilian
