Search and Browse – PORTULAN CLARIN

N3-Collection

We publish three novel datasets, collectively called N3. N3 will be published using NIF (NLP Interchange Format), which ensures greater interoperability and removes the need for corpus-specific parsers. The data can be downloaded from our project homepage.

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	German

Multilingual corpus from the Publications Office of the EU on the medical domain v.2

277,780 sentence pairs (across 23 EN-X language pairs in total) on the medical domain, extracted from the Publications Office of the EU. They are sourced from laws, studies, EC announcements, etc., and labelled with concepts such as epidemiology, epidemic, disease surveillance, health control, public hygiene,...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

Memorias de traducción Portal oficial de turismo de España www.spain.info

Translation memory from Spain's official tourism portal, www.spain.info.

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	French
	German
	Italian
	Portuguese
	Spanish; Castilian

LuxId

Corpus of mixed-language (French, German, Luxembourgish) sentences from Chamber (House of Parliament) debate reports, manually annotated at segment level with 6 labels: Lux, Fre, Ger, Lux + Fre, Lux + Ger, Lux + Fre + Ger.

Resource Type:	Corpus
Media Type:	Text
Languages:	French
	German
	Luxembourgish; Letzeburgesch

Luxembourg Museum Websites (de-en) (Processed)

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	French
	German

Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)

This dataset was created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project, see http://lr-coordination.eu. Letter of rights for persons arrested on the basis of a ...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Dutch; Flemish
	English
	French
	German
	Greek, Modern (1453-)
	Italian
	Latvian
	Polish
	Romanian

Khresmoi Query Translation Test Data for the Medical Domain version 1.0

This package contains data sets for the development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and from medical experts.

Resource Type:	Corpus
Media Type:	Text
Languages:	Czech
	English
	French
	German

EUIPO - list of goods and services German and English (Processed)

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	German

COVID-19 EUR-LEX dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 475,931 translation unit pairs in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish
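
The entry above is distributed as TMX files, an XML format in which each translation unit (a tu element) holds one language variant (tuv) per language, with the text in a seg element. As a minimal sketch of how such a file could be read, the Python snippet below parses a small TMX string with the standard library. The sample sentences and the function name read_translation_units are invented for illustration and are not taken from the corpus or from any PORTULAN CLARIN tooling; the snippet also assumes TMX 1.4-style xml:lang attributes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data for illustration only; not taken from the corpus.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands regularly.</seg></tuv>
      <tuv xml:lang="de"><seg>Waschen Sie sich regelmäßig die Hände.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Keep your distance.</seg></tuv>
      <tuv xml:lang="de"><seg>Halten Sie Abstand.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_translation_units(tmx_text):
    """Return one dict per <tu>, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang is None or seg is None:
                continue  # skip variants without a language or a segment
            # itertext() also collects text nested in inline markup elements
            unit[lang.lower()] = "".join(seg.itertext()).strip()
        units.append(unit)
    return units


units = read_translation_units(SAMPLE_TMX)
print(len(units))          # number of translation units in the sample
print(units[0]["en"])      # English side of the first unit
```

For a downloaded file, the same function could be fed the file's contents; for very large TMX files, a streaming parser such as ET.iterparse would likely be a better fit than loading everything at once.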

COVID-19 EU presscorner v2 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151,895 translation units (TUs) in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish
