Search and Browse – PORTULAN CLARIN

Multilingual corpus from the Publications Office of the EU on the medical domain v.2

277780 sentence pairs (in 23 EN-X language pairs in total) extracted from the Publications Office of the EU on the medical domain. These are sourced from laws, studies, EC announcements, etc. labelled with concepts like epidemiology, epidemic, disease surveillance, health control, public hygiene,...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

COVID-19 EC-EUROPA v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 53311 TUs in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

COVID-19 EU presscorner v2 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151895 TUs in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

COVID-19 EU presscorner v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 83217 TUs in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

COVID-19 EUR-LEX dataset . Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020). It contains 23 TMX files (EN-X, X is a CEF language) with 475,931 translation units pairs in total.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

English-Norwegian parallel corpus from Forbruker Europa, 2017 release (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Forbruker Europa is the Norwegian office of the European...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bokmål, Norwegian; Norwegian Bokmål
Languages:	English

COVID-19 ANTIBIOTIC dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/ . It contains 20981 TUs (in total) for EN-X language pairs, where X is a CEF language.

Resource Type:	Corpus
Media Type:	Text
Languages:	Bokmål, Norwegian; Norwegian Bokmål
	Bulgarian
	Croatian
	Czech
	Danish
	Dutch; Flemish
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Icelandic
	Irish
	Italian
	Latvian
	Lithuanian
	Maltese
	Moldavian; Moldovan
	Polish
	Portuguese
	Romanian
	Slovak
	Slovenian
	Spanish; Castilian
	Swedish

Basque-English ParDeepBank

This resource is part of Deliverable 4.6 of the QTLeap FP7 project (Contract number 610516). In its current development (15% of the intended goal of the project), it is composed of 150 sentences (1,416 English tokens and 1,275 Basque tokens). The sentences are excerpts from journalistic text from...

Resource Type:	Corpus
Media Type:	Text
Languages:	Basque
Languages:	English

Elhuyar-QTLeap WSD/NED corpus

This corpus is created from documents from translation memorios of Elhuyar Fundation (obtained via Eleka, member of the Advisory Board of Potential Users).

Resource Type:	Corpus
Media Type:	Text
Languages:	Basque
Languages:	English

Basque to English Machine translation module

Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf

Resource Type:	Tool / Service
Languages:	Basque
Languages:	English

Order by:

Filter by: