COVID-19 EU presscorner v2 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151,895 translation units (TUs) in total.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
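
The TMX files in a corpus like the one above can be read with a standard XML parser. The following is a minimal sketch, assuming plain TMX 1.4 markup (un-namespaced `tu`, `tuv` and `seg` elements, with the language given in `xml:lang`). The function name `read_tus` and the inline `SAMPLE` document are hypothetical and are not taken from the dataset.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical one-unit TMX document, used only for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
    </tu>
  </body>
</tmx>
"""

def read_tus(tmx_source):
    """Yield one {language: segment text} dict per translation unit (TU)."""
    # iterparse streams the file, so large TMX files need not fit in memory.
    for _, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag == "tu":
            unit = {}
            for tuv in elem.findall("tuv"):
                # Older TMX versions used a plain "lang" attribute instead.
                lang = tuv.get(XML_LANG) or tuv.get("lang")
                seg = tuv.find("seg")
                if lang and seg is not None:
                    # itertext() also collects text inside inline markup.
                    unit[lang.lower()] = "".join(seg.itertext()).strip()
            yield unit
            elem.clear()  # release the processed unit

units = list(read_tus(io.BytesIO(SAMPLE)))
print(len(units), units[0])
```

Passing a file path instead of the `BytesIO` object works the same way, so counting the units in each file is one way to check a download against the stated total.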
MLSS Tagger Web Service

The part-of-speech tagger for Maltese is based on TnT, the statistical part-of-speech tagger by Thorsten Brants (http://www.coli.uni-saarland.de/~thorsten/tnt/). It was modified for the Maltese Language Resource Server (MLRS) by Albert Gatt (Linguistics Department, University of Malta). The mode...

Resource Type:Tool / Service
Language:Maltese
MLSS Tokeniser Web Service

The web service takes text as input and returns a list of tokens. The tokens can be orthographic words, numerals or punctuation marks. The tokeniser was designed to work on Maltese texts. The download for this resource contains only the narrative description in a Word file. ...

Resource Type:Tool / Service
Language:Maltese
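
The input/output behaviour described for the tokeniser (text in, a list of word, numeral and punctuation tokens out) can be illustrated with a generic regular-expression sketch. This is not the service's implementation and does not encode any Maltese-specific rules. The name `tokenise` and the pattern are assumptions made for illustration only.

```python
import re

# Alternatives are tried left to right at each position.
TOKEN_RE = re.compile(
    r"\d+(?:[.,]\d+)*"  # numerals, optionally with . or , between digit groups
    r"|\w+"             # orthographic words (Unicode-aware in Python 3)
    r"|[^\w\s]"         # any single punctuation mark or symbol
)

def tokenise(text):
    """Return the tokens of `text` in order, dropping whitespace."""
    return TOKEN_RE.findall(text)

print(tokenise("Valletta, 2020!"))
```

In this sketch a hyphenated form is split at the hyphen into three tokens, which a language-specific tokeniser may well handle differently.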
Maltese-English website parallel corpus (Processed)  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. This is a parallel corpus of bilingual texts crawled fro...

Resource Type:Corpus
Media Type:Text
Languages:English
Maltese
