Parallel Global Voices (Greek - English) (Processed)
Handle: https://hdl.handle.net/21.11129/0000-000D-FADA-4 (persistent URL to this page)
This dataset was created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project, see http://lr-coordination.eu.
Parallel Global Voices EL-EN is a parallel corpus generated from the Global Voices multilingual group of websites (http://globalvoices.org/), where volunteers publish and translate news stories in more than 40 languages. The original content of the Global Voices websites is made available by its authors and publishers under a Creative Commons Attribution license. The content was crawled in July-August 2015 by researchers at the NLP group of the Institute for Language and Speech Processing. Documents that are translations of each other were paired on the basis of their link information. After document pairing, segment alignments were extracted automatically. The results of the automatic alignment at document and segment level are distributed under a Creative Commons Attribution license.
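The link-based document pairing mentioned above can be illustrated with a minimal sketch. It is not the pipeline actually used to build this corpus: the `Page` record, the `pair_documents` function, and the example.org URLs are all hypothetical, and the sketch assumes each crawled page has already been reduced to its URL, its language, and the URLs it links to as translations.

```python
from typing import NamedTuple, Tuple


class Page(NamedTuple):
    # Hypothetical record for one crawled page (not the corpus's real schema).
    url: str
    lang: str
    translation_links: Tuple[str, ...]  # URLs this page marks as its translations


def pair_documents(pages, src_lang="el", tgt_lang="en"):
    """Return sorted (src_url, tgt_url) pairs connected by a translation link."""
    by_url = {p.url: p for p in pages}
    pairs = set()
    for page in pages:
        for link in page.translation_links:
            other = by_url.get(link)
            if other is None:
                continue  # link points outside the crawled set
            if {page.lang, other.lang} != {src_lang, tgt_lang}:
                continue  # not an EL-EN pair
            src, tgt = (page, other) if page.lang == src_lang else (other, page)
            pairs.add((src.url, tgt.url))  # set removes reciprocal duplicates
    return sorted(pairs)


# Illustrative input only; these URLs are placeholders.
pages = [
    Page("https://example.org/el/story-1", "el", ("https://example.org/en/story-1",)),
    Page("https://example.org/en/story-1", "en", ()),
    Page("https://example.org/en/story-2", "en", ("https://example.org/fr/story-2",)),
]
pairs = pair_documents(pages)
```

In this sketch a link in either direction is enough to pair two documents. Segment alignment within each pair would be a separate step and is not shown.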
- Czech to English Machine translation module
- Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed)
- EUIPO - list of goods and services Spanish and English (Processed)
- Bilingual resource with Bulgarian strategic documents in the field of innovations and digital growth (Bulgarian - English) (Processed)