Parallel Global Voices (Greek - English) (Processed)

Handle:	https://hdl.handle.net/21.11129/0000-000D-FADA-4 (persistent URL to this page)
ELRA ID:	ELRA-W0202

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu.
Parallel Global Voices EL-EN is a parallel corpus generated from the Global Voices multilingual group of websites (http://globalvoices.org/), where volunteers publish and translate news stories in more than 40 languages. The original content from the Global Voices websites is made available by its authors and publishers under a Creative Commons Attribution license. The content was crawled in July-August 2015 by researchers at the NLP group of the Institute for Language and Speech Processing. Documents that are translations of each other were paired on the basis of their link information. After document pairing, segment alignments were automatically extracted. The results of the automatic alignment at document and segment level are distributed under a Creative Commons Attribution license.

Distribution

Licence

CC-BY

Restrictions: Attribution

Distribution Access/Medium: Downloadable

Contact Person

Valérie Mapelli Female

http://www.elda.org

55-57 rue Brillat-Savarin

75013 Paris

France

Tel.: +1 43 13 33 33

Fax: +1 43 14 33 30

text

Bilingual text corpus

Languages

Greek, Modern (1453-)

English

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Size

61,967 Units

Character encoding

UTF-8

Domains

EDUCATION & COMMUNICATIONS

Modalities

Written Language

Annotation

Alignment

Segmentation level: Other

Resource Creation

Funding Project

European Language Resource Coordination LOT3 (ELRC Data - Tools and Resources for CEF Automated Translation - LOT3 (SMART 2015/1091 - 30-CE-0816766/00-92))

URL: http://www.lr-coordination.eu/

Funding Type: EU Funds

Project duration: 12/13/2016 - 02/12/2020

Metadata

Created: 07/17/2020

Last Updated: 11/09/2020

Metadata Creator

Andrea Teixeira Female

University of Lisbon

Campo Grande

1749-016 Lisboa