Romanian - English literature corpus (Processed)

Handle:	https://hdl.handle.net/21.11129/0000-000D-FBBA-7 (persistent URL to this page)
ELRA ID:	ELRA-W0192

This dataset was created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project, see http://lr-coordination.eu.
A bilingual Romanian-English literary corpus built from a small set of freely available books (drama, science fiction, etc.). The texts are positionally aligned: the sentence on line i of the English text corresponds to the sentence on line i of the Romanian text. The alignment was manually validated.
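
The positional alignment described above can be illustrated with a minimal Python sketch. It assumes the two sides are available as plain line-aligned lists of sentences; the function name `pair_lines` and the sample sentences are hypothetical and are not taken from the corpus.

```python
from itertools import zip_longest


def pair_lines(en_lines, ro_lines):
    """Pair line i of the English text with line i of the Romanian text."""
    pairs = []
    for i, (en, ro) in enumerate(zip_longest(en_lines, ro_lines), start=1):
        # A missing line on either side means the texts are not positionally aligned.
        if en is None or ro is None:
            raise ValueError(f"line count mismatch at line {i}")
        pairs.append((en.strip(), ro.strip()))
    return pairs


# Hypothetical sample sentences, for illustration only.
en = ["The door opened.\n", "Nobody was there.\n"]
ro = ["Ușa s-a deschis.\n", "Nu era nimeni acolo.\n"]
pairs = pair_lines(en, ro)
```

Raising an error on a length mismatch, rather than silently truncating, is a design choice here: with positional alignment, one missing line shifts every pair that follows it.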

Distribution

Licence

CC-BY

Restrictions: Attribution

Distribution Access/Medium: Downloadable

Contact Person

Valérie Mapelli Female

http://www.elda.org

55-57 rue Brillat-Savarin

75013 Paris

France

Tel.: +1 43 13 33 33

Fax: +1 43 14 33 30

text

Bilingual text corpus

Languages

Romanian (88,498 words), English (87,681 words)

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Text Format

Application/x-tmx+xml (1)
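
As a rough illustration of how sentence pairs might be read from a TMX file, the sketch below uses Python's standard `xml.etree.ElementTree`. It assumes the usual TMX layout (`tu` elements containing `tuv` elements with an `xml:lang` attribute and a `seg` child) and the language codes `en` and `ro`; the actual codes in the distributed file may differ. The function name `read_tmx_pairs` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src="en", tgt="ro"):
    """Return (source, target) sentence pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary subtag, so "en-GB" is treated as "en".
                segs[lang.split("-")[0]] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


# Hypothetical one-unit sample, not taken from the corpus.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="moses" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The door opened.</seg></tuv>
      <tuv xml:lang="ro"><seg>Ușa s-a deschis.</seg></tuv>
    </tu>
  </body>
</tmx>"""

sample_pairs = read_tmx_pairs(SAMPLE)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; the rest of the loop would stay the same.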

Size

5,280 Units

176,179 Words

Character encoding

UTF-8

Domains

SOCIAL QUESTIONS

Modalities

Written Language

Classification

Text type: literaryTexts

Text genre: fiction

Annotation

Alignment

Segmentation level: Sentence

Creation

Creation mode details: Conversion from Moses-like format to TMX

Creation mode: Automatic
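
The record does not describe the tool used for the conversion. Purely as an illustration of what a conversion from a Moses-like format (two line-aligned text files) to TMX could look like, here is a minimal sketch. The function name `moses_to_tmx`, the header values, and the language codes are assumptions, not details of the actual ELRC pipeline.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def moses_to_tmx(src_lines, tgt_lines, src_lang="en", tgt_lang="ro"):
    """Build a TMX document string from two line-aligned lists of sentences."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    tmx = ET.Element("tmx", version="1.4")
    # Header values below are placeholders chosen for this example.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-converter",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "moses",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(src_lines, tgt_lines):
        # One translation unit per aligned line pair.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text.strip()
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical input lines, for illustration only.
tmx_text = moses_to_tmx(["The door opened.\n"], ["Ușa s-a deschis.\n"])
```

Each aligned line pair becomes one `tu` element, so under this scheme the number of translation units equals the number of lines in either input file.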

Resource Creation

Funding Project

European Language Resource Coordination LOT3 (ELRC Data - Tools and Resources for CEF Automated Translation - LOT3 (SMART 2015/1091 - 30-CE-0816766/00-92))

URL: http://www.lr-coordination.eu/

Funding Type: EU Funds

Project duration: 12/13/2016 - 02/12/2020

Metadata

Created: 07/18/2020

Last Updated: 11/09/2020

Metadata Creator

Vassilis Papavassiliou

Athena Research Center

ILSP / ATHENA R.C.

Artemidos 6 & Epidavros