Portuguese-English bilingual corpus from the Portuguese Constitution (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Complete text of the Portuguese Constitution in Portugue...

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 EU presscorner v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 83217 TUs in total.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
Portuguese-English bilingual corpus from Legislation concerning the Portuguese Parliament (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Legislation concerning Portuguese Parliament; three bili...

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
Parallel corpora finely aligned (subsentencial granularity)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, machine translation. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domain: Law and Health.

Resource Type:Corpus
Media Type:Text
Languages:Czech
English
French
German
Italian
Portuguese
Slovak
Spanish; Castilian
COVID-19 EC-EUROPA v1 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 53311 TUs in total.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
COVID-19 EU presscorner v2 dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020). It contains 23 TMX files (EN-X, where X is a CEF language) with 151895 TUs in total.

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
Multilingual corpora with coreferential annotation of person entities

Multilingual corpora with coreferential annotation of person entities ===================================================================== In-progress corpora with coreferent annotation of person entities. Sources: journals and Wikipedia. Languages: * Portuguese: varieties from Portugal, Brazi...

Resource Type:Corpus
Media Type:Text
Languages:Galician
Portuguese
Spanish; Castilian

Order by:

Filter by:

Portuguese (119)
English (31)
Czech (14)
Bulgarian (13)
German (12)
French (10)
Italian (9)
Polish (7)
Slovak (7)
Basque (6)
Danish (6)
Finnish (6)
Irish (6)
Latvian (6)
Maltese (6)
Swedish (6)
Chinese (3)
Arabic (2)
Hindi (1)
Russian (1)
Swahili (1)
Thai (1)
Turkish (1)
Urdu (1)