Search and Browse – PORTULAN CLARIN

COVID-19 EU presscorner v1 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (14th May 2020).

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese

COVID-19 EUROPARL dataset v1. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://www.europarl.europa.eu/) of the European Parliament (25th April 2020).

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese

COVID-19 EUROPARL v2 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://www.europarl.europa.eu/) of the European Parliament (9th May 2020).

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese

COVID-19 Parallel Global Voices dataset. Bilingual (EN-PT)

Bilingual (EN-PT) COVID-19-related corpus acquired from the website (https://globalvoices.org/) of Global Voices (28th April 2020).

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese

COVID-19 EU presscorner v2 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020).

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese

Termoteca

Terms from different sciences and industries: ecology, economy, law, sociology, medicine, tourism and computation.

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	English
	French
	Galician
	Portuguese
	Spanish; Castilian

ParaCrawl release 7 Portuguese-English

Portuguese-English parallel corpus from release 7 of the ParaCrawl project ("Broader Web-Scale Provision of Parallel Corpora for European Languages"). This version is filtered with BiCleaner with a threshold of 0.5. Data was crawled from the web following robots.txt, as is standard practice....

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese

Parallel corpora finely aligned (subsentential granularity)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, machine translation. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domain: Law and Health.

Resource Type:	Corpus
Media Type:	Text
Languages:	Czech
	English
	French
	German
	Italian
	Portuguese
	Slovak
	Spanish; Castilian

XGLUE benchmark dataset

XGLUE is a new benchmark dataset to evaluate the performance of cross-lingual pre-trained models with respect to cross-lingual natural language understanding and generation. XGLUE is composed of 11 tasks spanning 19 languages. For each task, the training data is only available in English. This me...

Resource Type:	Corpus
Media Type:	Text
Languages:	Arabic
	Bulgarian
	Chinese
	Dutch; Flemish
	English
	French
	German
	Greek, Modern (1453-)
	Hindi
	Italian
	Polish
	Portuguese
	Russian
	Spanish; Castilian
	Swahili
	Thai
	Turkish
	Urdu
	Vietnamese

COVID-19 EUR-LEX dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from the website (https://eur-lex.europa.eu/legal-content) of the EU portal (9th July 2020).

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Portuguese
