Brands.Br – a Portuguese Reviews Corpus

The Brands.Br corpus was built from a fraction of B2W-Reviews01 corpus. We use a set of 252 samples selected by B2W to be enriched. In Brands.Br corpus we want to solve two main challenges in product reviews corpus. The first: it is very common to find customer reviews referring to distinct thing...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LT Corpus

The LT Corpus (Literary Corpus) contains approximately 1,781,083 running words of European and Brazilian Portuguese. It includes 70 copyright-free classics (61 Portugal and 9 from Brazil) published before 1940.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ExtraGLUE-instruct

ExtraGLUE-instruct is a data set with examples from tasks, with instructions and with prompts that integrate instructions and examples, for both the European variant of Portuguese, spoken in Portugal, and the American variant of Portuguese, spoken in Brazil. For each variant, it contains over 170...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
LX-4WAnalogiesBR

The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set would read: ‘Berlin Germany Lisbon Portugal’. With these four words relations – as in this example – one can test semantic analogies by using any of the possible combinations o...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
English-Slovak corpus of annual reports from the Slovak National Centre for Human Rights website (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovak corpus of annual reports from the Slovak ...

Resource Type:Corpus
Media Type:Text
Languages:English
Slovak
Macroeconomic Developments (Processed)  

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bulletins of Macroeconomic Developments

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
LX-Battig

The LX-Battig was created from Battig test.set (Baroni et al., 2010). This data set has 83 concrete concepts of the following 10 categories: mammals, birds, fish, vegetables, fruit, trees, vehicles, clothes, tools and kitchenware. The categories names and the concepts were translated by two trans...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
COVID-19 EU presscorner v2 dataset. Bilingual (EN-PT)

Bilingual (EN-PT) corpus acquired from website (https://ec.europa.eu/commission/presscorner/) of the EU portal (8th July 2020).

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
COVID-19 ANTIBIOTIC dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/ . It contains 20981 TUs (in total) for EN-X language pairs, where X is a CEF language.

Resource Type:Corpus
Media Type:Text
Languages:Bokmål, Norwegian; Norwegian Bokmål
Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Icelandic
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
CINTIL-Definitions

The corpus presented here is a collection of several tutorials and scientific papers in the field of Information Technology with 603 annotated definitions from Portuguese. The texts were collected from the Web at the beginning of the 2006 and they are organised in 32 files of three different sub-...

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Order by:

Filter by:

Text (446)
Audio (18)
Image (1)