Biographies of Portuguese People

This is a set of 11,361 biographies of Portuguese people. The data were compiled by collecting biographies from Wikipedia and converting them. Several filters were applied to remove entries that were mostly empty or contained non-applicable content. Format: JSON (conversion from HTML) ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese

Parallel corpus (Bulgarian - English) in the public administration domain (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel (bg-en) corpus of 11262 translation units in th...

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian, English

Corpus on Finance and Economics from Bank of Latvia (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Contents of web site https://makroekonomika.lv/ -- Latvi...

Resource Type:Corpus
Media Type:Text
Languages:English, Latvian

Bilingual Bulgarian-English corpus from the 2018 Proposal for a National Climate Change Adaptation Strategy and Action Plan from the website of the Bulgarian Ministry of Environment and Water (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Bulgarian-English corpus from the 2018 Proposa...

Resource Type:Corpus
Media Type:Text
Languages:Bulgarian, English

Parallel corpus from Bank of Estonia (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel corpus from content of Bank of Estonia website ...

Resource Type:Corpus
Media Type:Text
Languages:English, Estonian

Embeddings for Comparative Probing of Lexical Semantics Theories

Embeddings used in: Branco, António, João Rodrigues, Małgorzata Salawa, Ruben Branco and Chakaveh Saedi, 2020. Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness. In Proceedings of the International Conference on Computational Linguistics (C...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese

Blacklist Classifier

A language identifier for closely related languages.

Resource Type:Tool / Service
Languages:Bosnian, Croatian, Czech, Portuguese, Serbian, Slovak

Dicionário de Gentílicos e Topónimos

Dicionário de Gentílicos e Topónimos is a list of pairs of toponyms and demonyms. The toponym and demonym in each pair stand in a morphologically compositional relation to each other. The list contains around 1500 such pairs and additionally provides information on the toponym referent (upper unit...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese

International Agreements (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. International Agreements have been translated into natio...

Resource Type:Corpus
Media Type:Text
Languages:English, Latvian

ExtraGLUE

ExtraGLUE is a Portuguese dataset obtained by the automatic translation of some of the tasks in the GLUE and SuperGLUE benchmarks. Two variants of Portuguese are considered, namely European Portuguese and American Portuguese. The 14 tasks in ExtraGLUE cover different aspects of language unders...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
