The Coimisineir Teanga Bilingual Corpus of Reports and Press Releases (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Reports and Press Release data from the Language Commiss...

Resource Type:Corpus
Media Type:Text
Languages:English
Irish
Portuguese-English bilingual corpus from the Portuguese Constitution (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Complete text of the Portuguese Constitution in Portugue...

Resource Type:Corpus
Media Type:Text
Languages:English
Portuguese
Macroeconomic Developments (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bulletins of Macroeconomic Developments

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
Parallel corpus from Parliament of Estonia (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel corpus compiled from contents of website of Par...

Resource Type:Corpus
Media Type:Text
Languages:English
Estonian
Parallel corpus from Bank of Estonia (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel corpus from content of Bank of Estonia website ...

Resource Type:Corpus
Media Type:Text
Languages:English
Estonian
Arabic Tweets NER test set

Despite many recent papers on Arabic Named Entity Recognition (NER) in the news domain, little work has been done on microblog NER. NER on microblogs presents many complications such as informality of language, shortened named entities, brevity of expressions, and inconsistent capitalization (for...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Arabic
An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis

An Arabic Twitter data set of 7,503 tweets. The released data contains manual sentiment analysis annotations as well as automatically extracted features, saved in comma-separated values (CSV) and Attribute-Relation File Format (ARFF) files. Due to Twitter privacy restrictions we replaced the orig...

Resource Type:Corpus
Media Type:Text
Language:Arabic
Multilingual corpora with coreferential annotation of person entities

In-progress corpora with coreferent annotation of person entities. Sources: journals and Wikipedia. Languages: * Portuguese: varieties from Portugal, Brazi...

Resource Type:Corpus
Media Type:Text
Languages:Galician
Portuguese
Spanish; Castilian
International Agreements (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. International Agreements have been translated into natio...

Resource Type:Corpus
Media Type:Text
Languages:English
Latvian
Albertina PT-BR

Albertina PT-* is a foundation large language model for the Portuguese language. It is an encoder of the BERT family, based on the Transformer neural architecture and developed over the DeBERTa model, with highly competitive performance for this language. It has different versions that were...

Resource Type:Language Description
Media Type:Text
Language:Portuguese
