CorPop: a corpus of popular Brazilian Portuguese

This research proposes a corpus of popular Brazilian Portuguese, called CorPop, with texts selected based on the average level of literacy of the country's readers. CorPop’s theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related discip...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
The Coimisineir Teanga Bilingual Corpus of Reference Documents (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. General Reference content from the Language Commissioner...

Resource Type:Corpus
Media Type:Text
Languages:English, Irish
Monolingual documents from the Government of Lithuania (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Monolingual documents received from the Government of th...

Resource Type:Corpus
Media Type:Text
Language:Lithuanian
The Gaois bilingual corpus of English-Irish legislation (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual corpus of English-Irish legislation provided b...

Resource Type:Corpus
Media Type:Text
Languages:English, Irish
Secretariat-General parallel corpus SL-EN and EN-SL (part 2) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovenian parallel corpus in TMX format from the...

Resource Type:Corpus
Media Type:Text
Languages:English, Slovenian
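The entry above only says that the parallel corpus is distributed in TMX format. The sketch below shows one way English-Slovenian segment pairs could be read with Python's standard library. It assumes, without confirmation from the listing, that the file follows the TMX 1.4 layout (`<tu>` units holding one `<tuv>` per language, each with a `<seg>`) and that the languages are tagged `en` and `sl`. The function name `read_tmx_pairs` and the in-memory sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _segment_text(seg):
    # Keep the segment's own text and the text following each inline element.
    # The content of inline elements (e.g. native codes in <bpt>, <ept>, <ph>) is skipped.
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def read_tmx_pairs(source, src_lang="en", tgt_lang="sl"):
    """Yield (source, target) segment pairs from a TMX file path or binary stream.

    Hypothetical helper: language codes are matched on their primary subtag,
    case-insensitively, so "EN-GB" counts as "en".
    """
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang.split("-")[0].lower()] = _segment_text(seg)
        # Only emit units that contain both requested languages.
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


# Hypothetical one-unit sample, not taken from the actual corpus.
SAMPLE = (
    '<tmx version="1.4"><header srclang="en" datatype="plaintext" '
    'segtype="sentence" adminlang="en" creationtool="demo" '
    'creationtoolversion="1" o-tmf="none"/><body>'
    '<tu><tuv xml:lang="en"><seg>Good morning.</seg></tuv>'
    '<tuv xml:lang="sl"><seg>Dobro jutro.</seg></tuv></tu>'
    "</body></tmx>"
)

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE.encode("utf-8"))))
print(pairs)
```

For a real file, a path can be passed instead of the stream. For very large TMX files, `ET.iterparse` would avoid loading the whole tree into memory.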
NomLex-BR

A computational lexicon for Portuguese that provides mappings between verbs and their nominalizations.

Resource Type:Lexical / Conceptual
Media Type:Text
Languages:Brazilian Portuguese, Portuguese
An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis

An Arabic Twitter dataset of 7,503 tweets. The released data contains manual sentiment analysis annotations as well as automatically extracted features, saved as Comma-Separated Values (CSV) and Attribute-Relation File Format (ARFF) files. Due to Twitter privacy restrictions we replaced the orig...

Resource Type:Corpus
Media Type:Text
Language:Arabic
Corpus of Semantic Graphs with associated English strings

Automatically generated corpus of 98,818 graph/string pairs.

Resource Type:Corpus
Media Type:Text
Language:American English
Burst-Annotated Co-Occurrence Network for the Arab Spring Domain

A burst-annotated co-occurrence network about the Arab Spring, built on top of New York Times article snapshots from 2010 to 2013.

Resource Type:Corpus
Media Type:Text
Language:American English
CINTIL-Definitions

The corpus presented here is a collection of tutorials and scientific papers in the field of Information Technology, with 603 annotated definitions in Portuguese. The texts were collected from the Web at the beginning of 2006 and are organised in 32 files of three different sub-...

Resource Type:Corpus
Media Type:Text
Language:Portuguese