LexMan-POSTagger

LexMan-POSTagger is a morphological analyser that assigns morphological tags to all words. Size: verb lemmas: 12 995; noun and adjective lemmas: 38 180; adverb lemmas: 7 250; compound words: 35 201. Language: Portuguese.

Resource Type:Tool / Service
Language:Portuguese
Biographies of Portuguese People

This is a set of 11,361 biographies of Portuguese people. The data were compiled by collecting biographies from Wikipedia and converting them. Several filters were applied to remove entries that were mostly empty or contained non-applicable content. Format: JSON (conversion from HTML) ...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Europarl QTLeap WSD/NED corpus

This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from parallel corpora ...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
A Tweet Dataset Annotated in Four Emotion Dimensions

A corpus of 2,019 tweets annotated along each of four emotion dimensions: Valence, Dominance, Arousal and Surprise. Two annotation schemes are used: a 5-point ordinal scale (using SAM manikins for Valence, Arousal and Dominance) and pair-wise comparisons with an "about the same" option (here 2,01...

Resource Type:Corpus
Media Type:Text
Language:English
EUIPO - IP case law French-English (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO - IP case law (BoA) French-English

Resource Type:Corpus
Media Type:Text
Languages:English
French
Convention on the transfer of sentenced persons (English - Greek) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Convention, additional protocol on the convention, recom...

Resource Type:Corpus
Media Type:Text
Languages:English
Greek, Modern (1453-)
Maltese-English website parallel corpus (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. This is a parallel corpus of bilingual texts crawled fro...

Resource Type:Corpus
Media Type:Text
Languages:English
Maltese
English-Slovak corpus of annual reports from the Slovak National Centre for Human Rights website (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovak corpus of annual reports from the Slovak ...

Resource Type:Corpus
Media Type:Text
Languages:English
Slovak
QTLeap Specialized lexicons

This resource comprises multilingual lexicon entries used for the translation of specific IT domain expressions. This gazetteer has been collected from four different sources: VLC, LibreOffice and KDE localization projects and IT domain Wikipedia articles.

Resource Type:Lexical / Conceptual
Media Type:Text
Languages:Basque
Czech
English
German
Portuguese
Spanish; Castilian
COVID-19 ANTIBIOTIC dataset. Multilingual (CEF languages)

Multilingual (CEF languages) corpus acquired from the website https://antibiotic.ecdc.europa.eu/. It contains a total of 20,981 translation units (TUs) for EN-X language pairs, where X is a CEF language.

Resource Type:Corpus
Media Type:Text
Languages:Bokmål, Norwegian; Norwegian Bokmål
Bulgarian
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Greek, Modern (1453-)
Hungarian
Icelandic
Irish
Italian
Latvian
Lithuanian
Maltese
Moldavian; Moldovan
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish; Castilian
Swedish
