Europarl QTLeap WSD/NED corpus

Europarl QTLeap WSD/NED corpus This corpora is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from parallel corpora ...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
Europarl-QTLeap WSD/NED corpus

The texts are sentences from the Europarl parallel corpus (Koehn, 2005). The textscontain the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish- English. The English corpus is comprised by the English side of th...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
Hontology

Hontology (H stands for hotel, hostal and hostel) (available at http://ontolp.inf.pucrs.br/Recursos/downloads-Hontology.php) is a new multilingual ontology for the accommodation sector freely available, containing 282 concepts categorized into 16 top-level concepts. The concepts of other voca...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Portuguese, English, Spanish, French
Maltese Wiktionary

This lexicon is part of the collection of the Wikimedia Dumps which was retrieved as an XML file from http://dumps.wikimedia.org/mtwiktionary/20121105/ on November 5, 2012. In the Wikimedia dump, it is accompanied by a text file mtwiktionary-20121105-pages-articles-multistream-index.txt which li...

Resource Type:Lexical / Conceptual
Media Type:Text
Language:Maltese
News-QTLeap WSD/NED corpus

The texts are sentences from the News parallel corpus. The texts contain monolingual sentences from parallel corpora for the following pairs: Basque-English, Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus is comprised by the English side of the Spanis...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
English
Portuguese
Spanish; Castilian
Parallel corpora

Parallel corpora is a set of parallel texts in the domain of Law and Health, with 1 G per language. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk.

Resource Type:Corpus
Media Type:Text
Languages:Arabic
Chinese
Czech
English
French
German
Portuguese
Spanish; Castilian
Parallel corpora finely aligned (subsentencial granularity)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, machine translation. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domain: Law and Health.

Resource Type:Corpus
Media Type:Text
Languages:Czech
English
French
German
Italian
Portuguese
Slovak
Spanish; Castilian
QTLeap Corpus V1.2

The QTLeap corpus is composed by 4000 question and answer pairs in the domain of computer and IT troubleshooting for both hardware and software. This material was collected using a support service via chat, this implies that the corpus is composed by naturally occurring utterances produced by use...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
Dutch; Flemish
English
German
Portuguese
Spanish; Castilian
QTLeap LRT-M31-WP4

Treebanks and semantic lexicons for Basque, Bulgarian, Dutch, German and Portuguese. Created within European project QTLeap.

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Dutch; Flemish
German
QTLeap News Corpus

This corpus is a sample extracted from the corpus made available by the annual workshops/conferences on Statistical Machine Translation (WMT, see \url{http://www.statmt.org/}) from the News domain. To this end, 1104 English sentences and their corresponding human translations into Czech, German a...

Resource Type:Corpus
Media Type:Text
Languages:Basque
Bulgarian
Czech
Dutch; Flemish
English
German
Portuguese
Spanish; Castilian