Hontology

Hontology (the H stands for hotel, hostal and hostel), available at http://ontolp.inf.pucrs.br/Recursos/downloads-Hontology.php, is a new, freely available multilingual ontology for the accommodation sector. It contains 282 concepts organized under 16 top-level concepts. The concepts of other voca...

Resource Type: Lexical / Conceptual
Media Type: Text
Language: Portuguese, English, Spanish, French

Maltese Wiktionary

This lexicon is part of the Wikimedia Dumps collection and was retrieved as an XML file from http://dumps.wikimedia.org/mtwiktionary/20121105/ on November 5, 2012. In the Wikimedia dump, it is accompanied by a text file, mtwiktionary-20121105-pages-articles-multistream-index.txt, which li...

Resource Type: Lexical / Conceptual
Media Type: Text
Language: Maltese

Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment - United Nations (French-English-Greek) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English text of the Convention against Torture and Other...

Resource Type: Corpus
Media Type: Text
Languages: English
French
Greek, Modern (1453-)

Parallel texts from Swedish National Food Agency (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts in PDF format. Original in Swedish, ...

Resource Type: Corpus
Media Type: Text
Languages: English
Finnish
French
Polish
Spanish; Castilian
Swedish

Parallel texts from Swedish Labour market agency (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts, all in PDF files, have been gathered fro...

Resource Type: Corpus
Media Type: Text
Languages: English
Finnish
French
German
Romanian
Spanish; Castilian
Swedish

Parallel texts from Swedish Labour market agency. Part 2 (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Same as Part 1, but with the README file. (Processed)

Resource Type: Corpus
Media Type: Text
Languages: English
Finnish
French
German
Polish
Romanian
Spanish; Castilian
Swedish

U-Compare Type system

The resource consists of a hierarchically structured system of data types, intended to be suitable for describing the input and output annotation types of a wide range of natural language processing applications that operate within the UIMA framework. It is being developed in conjunc...

Resource Type: Language Description
Media Type: Text
Language: English

FEUP Tweets

Tweet corpus

Resource Type: Corpus
Media Type: Text
Language: English

Georeferenced Tweets

Tweets annotated with geographic coordinates

Resource Type: Corpus
Media Type: Text
Language: English

Parallel corpora finely aligned (subsentencial granularity)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, and machine translation. Language pairs: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domains: Law and Health.

Resource Type: Corpus
Media Type: Text
Languages: Czech
English
French
German
Italian
Portuguese
Slovak
Spanish; Castilian