Search and Browse – PORTULAN CLARIN

Parallel texts from Swedish Labour market agency (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts, all in PDF files, have been gathered fro...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Finnish
	French
	German
	Romanian
	Spanish; Castilian
	Swedish

Parallel texts from Swedish Labour market agency. Part 2 (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Same as Part 1, but with the README file. (Processed)

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Finnish
	French
	German
	Polish
	Romanian
	Spanish; Castilian
	Swedish

U-Compare Type system

The resource consists of a hierarchically structured system of data types intended to describe the input and output annotation types of a wide range of natural language processing applications that operate within the UIMA Framework. It is being developed in conjunc...

Resource Type:	Language Description
Media Type:	Text
Language:	English

Georeferenced Tweets

Tweets annotated with geographic coordinates

Resource Type:	Corpus
Media Type:	Text
Language:	English

FEUP Tweets

Tweet corpus

Resource Type:	Corpus
Media Type:	Text
Language:	English

Parallel corpora finely aligned (subsentential granularity)

Text corpus for bilingual concordancing, single- and multi-word translation extraction, machine translation. Languages: cs-pt, de-pt, en-pt, es-pt, fr-pt, it-pt, and pt-sk. Size: 1 G per language (phrases aligned). Domain: Law and Health.

Resource Type:	Corpus
Media Type:	Text
Languages:	Czech
	English
	French
	German
	Italian
	Portuguese
	Slovak
	Spanish; Castilian

Khresmoi Query Translation Test Data for the Medical Domain version 1.0

This package contains data sets for the development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from both the general public and medical experts.

Resource Type:	Corpus
Media Type:	Text
Languages:	Czech
	English
	French
	German

Parallel texts from Swedish Social Security Authority (Processed)

Resource Type:	Corpus
Media Type:	Text
Languages:	Croatian
	English
	Finnish
	French
	German
	Italian
	Polish
	Romanian
	Spanish; Castilian
	Swedish

MotaMot French-Khmer Pivot Database

French-Khmer pivot lexical database

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Central Khmer
	French

DiLAF African languages-French dictionaries

Bilingual dictionaries encoded in XML:
- Hausa-French dictionary for the basic cycle, 2008 Soutéba: 7,823 entries;
- Kanuri-French dictionary for the basic cycle, 2004 Soutéba: 5,994 entries;
- Tamajaq-French dictionary for the basic cycle, 2007 Soutéba: 5,205 entries;
- Songhai-zarma-French dict. for basic cycle, 2007 Sout...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Central Kanuri
	Hausa
	Songhai languages
	Tamashek
