Search and Browse – PORTULAN CLARIN

Multilingual corpora with coreferential annotation of person entities

In-progress corpora with coreferential annotation of person entities. Sources: journals and Wikipedia. Languages: * Portuguese: varieties from Portugal, Brazi...

Resource Type:	Corpus
Media Type:	Text
Languages:	Galician
	Portuguese
	Spanish; Castilian

QTLeap News Corpus

This corpus is a sample extracted from the News domain corpus made available by the annual workshops/conferences on Statistical Machine Translation (WMT, see http://www.statmt.org/). To this end, 1104 English sentences and their corresponding human translations into Czech, German a...

Resource Type:	Corpus
Media Type:	Text
Languages:	Basque
	Bulgarian
	Czech
	Dutch; Flemish
	English
	German
	Portuguese
	Spanish; Castilian

Parallel texts from Swedish Work environment Authority (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts from the Swedish Work Environment authori...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	Czech
	English
	Estonian
	Finnish
	French
	German
	Greek, Modern (1453-)
	Hungarian
	Italian
	Latvian
	Lithuanian
	Polish
	Romanian
	Spanish; Castilian
	Swedish

DiLAF African languages-French dictionaries

Bilingual dictionaries encoded in XML:
- Hausa-French dict. for basic cycle, 2008 Soutéba: 7,823 entries;
- Kanuri-French dict. for basic cycle, 2004 Soutéba: 5,994 entries;
- Tamajaq-French dict. for basic cycle, 2007 Soutéba: 5,205 entries;
- Songhai-zarma-French dict. for basic cycle, 2007 Sout...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Central Kanuri
	Hausa
	Songhai languages
	Tamashek

MotaMot French-Khmer Pivot Database

French-Khmer pivot lexical database

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Central Khmer
	French

Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment - United Nations (French-English-Greek) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English text of the Convention against Torture and Other...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	French
	Greek, Modern (1453-)

Georeferenced Tweets

Tweets annotated with geographic coordinates

Resource Type:	Corpus
Media Type:	Text
Language:	English

FEUP Tweets

Tweet corpus

Resource Type:	Corpus
Media Type:	Text
Language:	English

Termoteca

Terms from different sciences and industries: ecology, economy, law, sociology, medicine, tourism and computation.

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	English
	French
	Galician
	Portuguese
	Spanish; Castilian

AuCoPro - Splitting

The AuCoPro-Splitting dataset contains compounds annotated with their compound boundaries and linking morphemes. The dataset consists of two files, one for Afrikaans and one for Dutch. The annotation was performed according to annotation guidelines as described in Verhoeven, van Zaanen, van Huyss...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Afrikaans
	Dutch; Flemish
