Search and Browse – PORTULAN CLARIN

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:	Corpus
Media Type:	Text
Language:	Portuguese

Termcat Digital Marketing

Terms for Digital Marketing

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Catalan; Valencian
	English
	French
	Galician
	German
	Italian
	Portuguese
	Spanish; Castilian

Arabic Tweets NER test set

Despite many recent papers on Arabic Named Entity Recognition (NER) in the news domain, little work has been done on microblog NER. NER on microblogs presents many complications such as informality of language, shortened named entities, brevity of expressions, and inconsistent capitalization (for...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Language:	Arabic

An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis

An Arabic Twitter data set of 7,503 tweets. The released data contains manual sentiment analysis annotations as well as automatically extracted features, saved in Comma-Separated Values (CSV) and Attribute-Relation File Format (ARFF) files. Due to Twitter privacy restrictions we replaced the orig...

Resource Type:	Corpus
Media Type:	Text
Language:	Arabic

NoSta-D: German NER Dataset Train/Dev

Freely available large dataset, manually annotated for German NER. Includes nested span annotations. Source text from German Wikipedia and news. This data set does not contain the test data, which is used for the GermEval 2014 NER task at KONVENS. Test data will be available from September 2014.

Resource Type:	Corpus
Media Type:	Text
Language:	German

Bilingual hr-en parallel corpus from Croatian National Bank website (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Contents of http://www.hnb.hr were crawled, aligned on d...

Resource Type:	Corpus
Media Type:	Text
Languages:	Croatian
	English

Corpus of State-related content from the Latvian Web (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Latvian Web, home pages of ministries and state public s...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Latvian

Bilingual hr-en parallel corpus from Croatian Mine Action website (Processed)

Resource Type:	Corpus
Media Type:	Text
Languages:	Croatian
	English

Parallel corpus from Parliament of Estonia (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel corpus compiled from contents of website of Par...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
	Estonian

Bilingual Bulgarian-English corpus from the National Revenue Agency (BG) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Bulgarian-English corpus of administrative doc...

Resource Type:	Corpus
Media Type:	Text
Languages:	Bulgarian
	English
