Search and Browse – PORTULAN CLARIN

Romanian - English news corpus (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Romanian – English news corpus built from Sout...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Romanian

Romanian Ombudsman archive (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel aligned corpus in tmx format built from the Rom...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Romanian

RudriCo-POS

RudriCo-POS is a part-of-speech disambiguation tool that applies 188 morphological disambiguation rules.

Resource Type:	Tool / Service
Language:	Portuguese

RudriCo-TOK

RudriCo-TOK is a tokenizer that splits contractions using 178 de-contraction rules.

Resource Type:	Tool / Service
Language:	Portuguese

Secretariat-General parallel corpus SL-EN and EN-SL (part 1) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovenian parallel corpus in TMX format from the...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Slovenian

Secretariat-General parallel corpus SL-EN and EN-SL (part 2) (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English-Slovenian parallel corpus in TMX format from the...

Resource Type:	Corpus
Media Type:	Text
Languages:	English
Languages:	Slovenian

SemLink

SemLink is a project whose aim is to link together different lexical resources via a set of mappings. These mappings will make it possible to combine the different information provided by these different lexical resources for tasks such as inferencing. In the current release, two mappings are ava...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Language:	English

SenseClusters

SenseClusters is a package of (mostly) Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods.

Resource Type:	Tool / Service

SENTER

SENTER is a SENtence splitTER for Portuguese.

Resource Type:	Tool / Service
Language:	Portuguese

SETimes.HR

We present SETimes.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETimes parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named ent...

Resource Type:	Corpus
Media Type:	Text
Language:	Croatian
