LexMan-ChunkerTokenizer

LexMan-ChunkerTokenizer is a tokenizer and sentence splitter that marks sentence boundaries and multi-word boundaries. Size: verb lemmas: 12 995; noun and adjective lemmas: 38 180; adverb lemmas: 7 250; compound words: 35 201. Language: Portuguese.

Resource Type:Tool / Service
Language:Portuguese
Corpus of Semantic Graphs with associated English strings

Automatically generated corpus of 98,818 graph/string pairs.

Resource Type:Corpus
Media Type:Text
Language:American English
MARv-DISAMB

MARv-DISAMB is a part-of-speech disambiguation tool (probabilistic disambiguation module).

Resource Type:Tool / Service
Language:Portuguese
Georeferenced Tweets

Tweets annotated with geographic coordinates.

Resource Type:Corpus
Media Type:Text
Language:English
LX-Tagger

LX-Tagger is a freely available online service for the part-of-speech tagging of Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. The service is composed of a set of shallow processing tools: A se...

Resource Type:Tool / Service
Language:Portuguese
BMI Brochures 2011-2015 (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. English translations of German BMI brochures from the la...

Resource Type:Corpus
Media Type:Text
Languages:English, German
BMVI Website (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. TMX file, 2718 TUs, bilingual German/English, texts from...

Resource Type:Corpus
Media Type:Text
Languages:English, German
Illum Corpus

The full editions of ILLUM from 12/11/2006 to 30/05/2010 (185 issues).

Resource Type:Corpus
Media Type:Text
Language:Maltese
RudriCo-POS

RudriCo-POS is a part-of-speech disambiguation tool that applies 188 morphological disambiguation rules.

Resource Type:Tool / Service
Language:Portuguese
UEvora Tagger

UEvora Tagger is a freely available on-line service for tagging sentences written in Portuguese. This service was developed and is maintained at the University of Évora by the VISTA - Video, Image, Speech, and Text Analysis Group of the Department of Informatics.

Resource Type:Tool / Service
