Search and Browse – PORTULAN CLARIN

LX-Sentence Splitter

LX-Sentence Splitter is a language processing tool for delimiting sentences in Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. LX-Sentence Splitter marks sentence boundaries with <s>…</s>, and p...

Resource Type:	Tool / Service

RudriCo-POS

RudriCo-POS is a part-of-speech disambiguation tool that applies 188 morphological disambiguation rules.

Resource Type:	Tool / Service
Language:	Portuguese

MARv-DISAMB

MARv-DISAMB is a part-of-speech disambiguation tool (probabilistic disambiguation module).

Resource Type:	Tool / Service
Language:	Portuguese

ComLinToo: The Computational Linguistics Toolset

The Computational Linguistics Toolset is a set of tools for computational linguistics. It contains re-usable code for cleaning, splitting, refining, and taking samples from corpora (ICE, Penn, and a native one), for tagging them using the TnT-tagger, for doing permutation statistics on N-grams (u...

Resource Type:	Tool / Service

Monolingual concordancer

Monolingual concordancer is a language-independent concordancer tool. It can also be used as a bilingual concordancer. Several corpora are included in this resource.

Resource Type:	Tool / Service

U-Compare/UIMA speech annotation viewer

This is a UIMA component that provides a visualization of speech-based output from UIMA workflows. It has been developed at the University of Manchester, using libraries of the Java Speech Toolkit (jstk). It has been designed specifically for use with the U-Compare text mining workbench (see sep...

Resource Type:	Tool / Service

LX-UTagger

LX-UTagger is a POS tagger for Portuguese that adopts the Universal Part-of-Speech tagset (UPOS), related to the Universal Dependency framework, with an initial performance of 99.06% under a ten-fold cross validation scheme. It is described in this article: António Branco, João Ricardo Silv...

Resource Type:	Tool / Service
Language:	Portuguese

Maltese Wiktionary

This lexicon is part of the collection of the Wikimedia Dumps which was retrieved as an XML file from http://dumps.wikimedia.org/mtwiktionary/20121105/ on November 5, 2012. In the Wikimedia dump, it is accompanied by a text file mtwiktionary-20121105-pages-articles-multistream-index.txt which li...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Language:	Maltese

Termcat Digital Marketing

Terms for Digital Marketing

Resource Type:	Lexical / Conceptual
Media Type:	Text
Languages:	Catalan; Valencian
	English
	French
	Galician
	German
	Italian
	Portuguese
	Spanish; Castilian

Arabic Tweets NER test set

Despite many recent papers on Arabic Named Entity Recognition (NER) in the news domain, little work has been done on microblog NER. NER on microblogs presents many complications such as informality of language, shortened named entities, brevity of expressions, and inconsistent capitalization (for...

Resource Type:	Lexical / Conceptual
Media Type:	Text
Language:	Arabic
