Uplug (see Tiedemann, 2003a) is a collection of tools and scripts for processing text-corpora, for automatic alignment and for term extraction from parallel corpora. Several tools have been integrated in Uplug. Pre-processing tools include a sentence splitter, a general tokenizer and wrappers a...
Lince is a multi-platform stand-alone application that updates the textual contents of documents in a range of popular formats to the spelling prescribed by the 1990 Portuguese language reform. It works with both previously existing Portuguese language orthographic standards (1943, previously val...
The purpose of the tool is to detect sentence boundaries in English text. It is trained on the GENIA corpus of biomedical abstracts and so is particularly suitable for splitting sentences in biomedical texts. The tool is provided as a UIMA component, which forms part of the in-built library of co...
SenseClusters is a package of (mostly) Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods.
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is provided as a UIMA component, which forms part of the in-built library of...
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Performs discourse parsing on plain text. Also identifies sentences, tokens, parts of speech, lemmas, clauses and coreference chains Tools in workflow: UAIC-POSTagger, UAIC-NPChunker, UAI...
The purpose of the tool is to identify gene and protein names in biomedical text. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform for building and evaluating text mining workflows. The U-Compare Workbench pr...
Enju is a syntactic parser for English. The grammar used by the parser is based on Head Driven Phrase Structure Grammar (HPSG). Enju can analyse syntactic/semantic structures of English sentences can output phrase structure and predicate-argument structures.
MSTParser is a non-projective dependency parser (see McDonald et al., 2005a, 2006) that searches for maximum spanning trees over directed graphs. Models of dependency structure are based on large-margin discriminative training methods (see McDonald et al., 2005b). Projective parsing is also suppo...
LX-SRLabeler is a freely available on-line service for constituency parsing and semantic role labeling of Portuguese sentences. This service was developed and is maintained at University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics. LX-SRLabeler is su...