FORMA is a probabilistic tool for morphological tagging and lemmatization of text. The purpose of this tool is to obtain annotated text to be processed by other NLP tools (see Gonzalez et al., 2006).
Lince is a multi-platform stand-alone application that updates the textual contents of documents in a range of popular formats to the spelling prescribed by the 1990 Portuguese language reform. It works with both previously existing Portuguese language orthographic standards (1943, previously val...
LX-UDParser is a UD parser for Portuguese, which adopts the Universal Dependency framework, with an initial performance of 90.87 for UAS and 88.01 for LAS under a ten-fold cross validation scheme. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and João Rodri...
Part-of-speech tagger tuned to biomedical text, provided as a web service.
Syntactic parser for English. Outputs predicate-argument structures. Also outputs base forms for each token. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and...
Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm- agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language pa...
ixa-pipe-pos-eu is a robust and wide-coverage morphological analyser and a Part-of-Speech tagger for Basque. The analyser is based on the two-level formalism and has been designed in an incremental way with three main modules: the standard analyser, the analyser of linguistic variants, and the...
Uplug (see Tiedemann, 2003a) is a collection of tools and scripts for processing text-corpora, for automatic alignment and for term extraction from parallel corpora. Several tools have been integrated in Uplug. Pre-processing tools include a sentence splitter, a general tokenizer and wrappers a...
YamCha is a generic, customizable, and open source text chunker oriented toward a lot of NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. We used it for NP chunking.
ixa-pipe-coref-eu is a Basque coreference resolution tool, which is an adaptation of Stanford Deterministic Coreference Resolution (http://www-nlp.stanford.edu/downloads/dcoref.shtml). This tool reads a text document annotated with lemmas, named entities and constituents formated in Natural La...