Uplug (see Tiedemann, 2003a) is a collection of tools and scripts for processing text-corpora, for automatic alignment and for term extraction from parallel corpora. Several tools have been integrated in Uplug. Pre-processing tools include a sentence splitter, a general tokenizer and wrappers a...
Syntactic parser for English. Outputs predicate-argument structures. Also outputs base forms for each token. The tool is provided as a UIMA component, which forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and...
This is a UIMA wrapper for the OpenNLP Tokenizer tool. It assigns part-of-speech tags to tokens in English text. The tagset used in from the Penn Treebank). The tool forms part of the in-built library of components provided with the U-Compare platform (Kano et al., 2009; Kano et al., 2011; see se...
This is a UIMA wrapper for the OpenNLP Sentence Detector tool. It splits English text into individual sentences. The tool forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. ...
YamCha is a generic, customizable, and open source text chunker oriented toward a lot of NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. We used it for NP chunking.
LX-UTagger is a POS tagger for Portuguese that adopts the Universal Part-of-Speech tagset (UPOS), related to the Universal Dependency framework, with an initial performance of 99.06% under a ten-fold cross validation scheme. It is described in this article: António Branco, João Ricardo Silv...
The U-Compare Workbench is a graphical user interface that operates on top of the U-Compare platform. The U-Compare platform allows users to build and evaluate NLP workflows. Workflows consist of one or more components, consisting of corpus readers and tools, such as tokenisers, POS taggers, name...
This tool assigns a part-of-speech tag and base form to each token in a text. It operates on text that has previously been tokenised and morphologically analysed. The POS tagger is a module of Apertium machine translation system. The provided tool can currently operate on a subset of the language...
Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies and categorises syntactic chunks in plain text Tools in workflow: Freeling shallow parser web service (service provided by the PANACEA project) NOTE: The licence provided cove...
A NER-classifier based on memory-based learning, trained on the CINTIL dataset, a corpus that contains part of the Corpus de Referência do Português Contemporâneo - CRPC (Reference Corpus of Contemporary Portuguese). https://portulanclarin.net/repository/browse/cintil-corpus-internacional-do-por...