The present tool, that was built to deal with Portuguese-specific issues concerning syntactic categorization, assigns a single morpho-syntactic tag, from the tagset below, to every token. The tag is attached to the token, using a / (slash) symbol as separator: um exemplo → um/IA exemplo/CN ...
The present tool, that was built to deal with specific issues concerning orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s>, and paragraph boundaries with <p>…</p>. Unwraps sentences split over different lines. A f-score of 99.94% was obtained when testing o...
Based on the MXPOST part of speech tagger and UNITEX dictionaries for Portuguese, this tool produces the lemmas of the words of a text stored in a plain text file. The source code is also provided.
ixa-pipe-pos-eu is a robust and wide-coverage morphological analyser and a Part-of-Speech tagger for Basque. The analyser is based on the two-level formalism and has been designed in an incremental way with three main modules: the standard analyser, the analyser of linguistic variants, and the...
ixa-pipe-ned-ukb is a multilingual Named Entity Disambiguation tool. It is based on UKB (http://ixa2.si.ehu.es/ukb/), a graph-based Word Sense Disambiguation tool. The Wikipedia graph built from the hyperlinks between Wikipedia articles is used for the processing. The input of the tool is ...
ixa-pipe-dep-eu is a Basque dependency parsing tool. It is based on MATE-tools. This tool takes a document in Natural Language Processing Annotation Format (NAF) format (http://wordpress.let.vupr.nl/naf/) and outputs a new NAF document. This tool is partly funded by the European Commission ...
ixa-pipe-coref-eu is a Basque coreference resolution tool, which is an adaptation of Stanford Deterministic Coreference Resolution (http://www-nlp.stanford.edu/downloads/dcoref.shtml). This tool reads a text document annotated with lemmas, named entities and constituents formated in Natural La...
The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.
FORMA is a probabilistic tool for morphological tagging and lemmatization of text. The purpose of this tool is to obtain annotated text to be processed by other NLP tools (see Gonzalez et al., 2006).
Enju is a syntactic parser for English. The grammar used by the parser is based on Head Driven Phrase Structure Grammar (HPSG). Enju can analyse syntactic/semantic structures of English sentences can output phrase structure and predicate-argument structures.