The present tool, which was built to deal with Portuguese-specific issues in syntactic categorization, assigns a single morpho-syntactic tag from the tagset below to every token. The tag is attached to the token with a slash (/) as separator: um exemplo → um/IA exemplo/CN ...
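As a rough illustration of the token/TAG convention described above, the following sketch splits slash-tagged output back into (token, tag) pairs and rebuilds the tagged string. The function names `parse_tagged` and `attach_tags` are hypothetical helpers, not part of the tool; only the slash separator and the sample tags IA and CN come from the description.

```python
def parse_tagged(text):
    """Split whitespace-separated token/TAG items into (token, tag) pairs.

    Hypothetical helper: assumes the tag follows the LAST slash in each item,
    so a token that itself contains a slash is still handled.
    """
    pairs = []
    for item in text.split():
        token, sep, tag = item.rpartition("/")
        if not sep:
            # Item carries no slash, hence no tag; skip it.
            continue
        pairs.append((token, tag))
    return pairs


def attach_tags(tokens, tags):
    """Join tokens and tags with a slash, one tag per token."""
    return " ".join(f"{token}/{tag}" for token, tag in zip(tokens, tags))


if __name__ == "__main__":
    print(parse_tagged("um/IA exemplo/CN"))
    print(attach_tags(["um", "exemplo"], ["IA", "CN"]))
```

Splitting on the last slash rather than the first is a design assumption made here so that tokens such as dates or fractions containing a slash would not be broken apart.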
Uplug (see Tiedemann, 2003a) is a collection of tools and scripts for processing text corpora, for automatic alignment, and for term extraction from parallel corpora. Several tools have been integrated in Uplug. Pre-processing tools include a sentence splitter, a general tokenizer and wrappers a...
FEUP CoRef is a freely available online service for coreference resolution in Portuguese and Spanish. This service was developed and is maintained at the Department of Informatics of the Faculdade de Engenharia da Universidade do Porto.
RudriCo-POS is a part-of-speech disambiguation tool that applies 188 morphological disambiguation rules.
Reddit Dataset Extraction Tool (RDET) is a tool that uses the Reddit comment and submission resources available at 'pushshift.io' to generate new datasets based on any given subreddit.
ixa-pipe-coref-eu is a Basque coreference resolution tool, which is an adaptation of Stanford Deterministic Coreference Resolution (http://www-nlp.stanford.edu/downloads/dcoref.shtml). This tool reads a text document annotated with lemmas, named entities and constituents formatted in Natural La...
Tokenisation is one of the functionalities of the GENIA tagger, which additionally outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. The tool is a UIMA component, which forms part of th...
LX Semantic Similarity is an online service for measuring the semantic similarity between words in Portuguese. This service uses the LX-DSemVectors, a distributional semantics model (a.k.a. word embeddings) of the Portuguese language. The model represents each word in its vocabulary by a vecto...
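To make the idea of vector-based similarity concrete, here is a minimal sketch that compares two word vectors with cosine similarity. Cosine similarity is a common choice for word embeddings, but it is an assumption here: the description does not state which measure the service uses. The vectors in `toy_vectors` are invented for illustration and are not taken from LX-DSemVectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Similarity is undefined for a zero vector; return 0.0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors; real embeddings have many more dimensions.
toy_vectors = {
    "gato": [0.9, 0.1, 0.3],
    "cão": [0.8, 0.2, 0.4],
    "mesa": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    print(cosine_similarity(toy_vectors["gato"], toy_vectors["cão"]))
    print(cosine_similarity(toy_vectors["gato"], toy_vectors["mesa"]))
```

With these invented values, the first pair scores higher than the second, which is the kind of ranking a similarity service would be expected to expose.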
LX-Sentence Splitter is a language processing tool for delimiting sentences in Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics. LX-Sentence Splitter marks sentence boundaries with <s>…</s>, and p...
LX-Proficiency is an online service for the quantitative analysis of texts along a range of linguistic metrics, and for the estimation of the proficiency level of texts. These quantitative metrics are meant to provide support in the classification of texts according to the proficiency levels i...