Web service created by exporting UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies sentences and tokens in plain text. Tools in workflow: Freeling sentence splitter web service (service provided by the PANACEA project), LX-Tokenizer (web service provided by th...
RudriCo-POS is a part-of-speech disambiguation tool that performs 188 morphological disambiguation rules.
MARv-DISAMB is a part-of-speech disambiguation tool (probabilistic disambiguation module).
LX-UDParser is a UD parser for Portuguese, which adopts the Universal Dependency framework, with an initial performance of 90.87 for UAS and 88.01 for LAS under a ten-fold cross validation scheme. It is described in this article: António Branco, João Ricardo Silva, Luís Gomes and João Rodri...
The present tool, that was built to deal with specific issues concerning orthographic conventions adopted for Portuguese, marks sentence boundaries with <s>…</s>, and paragraph boundaries with <p>…</p>. Unwraps sentences split over different lines. A f-score of 99.94% was obtained when testing o...
Lince is a multi-platform stand-alone application that updates the textual contents of documents in a range of popular formats to the spelling prescribed by the 1990 Portuguese language reform. It works with both previously existing Portuguese language orthographic standards (1943, previously val...
DiZer 2.0 is a web interface for discourse parsing. It is based on DiZer (Pardo and Nunes, 2008), the first discourse parser for Brazilian Portuguese. The system aims at producing the discourse structure of a source text following the Rhetorical Structure Theory – RST (Mann and Thompson, 1987), o...
LX-UTagger is a POS tagger for Portuguese that adopts the Universal Part-of-Speech tagset (UPOS), related to the Universal Dependency framework, with an initial performance of 99.06% under a ten-fold cross validation scheme. It is described in this article: António Branco, João Ricardo Silv...
A NER-classifier based on memory-based learning, trained on the CINTIL dataset, a corpus that contains part of the Corpus de Referência do Português Contemporâneo - CRPC (Reference Corpus of Contemporary Portuguese). https://portulanclarin.net/repository/browse/cintil-corpus-internacional-do-por...
SENTER is a SENtence splitTER for Portuguese.