This workflow is designed for use in the UIMA-based U-Compare workbench (see separate META-SHARE record). The workflow is in "ucz" format (specific to U-Compare) and can be imported via the "Import Workflow" item in the "Workflows" menu of the U-Compare interface. It include...
FORMA is a probabilistic tool for morphological tagging and lemmatization of text. Its purpose is to produce annotated text for processing by other NLP tools (see Gonzalez et al., 2006).
Part-of-speech tagger tuned to biomedical text, provided as a web service.
Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf
LexMan-POSTagger is a morphological analyser that assigns morphological tags to all words. Size: verb lemmas: 12 995; noun and adjective lemmas: 38 180; adverb lemmas: 7 250; compound words: 35 201. Language: Portuguese.
LexMan-ChunkerTokenizer is a tokenizer and sentence splitter. It marks sentence boundaries and multi-word boundaries. Size: verb lemmas: 12 995; noun and adjective lemmas: 38 180; adverb lemmas: 7 250; compound words: 35 201. Language: Portuguese.
RudriCo-TOK is a tokenizer tool that splits contractions. De-contraction rules: 178.
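As a rough illustration of what splitting contractions involves, the sketch below applies a small lookup table to a token list. It assumes Portuguese input, as in the neighbouring records. The table `DECONTRACTION_RULES` and the function `split_contractions` are hypothetical names introduced here; they are not RudriCo-TOK's actual rules or interface, and a real rule set would also handle context and casing.

```python
# Hypothetical rule table for illustration only: these are a few well-known
# Portuguese contractions, NOT the 178 rules shipped with RudriCo-TOK.
DECONTRACTION_RULES = {
    "do": ("de", "o"),
    "da": ("de", "a"),
    "dos": ("de", "os"),
    "das": ("de", "as"),
    "no": ("em", "o"),
    "na": ("em", "a"),
    "ao": ("a", "o"),
    "pelo": ("por", "o"),
    "pela": ("por", "a"),
}


def split_contractions(tokens):
    """Replace each contracted token with its component tokens.

    Lookup is case-insensitive; the emitted components are lowercase.
    Tokens with no matching rule are passed through unchanged.
    """
    result = []
    for token in tokens:
        parts = DECONTRACTION_RULES.get(token.lower())
        if parts is None:
            result.append(token)
        else:
            result.extend(parts)
    return result


if __name__ == "__main__":
    sentence = "O livro do aluno está na mesa"
    print(split_contractions(sentence.split()))
    # ['O', 'livro', 'de', 'o', 'aluno', 'está', 'em', 'a', 'mesa']
```

A plain dictionary keeps the sketch short; it cannot resolve forms that are contractions only in some contexts, which is where a rule-based tool would need more than a lookup.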
Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf
MARv-POS is a part-of-speech tagger (probabilistic POS annotation module). MARv4's architecture comprises two submodules: a module of linguistically oriented disambiguation rules and a probabilistic disambiguation module. The linguistically oriented module is no longer used in the STRING chain be...
MARv-DISAMB is a part-of-speech disambiguation tool (probabilistic disambiguation module).