LX-USuite

LX-USuite is a tool for shallow processing of Portuguese that adopts the Universal Part-of-Speech (UPOS) tagset and the Universal feature bundles associated with the Universal Dependencies framework. Under a ten-fold cross-validation scheme, it has an initial performance of 99.06% for the POS tagging model, 98.75% for the featurizer model, and 99.08% for the lemmatizer model.
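
For readers unfamiliar with Universal feature bundles, the following is a minimal sketch of how a token carrying a lemma, a UPOS tag and a feature bundle might be represented. It assumes the CoNLL-U convention used by Universal Dependencies, where the FEATS column holds Name=Value pairs separated by "|" (or "_" when empty). The Token class and the parse_feats and format_feats helpers are hypothetical illustrations, not LX-USuite's actual API or output format.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical container for one annotated token (not LX-USuite's API)."""
    form: str
    lemma: str
    upos: str
    feats: dict[str, str] = field(default_factory=dict)


def parse_feats(column: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS value such as 'Gender=Fem|Number=Plur'."""
    if column == "_":  # "_" marks an empty feature bundle
        return {}
    feats = {}
    for pair in column.split("|"):
        name, value = pair.split("=", 1)
        feats[name] = value
    return feats


def format_feats(feats: dict[str, str]) -> str:
    """Serialize a bundle back to FEATS form, names sorted case-insensitively."""
    if not feats:
        return "_"
    return "|".join(f"{name}={feats[name]}" for name in sorted(feats, key=str.lower))
```

For example, the Portuguese noun "casas" could be represented as Token("casas", "casa", "NOUN", parse_feats("Gender=Fem|Number=Plur")), whose feats field is {"Gender": "Fem", "Number": "Plur"}.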

It is (partly) described in this article:

António Branco, João Ricardo Silva, Luís Gomes and João Rodrigues, 2022, "Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support", In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC2022).

which should be used as its canonical citation, and to which interested users are referred for detailed information.

The POS tagger, lemmatizer and featurizer models are trained on the companion CINTIL-USuite corpus, with around 1 million manually annotated tokens, which can be obtained here: https://hdl.handle.net/21.11129/0000-000F-327D-D.
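
The ten-fold cross-validation scheme under which the scores in the description above were obtained can be sketched as follows. This is a generic illustration, not the authors' evaluation code: it assumes the corpus is split into ten contiguous folds (for instance of sentences), that each fold serves once as test data while the other nine are used for training, and that the reported figure is the mean score over the folds. The train_fn and eval_fn parameters are hypothetical placeholders for training and scoring a model.

```python
from statistics import mean
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 10) -> Iterator[tuple[list[T], list[T]]]:
    """Yield (train, test) pairs; every item lands in exactly one test fold."""
    n = len(items)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of items")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(items[start:start + size])
        train = list(items[:start]) + list(items[start + size:])
        yield train, test
        start += size


def cross_validate(
    items: Sequence[T],
    train_fn: Callable[[list[T]], object],           # hypothetical: builds a model
    eval_fn: Callable[[object, list[T]], float],     # hypothetical: scores it
    k: int = 10,
) -> float:
    """Train on k-1 folds, evaluate on the held-out fold, and average the k scores."""
    scores = [eval_fn(train_fn(train), test) for train, test in k_fold_splits(items, k)]
    return mean(scores)
```

Contiguous folds keep neighbouring sentences together; a shuffled split is an equally common alternative, and which one was actually used here is not stated in this description.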

This corpus also supports its companion LX-UTagger, which can be used online at https://portulanclarin.net/workbench/lx-utagger/ and obtained from https://hdl.handle.net/21.11129/0000-000E-8B2F-2.

You may also be interested in the following related resources, which are available in this repository as well:
LX-UTagger (https://hdl.handle.net/21.11129/0000-000E-8B2F-2),
LX-UDParser (https://hdl.handle.net/21.11129/0000-000E-8B31-E),
LX-Suite (https://hdl.handle.net/21.11129/0000-000E-5991-A),
LX-Tagger (https://hdl.handle.net/21.11129/0000-000B-D325-D),
LX-DepParser (https://hdl.handle.net/21.11129/0000-000E-598D-0),
LX-Parser (https://hdl.handle.net/21.11129/0000-000E-5999-2).
