LX-Tagger

Handle:	https://hdl.handle.net/21.11129/0000-000B-D325-D (persistent URL to this page)
URL:	http://lxcenter.di.fc.ul.pt/tools/en/LXTaggerEN.html

This tool was built to handle Portuguese-specific issues in syntactic categorization. It assigns a single morpho-syntactic tag from its tagset to every token. The tag is attached to the token, with a slash (/) as separator:

um exemplo → um/IA exemplo/CN
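
As an illustration of the output format above, the following minimal Python sketch splits a tagged line back into (token, tag) pairs. It assumes that tagged items are separated by whitespace and that the tag follows the last slash in each item; the function name parse_tagged is hypothetical and not part of the tool.

```python
def parse_tagged(line):
    """Split token/TAG output into (token, tag) pairs.

    Assumption: items are whitespace-separated and the tag follows
    the last "/" in each item.
    """
    pairs = []
    for item in line.split():
        # Split on the last slash so a token that itself contains "/" stays whole.
        token, sep, tag = item.rpartition("/")
        if not sep:
            raise ValueError(f"no tag separator in {item!r}")
        pairs.append((token, tag))
    return pairs


print(parse_tagged("um/IA exemplo/CN"))
# prints [('um', 'IA'), ('exemplo', 'CN')]
```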

Each individual token in multi-token expressions gets the tag of that expression prefixed by "L" and followed by the number of its position within the expression:

de maneira a que → de/LCJ1 maneira/LCJ2 a/LCJ3 que/LCJ4
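
The multi-token convention above can be undone downstream. The sketch below regroups such tokens into one unit; it is only an illustration, assuming that expression tags consist of uppercase letters and that positions start at 1. The names MULTI_TOKEN_TAG and group_expressions are hypothetical.

```python
import re

# Assumed shape of a multi-token tag: "L", the expression's tag, a 1-based position.
MULTI_TOKEN_TAG = re.compile(r"^L([A-Z]+)(\d+)$")


def group_expressions(pairs):
    """Merge consecutive tokens of one multi-token expression into a single unit."""
    units = []
    open_position = 0  # position of the last token of the expression being built
    for token, tag in pairs:
        match = MULTI_TOKEN_TAG.match(tag)
        if match is None:
            # Ordinary single-token tag: keep as is and close any open expression.
            units.append((token, tag))
            open_position = 0
            continue
        base_tag, position = match.group(1), int(match.group(2))
        continues = (
            open_position > 0
            and position == open_position + 1
            and units[-1][1] == base_tag
        )
        if continues:
            text, _ = units[-1]
            units[-1] = (text + " " + token, base_tag)
        else:
            units.append((token, base_tag))
        open_position = position
    return units


tagged = [("de", "LCJ1"), ("maneira", "LCJ2"), ("a", "LCJ3"), ("que", "LCJ4")]
print(group_expressions(tagged))
# prints [('de maneira a que', 'CJ')]
```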

The tagger was developed with the TnT software on a small (260 Ktoken), accurately hand-tagged corpus. It reached an accuracy of 96.87% when trained on 90% of the corpus and evaluated on the held-out 10%, with the result averaged over 10 test runs.
LX-Tagger was developed and is maintained at the University of Lisbon by NLX, the Natural Language and Speech Group of the Department of Informatics.
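
The evaluation protocol described above (train on 90%, test on the held-out 10%, average over 10 runs) can be sketched as follows. This is an illustration under stated assumptions, not the original setup: the record does not say how the 10 runs were drawn, so the sketch re-splits the corpus at random by sentence, and it uses a most-frequent-tag baseline as a stand-in for TnT. The names train_baseline and averaged_accuracy are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_baseline(sentences):
    """Stand-in tagger (NOT TnT): give each token its most frequent training tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
            all_tags[tag] += 1
    fallback = all_tags.most_common(1)[0][0]  # tag used for unseen tokens
    best = {token: c.most_common(1)[0][0] for token, c in counts.items()}
    return lambda token: best.get(token, fallback)


def averaged_accuracy(sentences, train=train_baseline, runs=10, held_out=0.1, seed=0):
    """Average token-level accuracy over `runs` random train/test splits.

    Assumption: splits are drawn at random, by sentence. Needs at least
    two non-empty sentences so that both parts of a split are non-empty.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * held_out))
        test, training = shuffled[:n_test], shuffled[n_test:]
        tag_of = train(training)
        correct = total = 0
        for sentence in test:
            for token, gold in sentence:
                correct += tag_of(token) == gold
                total += 1
        scores.append(correct / total)
    return sum(scores) / len(scores)


# Toy corpus: 20 copies of the example sentence, as (token, tag) pairs.
corpus = [[("um", "IA"), ("exemplo", "CN")] for _ in range(20)]
print(averaged_accuracy(corpus))
```

Any function that takes training sentences and returns a token-to-tag callable can be passed as `train`, so a real tagger could replace the baseline without changing the evaluation loop.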

Distribution Licence

Proprietary

Restrictions: Academic - Non Commercial Use

User Nature: Academic

Download location: hidden

Distribution Access/Medium: Downloadable

Licensors:

António Branco

http://www.di.fc.ul.pt/~ahb/

University of Lisbon, Faculty of Sciences

FCUL

Associate Professor with Habilitation

Faculdade de Ciências de Lisboa, Departamento de Informática. Campo Grande, 1749-016 Lisboa, Portugal

1749-016 Lisbon

Tel.: +351 217 500 087

Fax: +351 217 500 084

Department of Informatics

http://nlx.di.fc.ul.pt/

FCUL

Faculdade de Ciências de Lisboa, Departamento de Informática. Campo Grande, 1749-016 Lisboa, Portugal

1749-016 Lisbon

Portugal

Tel.: +351 217 500 087

Fax: +351 217 500 084

Distribution rights holders:

António Branco

Department of Informatics

IPR Holder

António Branco

Department of Informatics

Contact Person

António Branco

Department of Informatics

Tool/Service

Tool

Language Dependent

Input

Media type: Text

Resource type: Corpus

Modality: Written Language

Output

Media type: Text

Resource type: Corpus

Modality: Written Language

Segmentation level: Word

Operation

Operating system: Linux

Evaluation

Evaluated: True

Resource Creation

Resource Creator

António Branco

Department of Informatics

Metadata

Created: 11/07/2012

Last Updated: 11/07/2012

Source: METANET4U

META-SHARE

Metadata Language: English (en)

Metadata Creator

Catarina Carvalheiro

http://nlx-server.di.fc.ul.pt/~catarina/

University of Lisbon, Faculty of Sciences

FCUL

Researcher

Departamento de Informática NLX - Grupo de Fala e Linguagem Natural, Faculdade de Ciências da Universidade de Lisboa, Edifício C6

1749-016 Lisbon

Tel.: +351 217 500 087

Fax: +351 217 500 084

Department of Informatics

Version

Version: 1.0

Last Updated: 11/07/2012

Documentation

Tool Documentation: Online

Document Type: Other

Catarina Carvalheiro, LX-Tagger Narrative Description, http://portulanclarin.net/repository/extradocs/LXTagger.pdf

Document Type: Masters Thesis

João Silva, Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization, http://docs.di.fc.ul.pt/jspui/handle/10455/3095 , 2007

Document Language: English
