LX-DSemVectors

Handle:	https://hdl.handle.net/21.11129/0000-000B-D38A-B (persistent URL to this page)

LX-DSemVectors is a distributional lexical semantics model (also known as word embeddings) for Portuguese (Rodrigues et al., 2016). This version, 2.2b, was trained on a corpus of 2 billion tokens and achieved state-of-the-art results on multiple lexical semantic tasks (Rodrigues & Branco, 2018).
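As a rough illustration of how word embeddings like these are typically queried, the minimal sketch below computes cosine similarity between word vectors and answers an analogy with the common vector-offset (3CosAdd) method. It assumes vectors are plain Python lists. The three-dimensional vectors in `TOY_VECTORS` are made up for illustration and the function names are hypothetical. Whether the LX-DSemVectors evaluation used exactly this analogy method is also an assumption.

```python
import math

# Made-up 3-dimensional toy vectors, for illustration only; real
# LX-DSemVectors embeddings have many more dimensions.
TOY_VECTORS = {
    "rei":    [0.9, 0.8, 0.1],
    "rainha": [0.9, 0.1, 0.8],
    "homem":  [0.1, 0.9, 0.1],
    "mulher": [0.1, 0.2, 0.8],
    "casa":   [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Solve 'a is to b as c is to ?' with the vector-offset (3CosAdd) method:
    return the word, other than a, b and c, whose vector is most
    cosine-similar to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    # "homem" is to "rei" as "mulher" is to ...?
    print(analogy(TOY_VECTORS, "homem", "rei", "mulher"))  # → rainha
```

On these toy vectors, the offset `rei - homem + mulher` lands exactly on `rainha`. With real embeddings, the nearest neighbour is only approximately the expected word, which is why analogy accuracy is reported as a percentage.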

This resource (~13 GB) provides four models, corresponding to the first four experiments reported in Rodrigues & Branco (2018).

LX-DSemVectors v2.2b achieves:
- 47.1% accuracy in the word analogy task, on the LX-4WAnalogies data set;
- ρ = 0.5146, 0.3502 and 0.3618 in the lexical similarity task, on the LX-WordSim-353, LX-SimLex-999 and LX-Rare Word Similarity data sets, respectively;
- a purity of 0.5909, 0.8000 and 0.6438 in the conceptual categorization task, on the LX-ESSLLI 2008, LX-Battig and LX-AP data sets, respectively.
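To make the similarity and categorization scores above more concrete, the minimal sketch below computes the two metrics. It assumes that ρ denotes Spearman's rank correlation, the measure usually reported for these similarity data sets, between human ratings and model scores. Purity is computed on a given clustering of words against gold categories. The function names and example data are hypothetical. The exact evaluation protocol (for instance, how out-of-vocabulary pairs were handled or which clustering algorithm produced the clusters) is not described on this page, so the sketch illustrates only the metrics themselves.

```python
from collections import Counter


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def purity(clusters, gold):
    """Purity of a clustering. Each cluster is credited with the count of its
    most frequent gold category; the credits are summed and divided by the
    number of items. `clusters` and `gold` both map each word to a label."""
    by_cluster = {}
    for word, cluster_id in clusters.items():
        by_cluster.setdefault(cluster_id, Counter())[gold[word]] += 1
    hits = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return hits / len(clusters)


if __name__ == "__main__":
    # Hypothetical human ratings vs. model cosine scores for four word pairs.
    human = [9.0, 7.5, 3.2, 1.0]
    model = [0.81, 0.64, 0.70, 0.12]
    print(round(spearman_rho(human, model), 4))  # → 0.8

    # Hypothetical clustering of five words into two clusters.
    clusters = {"cão": 0, "gato": 0, "maçã": 0, "pêra": 1, "uva": 1}
    gold = {"cão": "animal", "gato": "animal",
            "maçã": "fruta", "pêra": "fruta", "uva": "fruta"}
    print(purity(clusters, gold))  # → 0.8
```

Spearman's ρ compares only the orderings, so it rewards a model whose similarity scores rank word pairs as humans do, regardless of the scale of the scores. Purity reaches 1.0 only when every cluster contains words from a single gold category.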

These benchmark data sets are also available in the PORTULAN repository.

Distribution

Licence: CC BY

Contact Person

João Rodrigues

University of Lisbon, Faculty of Sciences (FCUL)

Department of Informatics

http://nlx.di.fc.ul.pt/

Faculdade de Ciências de Lisboa, Departamento de Informática. Campo Grande, 1749-016 Lisboa, Portugal

Tel.: +351 217 500 087

Fax: +351 217 500 084

Lexical Conceptual Resource (text)

Lexical Conceptual Resource General Information

Computational Lexicon

Language (monolingual text): Portuguese

Linguality

Linguality type: Monolingual

Size

17,572 Words

Metadata

Created: 10/19/2016

Last Updated: 11/22/2021

Metadata Creator

João Ricardo Silva

http://nlx-server.di.fc.ul.pt/~jsilva/

University of Lisbon, Faculty of Sciences (FCUL)

Sala 6.3.32, Edifício C6, Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal

Version

Version: 2.2b

Documentation

Document Type: In Proceedings

Rodrigues, João and António Branco. 2018. Finely tuned, 2 billion token based word embeddings for Portuguese. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), pp. 2403-2409.

Document Type: In Book

Rodrigues, João, António Branco, Steven Neale, and João Silva. 2016. LX-DSemVectors: Distributional Semantics Models for the Portuguese Language. Lecture Notes in Artificial Intelligence, vol. 9727, pp. 259-270. Springer.
