LX-DSemVectors
Handle: https://hdl.handle.net/21.11129/0000-000B-D38A-B (persistent URL to this page)
LX-DSemVectors is a distributional lexical semantics model, also known as word embeddings, for Portuguese (Rodrigues et al., 2016). This version, 2.2b, was trained on a corpus of 2 billion tokens and achieved state-of-the-art results on multiple lexical semantic tasks (Rodrigues & Branco, 2018).
This resource (~13GB) makes four models available, corresponding to the first four experiments reported in Rodrigues & Branco (2018).
LX-DSemVectors v2.2b achieves:
- 47.1% accuracy in the word analogy task, with the LX-4WAnalogies data set;
- ρ of 0.5146, 0.3502 and 0.3618 in the lexical similarity task, respectively with the evaluation data sets LX-WordSim-353, LX-SimLex-999, and LX-Rare Word Similarity;
- and a purity of 0.5909, 0.8000 and 0.6438 in the conceptual categorization task, respectively with the LX-ESSLLI 2008, LX-Battig and the LX-AP evaluation data sets.
These benchmark data sets are also available in the PORTULAN repository.
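
To make the reported metrics more concrete, the following is a minimal Python sketch of how scores of this kind are commonly computed from word vectors. It is an illustration under stated assumptions, not the evaluation code used by the authors: the function names, the toy three-dimensional vectors, and the vector-offset method for analogies are all hypothetical choices made here, and the page does not specify the exact procedure behind the published figures. Cosine similarity is the usual basis for lexical similarity scores (which are then correlated with human ratings), the analogy task is scored as the share of questions answered correctly, and purity measures how well clusters of word vectors match gold categories.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Assumption: the answer is the word closest (by cosine) to b - a + c,
    excluding the three question words themselves.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    clusters = {}
    for cid, label in zip(cluster_ids, gold_labels):
        clusters.setdefault(cid, []).append(label)
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in clusters.values())
    return majority_total / len(gold_labels)


# Hypothetical toy vectors, NOT taken from LX-DSemVectors.
toy_vectors = {
    "rei": [1.0, 0.0, 1.0],
    "rainha": [1.0, 1.0, 1.0],
    "homem": [1.0, 0.0, 0.0],
    "mulher": [1.0, 1.0, 0.0],
    "casa": [0.0, 0.2, 0.1],
}

analogy_answer = solve_analogy(toy_vectors, "homem", "mulher", "rei")
toy_purity = purity([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])

print(analogy_answer)  # rainha
print(toy_purity)      # 0.8
```

In the toy purity example, each of the two clusters has a majority label covering two items, so four of the five items are counted and the purity is 0.8.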