LX-DSemVectors

LX-DSemVectors is a distributional lexical semantics model, also known as word embeddings, for Portuguese (Rodrigues et al., 2016). This version, 2.2b, was trained on a corpus of 2 billion tokens and achieved state-of-the-art results on multiple lexical semantic tasks (Rodrigues & Branco, 2018).

This resource (~13 GB) contains four models, corresponding to the first four experiments reported in Rodrigues & Branco (2018).

LX-DSemVectors v2.2b achieves:
- 47.1% accuracy in the word analogy task, with the LX-4WAnalogies data set;
- correlations (ρ) of 0.5146, 0.3502 and 0.3618 in the lexical similarity task, with the LX-WordSim-353, LX-SimLex-999 and LX-Rare Word Similarity evaluation data sets, respectively;
- a purity of 0.5909, 0.8000 and 0.6438 in the conceptual categorization task, with the LX-ESSLLI 2008, LX-Battig and LX-AP evaluation data sets, respectively.

These benchmark data sets are also available in the PORTULAN repository.

Download

  • Lexical Conceptual Resource
  • text


