LX-SimLex-999 was created from SimLex-999 (Hill et al., 2015), which in turn was based on the University of South Florida Free Association Database (USF) (Nelson et al., 2014).
SimLex-999 was created under strict guidelines. Both words in each pair belong to the same morphosyntactic category, and multiword expressions and named entities were excluded from the data set. Besides the morphosyntactic category, the level of concreteness of each word was also a criterion. The word pairs in the USF data set had been tagged with concreteness levels provided by human annotators, on a scale of 1 to 7. In the creation of SimLex-999, this classification was taken into account, and pairs in which one of the concepts was more concrete than the other were not included.
The result was 999 word pairs, distributed as follows: 666 noun-noun pairs, 222 verb-verb pairs and 111 adjective-adjective pairs. Each pair received a score on a scale from 0 (totally unrelated) to 6 (very similar).
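As an illustration of how a scored pair list of this kind is commonly used, the sketch below tallies pairs per category and rank-correlates gold scores with model scores using Spearman's coefficient. It is only a sketch under stated assumptions: the row layout (word1, word2, category, score), the category labels, the sample rows and the model scores are all hypothetical placeholders, not the actual file format or contents of LX-SimLex-999.

```python
import math
from collections import Counter

# Hypothetical rows: (word1, word2, category, gold score on the 0-6 scale).
# Made-up English placeholders, NOT entries from LX-SimLex-999.
SAMPLE = [
    ("old", "new", "A", 0.5),
    ("coast", "shore", "N", 5.4),
    ("take", "steal", "V", 3.7),
    ("car", "bridge", "N", 0.7),
]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Number of pairs per category in the sample.
category_counts = Counter(cat for _, _, cat, _ in SAMPLE)

# Hypothetical model similarities (e.g. cosine values), one per sample row.
model_scores = [0.41, 0.83, 0.52, 0.10]
gold_scores = [score for _, _, _, score in SAMPLE]
rho = spearman(gold_scores, model_scores)
```

With the real data set, the same tally should give 666, 222 and 111 pairs for the three categories, and `model_scores` would come from the distributional model under evaluation.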

You may also be interested in the other resources for evaluating distributional semantic models of Portuguese that are available from this repository: LX-Rare Word Similarity Dataset, LX-WordSim-353, LX-ESSLLI 2008, LX-Battig, LX-AP, LX-4WAnalogies and LX-4WAnalogiesBR.

