The LX-ESSLLI 2008 data set was created from the ESSLLI 2008 Distributional Semantic Workshop shared-task set, which consists of 44 concrete nouns grouped into 6 semantic categories (4 animate and 2 inanimate). The grouping is hierarchical and follows the top 10 properties from the McRae (2005) norms: bird-animal-natural; ground animal-animal-natural; fruit tree-vegetable-natural; green-vegetable-natural; tool-artifact-artifact; vehicle-artifact-artifact.
We kept the same organization into categories, so the resulting list has the same size as the original data set.
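
As an illustration only, the six category paths listed above can be written down as a small three-level structure. The sketch below is not part of the data set distribution: the names `CATEGORY_PATHS` and `group_by_level` are hypothetical, and the 44 nouns themselves are deliberately left out because they are not listed in this description.

```python
from collections import defaultdict

# The six category paths, copied from the description above:
# (fine-grained category, middle level, top level).
# Hypothetical representation; the distributed files may be organized differently.
CATEGORY_PATHS = [
    ("bird", "animal", "natural"),
    ("ground animal", "animal", "natural"),
    ("fruit tree", "vegetable", "natural"),
    ("green", "vegetable", "natural"),
    ("tool", "artifact", "artifact"),
    ("vehicle", "artifact", "artifact"),
]


def group_by_level(paths, level):
    """Group fine-grained categories by the label found at `level`
    (1 = middle level, 2 = top level)."""
    groups = defaultdict(list)
    for path in paths:
        groups[path[level]].append(path[0])
    return dict(groups)


middle_groups = group_by_level(CATEGORY_PATHS, 1)
top_groups = group_by_level(CATEGORY_PATHS, 2)

print(middle_groups)
print(top_groups)
```

Grouping at the middle or top level gives the coarser clusterings against which a distributional model's noun clusters could be compared, assuming the nouns are later attached to their fine-grained categories.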

You may also be interested in the other resources for the evaluation of distributional semantic models of Portuguese available from this repository: LX-SimLex-999, LX-Rare Word Similarity Dataset, LX-WordSim-353, LX-Battig, LX-AP, LX-4WAnalogies and LX-4WAnalogiesBR.


