LX Semantic Similarity is an online service for measuring the semantic similarity between words in Portuguese. The service is built on LX-DSemVectors, a distributional semantics model (also known as word embeddings) of the Portuguese language.
The model represents each word in its vocabulary as a vector of real numbers. This representation gives the user a measure of similarity between two or more words, computed from the cosine distance between the vectors of those words.
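To make the vector comparison concrete, here is a minimal sketch in plain Python. It assumes the usual definitions (cosine similarity as the normalised dot product, cosine distance as one minus that value); the service's exact formula is not stated here. The three-dimensional vectors are invented for illustration and are not taken from LX-DSemVectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(u, v)


# Toy vectors, invented for this example (real embeddings have
# hundreds of dimensions and come from the trained model).
toy_vectors = {
    "gato": [0.9, 0.1, 0.3],
    "cão": [0.8, 0.2, 0.4],
    "carro": [0.1, 0.9, 0.2],
}

sim_animals = cosine_similarity(toy_vectors["gato"], toy_vectors["cão"])
sim_mixed = cosine_similarity(toy_vectors["gato"], toy_vectors["carro"])
print(f"gato / cão:   {sim_animals:.3f}")
print(f"gato / carro: {sim_mixed:.3f}")
```

With these toy values, the pair "gato" / "cão" scores higher than "gato" / "carro", which is the behaviour one would expect from vectors that place related words close together.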
The online service provides two types of search:
- Word pair: Given two words, the service displays the distance between them and an interactive 2D plot (a t-SNE embedding) of the 200 closest words to each of the input words.
- Single word: Given a single word, the service shows a word cloud of the 40 most similar words (words closer to the input word appear in a larger font) and a table of the 15 most similar terms, each with its similarity to the input word.
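The single-word lookup can be thought of as a nearest-neighbour ranking over the vocabulary. The sketch below illustrates that idea under stated assumptions: it ranks by cosine similarity, uses a tiny invented vocabulary rather than LX-DSemVectors, and the helper name `most_similar` is hypothetical. It is not the service's actual implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, k=15):
    """Return up to k (word, similarity) pairs, most similar first."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word  # the input word itself is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vocabulary, invented for this example.
toy_vectors = {
    "gato": [0.9, 0.1, 0.3],
    "cão": [0.8, 0.2, 0.4],
    "carro": [0.1, 0.9, 0.2],
    "bicicleta": [0.2, 0.8, 0.1],
}

neighbours = most_similar("gato", toy_vectors, k=2)
for term, score in neighbours:
    print(f"{term}\t{score:.3f}")
```

A table like the one the service shows would correspond to calling the helper with `k=15`; a word cloud could reuse the same scores to scale font sizes.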
The LX-DSemVectors are distributed via GitHub and are described in the following publication:
- Rodrigues, João, António Branco, Steven Neale and João Silva, 2016, "LX-DSemVectors: Distributional Semantics Models for the Portuguese Language", Lecture Notes in Artificial Intelligence, 9727, Berlin, Springer, pp. 259-270.