WordSim353 | R Documentation |
A database of human similarity ratings for 351 English noun pairs, collected by Finkelstein et al. (2002) and annotated with semantic relations (similarity vs. relatedness) by Agirre et al. (2009).
WordSim353
A data frame with 351 rows and the following 6 columns:
word1
first noun (character)
word2
second noun (character)
score
average similarity rating by human judges on scale from 0 to 10 (numeric)
relation
semantic relation between first and second word (factor, see Details below)
similarity
whether word pair belongs to the similarity subset (logical)
relatedness
whether word pair belongs to the relatedness subset (logical)
The nouns are given as disambiguated lemmas in the form <headword>_N.
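When the word pairs need to be matched against a resource that uses plain headwords, the part-of-speech suffix can be stripped with a regular expression. The following is a minimal sketch; the two lemma strings are made-up values in the documented format, not entries read from the data set.

```r
# Illustrative lemmas in the <headword>_N format (not read from WordSim353)
lemmas <- c("tiger_N", "cat_N")

# Remove the trailing "_N" part-of-speech tag to recover the bare headword
headwords <- sub("_N$", "", lemmas)
print(headwords)
```

Assuming the data frame is loaded, the same call could be applied to a whole column, e.g. sub("_N$", "", WordSim353$word1).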
The data set is known as WordSim353 because it originally consisted of 353 noun pairs. The present version omits two of them: one duplicate entry (money–cash) and the trivial combination tiger–tiger (which may have been included as a control item).
The following semantic relations are distinguished in the relation variable: synonym, antonym, hypernym, hyponym, co-hyponym, holonym, meronym and other (topically related or completely unrelated).
Note that the similarity and relatedness subsets are not disjoint, because they share 103 unrelated noun pairs (semantic relation other and score below 5.0).
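The overlap between the two subsets can be inspected by combining the two logical columns. The sketch below uses a small hand-made data frame with the same column names as WordSim353 so that it runs on its own; the name toy and all of its rows and scores are invented for illustration and are not taken from the data set.

```r
# Toy stand-in with the documented column layout (all values invented)
toy <- data.frame(
  word1       = c("a_N", "b_N", "c_N", "d_N"),
  word2       = c("e_N", "f_N", "g_N", "h_N"),
  score       = c(8.5, 7.0, 1.2, 0.9),
  relation    = factor(c("synonym", "other", "other", "other")),
  similarity  = c(TRUE, FALSE, TRUE, TRUE),
  relatedness = c(FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Pairs that belong to both the similarity and the relatedness subset
shared <- subset(toy, similarity & relatedness)

# Number of shared pairs, and whether all of them are unrelated with a low score
print(nrow(shared))
print(all(shared$relation == "other" & shared$score < 5))
```

With the real data, subset(WordSim353, similarity & relatedness) should select the shared unrelated pairs described above.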
Similarity ratings (Finkelstein et al. 2002): https://gabrilovich.com/resources/data/wordsim353/wordsim353.html
Semantic relations (Agirre et al. 2009): http://alfonseca.org/eng/research/wordsim353.html
Agirre, Eneko, Alfonseca, Enrique, Hall, Keith, Kravalova, Jana, Pasca, Marius, and Soroa, Aitor (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2009), pages 19–27, Boulder, Colorado.
Finkelstein, Lev, Gabrilovich, Evgeniy, Matias, Yossi, Rivlin, Ehud, Solan, Zach, Wolfman, Gadi, and Ruppin, Eytan (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116–131.
head(WordSim353, 20)
table(WordSim353$relation) # semantic relations
# split into "similarity" and "relatedness" subsets
xtabs(~ similarity + relatedness, data=WordSim353)