SND: Semantic neighborhood density

Description

Returns the semantic neighborhood of a word (or vector), together with its semantic neighborhood size and density.

Usage

SND(x,n=NA,threshold=3.5,tvectors=tvectors)

Arguments

x

either a single word (a character vector with length(x) = 1) or a numeric vector of length ncol(tvectors), i.e., a vector with the same dimensionality as the semantic space

n

if specified as a numeric, determines the size of the neighborhood as the n nearest words to x. If n=NA (default), the semantic neighborhood will be determined according to a similarity threshold (see threshold)

threshold

specifies the similarity threshold (as a z-score) that determines whether a word is counted as a neighbor of x, following the method of Buchanan et al. (2001) (see Details below). Only used if n = NA

tvectors

the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)

Details

There are two principal approaches to determining the semantic neighborhood of a target word:

  • Set an a priori size of the semantic neighborhood to a fixed value n (e.g., Marelli & Baroni, 2015). The n closest words to the target word are counted as its semantic neighbors. The semantic neighborhood size is then necessarily n; the semantic neighborhood density is the mean similarity between these neighbors and the target word (see also plausibility)

  • Determine the semantic neighborhood based on a similarity threshold (e.g., Buchanan, Westbury, & Burgess, 2001). First, the similarity between the target word and every word in the semantic space is computed. These similarities are then transformed into z-scores, and all words whose z-score exceeds the threshold are counted as semantic neighbors. Traditionally, the threshold is set to z = 3.5, which is also the default value of threshold.
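The fixed-size approach can be sketched in a few lines of base R. This is a minimal illustration, not the LSAfun source: it assumes cosine similarity, and the helpers cosine_to_rows() and snd_fixed_n() as well as the toy matrix space are hypothetical names introduced only for this example.

```r
# Sketch of the fixed-size approach (NOT the LSAfun source).
# Assumption: similarity is cosine; rows of `space` are word vectors.

# Hypothetical helper: cosine between vector v and every row of matrix m
cosine_to_rows <- function(v, m) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Hypothetical helper: neighborhood of the n words nearest to `word`
snd_fixed_n <- function(word, n, space) {
  sims <- cosine_to_rows(space[word, ], space)
  names(sims) <- rownames(space)
  sims <- sims[names(sims) != word]   # target cannot be its own neighbor
  neighbors <- sort(sims, decreasing = TRUE)[seq_len(n)]
  list(neighbors = neighbors,
       n_size = length(neighbors),     # necessarily n in this approach
       SND = mean(neighbors))          # mean similarity to the target
}

# Toy 4-word, 3-dimensional semantic space
space <- rbind(cat = c(1, 0, 0),
               dog = c(0.9, 0.1, 0),
               car = c(0, 1, 0),
               sky = c(0, 0, 1))
res <- snd_fixed_n("cat", n = 2, space = space)
res$neighbors  # "dog" first (cosine about 0.994), then a word with cosine 0
```

Because n_size is fixed at n, only the density varies between target words in this approach.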

If x is a single target word, that word itself (whose similarity to itself is always 1) is excluded from these computations, so that it cannot be counted as its own neighbor.
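The threshold approach, including the exclusion of the target word, might look like the sketch below. It is an illustration under stated assumptions, not the LSAfun implementation: it assumes cosine similarity, assumes the target is dropped before z-scoring, and uses the sample standard deviation (as sd() and scale() do). The function snd_threshold() and the matrix space are hypothetical.

```r
# Sketch of the threshold approach (NOT the LSAfun source).
# Assumptions: cosine similarity; the target is dropped BEFORE z-scoring;
# z-scores use the sample standard deviation, as sd() and scale() do.
snd_threshold <- function(word, threshold, space) {
  v <- space[word, ]
  sims <- as.vector(space %*% v) / (sqrt(rowSums(space^2)) * sqrt(sum(v^2)))
  names(sims) <- rownames(space)
  sims <- sims[names(sims) != word]    # exclude the target word itself
  z <- (sims - mean(sims)) / sd(sims)  # similarities as z-scores
  neighbors <- sims[z > threshold]     # keep words above the z threshold
  list(neighbors = neighbors,
       n_size = length(neighbors),
       SND = mean(neighbors))           # NaN if no word passes the threshold
}

# Toy space: only "syn" points in the same direction as "target"
space <- rbind(target = c(1, 0, 0),
               syn    = c(1, 0, 0),
               x1     = c(0, 1, 0),
               x2     = c(0, 0, 1),
               x3     = c(0, 1, 1),
               x4     = c(0, 1, 0))
res <- snd_threshold("target", threshold = 1.5, space = space)
res$neighbors  # only "syn", with similarity 1
```

The toy example uses a threshold of 1.5 rather than 3.5 because, with only five candidate words, no z-score can exceed 4/sqrt(5) (about 1.79); the traditional threshold only becomes attainable in spaces with many more words.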

Value

A list of three elements:

  • neighbors: A named numeric vector of all identified neighbors, with the names being these neighbors and the values their similarity to x

  • n_size: The number of neighbors as a numeric

  • SND: The semantic neighborhood density (SND) as a numeric
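The relationship between the three returned elements can be illustrated with a hand-made list of the same shape. This sketch assumes, as Details states for the fixed-size approach, that the density is the mean similarity of the neighbors to x; the similarity values and the words w1 to w3 are hypothetical placeholders, not output of SND().

```r
# Hypothetical neighbor similarities, standing in for SND()'s `neighbors`
neighbors <- c(w1 = 0.6, w2 = 0.5, w3 = 0.4)

# Assemble a list with the three documented elements
result <- list(neighbors = neighbors,
               n_size = length(neighbors),   # number of neighbors
               SND = mean(neighbors))        # assumed: mean similarity to x

names(result$neighbors)  # "w1" "w2" "w3": the neighbor words
result$n_size            # 3
result$SND               # 0.5
```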

Author(s)

Fritz Guenther

References

Buchanan, L., Westbury, C., & Burgess, C. (2001). Characterizing semantic space: Neighborhood effects in word recognition. Psychonomic Bulletin & Review, 8, 531-544.

Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122, 485-515.

See Also

cosine, plot_neighbors, compose

Examples

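data(wonderland)  # example semantic space shipped with LSAfun

# Fixed-size neighborhood: the 20 nearest neighbors of "cheshire"
SND("cheshire",n=20,tvectors=wonderland)

# Threshold-based neighborhood with a more lenient z-score threshold of 2
SND("alice",threshold=2,tvectors=wonderland)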

LSAfun documentation built on Nov. 18, 2023, 1:10 a.m.