neighbors: Find nearest neighbors

Description

Returns the n nearest words to a given word, sentence/document, or vector in the semantic space, together with their cosines to the input.

Usage

neighbors(x,n,tvectors=tvectors)

Arguments

x

either a character vector with length(x) = 1, or a numeric vector with length(x) = ncol(tvectors), i.e., with the same dimensionality as the semantic space

n

the number of neighbors to be computed

tvectors

the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)

Details

If sentences or documents are used as input, x should be a single string such as x <- "word1 word2 word3", not a vector of words such as x <- c("word1", "word2", "word3"). This makes it easy to copy and paste text directly into x.
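To illustrate what the single-string format means in practice, the sketch below splits one string into words and turns it into a single vector by summing the rows of a semantic space. The helper name text_to_vector, the lowercasing, and the use of summation are assumptions made for illustration only; this is not claimed to be how neighbors() itself builds a document vector.

```r
# Hypothetical helper (NOT part of LSAfun): turn one string of words into
# a single vector by summing the matching rows of tvectors.
# Lowercasing and summation are illustrative assumptions.
text_to_vector <- function(x, tvectors) {
  # Split the single string on runs of whitespace
  words <- strsplit(x, "[[:space:]]+")[[1]]
  words <- tolower(words[words != ""])
  # Keep only words that have a row in the semantic space
  words <- words[words %in% rownames(tvectors)]
  if (length(words) == 0) stop("no word of x was found in tvectors")
  colSums(tvectors[words, , drop = FALSE])
}

# Toy 3-dimensional space with made-up numbers; rows are word vectors
toy <- rbind(alice  = c(1, 0, 0),
             rabbit = c(0, 1, 0),
             queen  = c(0, 0, 1))

v <- text_to_vector("Alice  rabbit", toy)
v   # 1 1 0
```

Because strsplit() is applied to the first element only, a vector such as c("alice", "rabbit") would not be handled as a sentence by this sketch, which mirrors why a single string is the expected input shape.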

To import a document (e.g., Document.txt) from a directory for comparisons, set your working directory to that directory using setwd(). Then use the following commands, shown here for a file named Alice_in_Wonderland.txt:

fileName1 <- "Alice_in_Wonderland.txt"

x <- readChar(fileName1, file.info(fileName1)$size)  # read the whole file as one string
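As an alternative to readChar(), the whole file can also be read line by line and joined into one string, which yields the same single-string format described above. The helper name read_as_single_string is hypothetical, and unlike readChar() this version replaces line breaks with spaces. A temporary file is used so that the sketch runs without Alice_in_Wonderland.txt being present.

```r
# Hypothetical helper (NOT part of LSAfun): read a text file into ONE string.
read_as_single_string <- function(path) {
  lines <- readLines(path, warn = FALSE)  # one element per line of the file
  paste(lines, collapse = " ")            # join all lines with single spaces
}

# Self-contained demo with a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Alice was beginning", "to get very tired"), tmp)

x <- read_as_single_string(tmp)
length(x)  # 1
x          # "Alice was beginning to get very tired"
```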

Since x can also be any vector of the active LSA space, this function can be combined with compose() to compute the neighbors of complex expressions (see Examples).
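The idea behind combining compose() with neighbors() can be sketched on a toy space: build one vector from two words, then rank every word in the space by its cosine to that vector. The toy matrix holds made-up numbers, and modelling additive composition as plain element-wise addition is an assumption here; the exact computation performed by compose(method = "Add") may differ.

```r
# Toy semantic space (made-up numbers); rows are word vectors
toy <- rbind(mad    = c(1.0, 0.2, 0.0),
             hatter = c(0.1, 1.0, 0.1),
             tea    = c(0.6, 0.6, 0.1),
             queen  = c(0.0, 0.1, 1.0))

# Additive composition, assumed here to be element-wise addition
composed <- toy["mad", ] + toy["hatter", ]

# Cosine between the composed vector and every row of the space
cos_sim <- as.vector(toy %*% composed) /
  (sqrt(rowSums(toy^2)) * sqrt(sum(composed^2)))
names(cos_sim) <- rownames(toy)

# Highest cosine first; in this toy space "tea" comes out on top
ranked <- sort(cos_sim, decreasing = TRUE)
ranked
```

In this toy example the composed vector is closer to "tea" than to either of its parts, which is the kind of effect neighbors of complex expressions are meant to reveal.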

Value

A named numeric vector. The neighbors are given as names of the vector, and their respective cosines to the input as vector entries.
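The returned value can be handled with ordinary base R operations on named vectors. The vector res below is made up and merely has the same shape as the documented return value (words as names, cosines as entries); the cut-off of 0.5 is an arbitrary choice for illustration.

```r
# Made-up named numeric vector shaped like the documented return value
res <- c(word1 = 1.00, word2 = 0.71, word3 = 0.64, word4 = 0.40)

neighbor_words <- names(res)   # the neighbors
cosines <- unname(res)         # their cosines to the input
close <- res[res > 0.5]        # keep only neighbors above a cosine cut-off

# Tabular view, e.g. for export with write.csv()
tab <- data.frame(word = names(res), cosine = cosines)
tab
```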

Author(s)

Fritz Guenther

References

Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.

Dennis, S. (2007). How to use the LSA Web Site. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 35-56). Mahwah, NJ: Erlbaum.

http://wordvec.colorado.edu/

See Also

cosine, plot_neighbors, compose

Examples

data(wonderland)

neighbors("cheshire",n=20,tvectors=wonderland) 

neighbors(compose("mad","hatter",method="Add",tvectors=wonderland),
n=20,tvectors=wonderland)

LSAfun documentation built on Nov. 18, 2023, 1:10 a.m.