cs: Cosine Similarity


View source: R/cs.R

Description

Computes the pairwise cosine similarities between the word embeddings of two sets of words.

Usage

cs(a, b, word_embeddings)

Arguments

a, b

character strings or character vectors of words. Every word must appear as a name in word_embeddings.

word_embeddings

named list of word embeddings. See formatWordEmbeddings.

Details

Consider 2 words with word embedding representations a and b. Then the cosine similarity is defined as

sim_cos(a, b) = \frac{a \cdot b}{\| a \|_2 \, \| b \|_2}

If A = (a_1, \dots, a_n) and B = (b_1, \dots, b_m) are vectors of words, then the result is an m \times n matrix whose entry in cell (i, j) is sim_cos(a_j, b_i). Rows therefore correspond to the words in B and columns to the words in A.
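The definition can be illustrated with a minimal base-R sketch. This is not the package's implementation: cs_sketch, cos_sim, and toy_embeddings are hypothetical names, and the embedding values are made up for illustration. It only assumes that the embeddings are a named list of numeric vectors.

```r
# Toy embeddings: a named list of numeric vectors (made-up values).
toy_embeddings <- list(
  home  = c(1, 0, 1),
  house = c(1, 0, 2),
  dog   = c(0, 3, 0)
)

# Cosine similarity between two numeric vectors.
cos_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical sketch of the pairwise computation:
# rows index the words in b, columns index the words in a.
cs_sketch <- function(a, b, word_embeddings) {
  out <- matrix(NA_real_, nrow = length(b), ncol = length(a),
                dimnames = list(b, a))
  for (i in seq_along(b)) {
    for (j in seq_along(a)) {
      out[i, j] <- cos_sim(word_embeddings[[a[j]]], word_embeddings[[b[i]]])
    }
  }
  out
}

res <- cs_sketch(c("home", "house"), c("home", "house", "dog"), toy_embeddings)
dim(res)  # 3 rows (words in b) by 2 columns (words in a)
```

The explicit double loop mirrors the cell-by-cell definition. For large vocabularies, a vectorized version that normalizes the embeddings once and takes a matrix product would likely be faster.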

Value

A numeric matrix of cosine similarities, with one row per word in b and one column per word in a.

References

Goldberg, Y. (2017) Neural Network Methods for Natural Language Processing. San Rafael, CA: Morgan & Claypool Publishers.

See Also

formatWordEmbeddings

Examples

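## Not run: 

# Build the named list of embeddings expected by cs()
word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE)

# Similarity between a single pair of words
a <- "home"
b <- "house"
cs(a, b, word_embeddings)

# Pairwise similarities between three words in a and seven words in b
a <- c("home", "apartment", "mansion")
b <- c("my", "dog", "sleeps", "in", "her", "dog", "house")
cs(a, b, word_embeddings)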

## End(Not run)

scottmanski/TAGAM documentation built on Aug. 3, 2020, 10:50 a.m.