paragraph2vec_similarity: Similarity between document / word vectors as used in...


View source: R/paragraph2vec.R

Description

The similarity between document / word vectors is defined as the inner product of the two vectors.
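As a rough illustration of that definition, the base R sketch below computes all pairwise inner products between the rows of two small matrices with tcrossprod. The matrices x and y and their values are made up for the example, and the sketch is not taken from the package source; it only shows what an inner-product similarity looks like.

```r
# Hypothetical embeddings: 2 terms and 3 documents in a 3-dimensional space
x <- matrix(c(1, 0, 2,
              0, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("word1", "word2"), NULL))
y <- matrix(c(1, 1, 0,
              0, 2, 1,
              2, 0, 1), nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"), NULL))

# Inner product of every row of x with every row of y, i.e. x %*% t(y)
sim <- tcrossprod(x, y)

# The same value for a single pair, computed element by element
sim_word1_doc2 <- sum(x["word1", ] * y["doc2", ])
```

Because an inner product is not normalised, vectors with larger norms tend to give larger values; dividing by the vector norms would give cosine similarity instead.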

Usage

paragraph2vec_similarity(x, y, top_n = +Inf)

Arguments

x

a matrix with embeddings where the rownames of the matrix provide the label of the term

y

a matrix with embeddings where the rownames of the matrix provide the label of the term

top_n

integer giving the number of most similar terms from y to return for each row of x. If top_n is supplied, a data.frame is returned containing only the highest similarities between x and y instead of all pairwise similarities.

Value

By default, the function returns a similarity matrix between the rows of x and the rows of y. The similarity between row i of x and row j of y is found in cell [i, j] of the returned similarity matrix.
If top_n is provided, the return value is a data.frame with columns term1, term2, similarity and rank. It gives the similarity between the terms in x and y, ordered from high to low similarity, and keeps only the top_n most similar records.
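To make the two return shapes concrete, the sketch below turns a small similarity matrix into the long term1 / term2 / similarity / rank layout described above, using base R only. The matrix values and the variable names sim, long and top are hypothetical, and this is one possible way to derive such a table, not necessarily how the package does it.

```r
# Hypothetical similarity matrix: rows are terms from x, columns are terms from y
sim <- matrix(c(0.9, 0.1, 0.5,
                0.2, 0.8, 0.4), nrow = 2, byrow = TRUE,
              dimnames = list(c("word1", "word2"), c("doc1", "doc2", "doc3")))
top_n <- 2

# Reshape to long format: one row per (term1, term2) pair
long <- as.data.frame.table(sim, stringsAsFactors = FALSE,
                            responseName = "similarity")
colnames(long) <- c("term1", "term2", "similarity")

# Order by term1, then from high to low similarity
long <- long[order(long$term1, -long$similarity), ]

# Rank within each term1 and keep only the top_n records
long$rank <- stats::ave(long$similarity, long$term1, FUN = seq_along)
top <- long[long$rank <= top_n, ]
rownames(top) <- NULL
```

Here top holds two rows per term in x, so its size grows with nrow(x) * top_n rather than nrow(x) * nrow(y).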

See Also

paragraph2vec

Examples

library(doc2vec)

# Random embeddings: 2 word vectors and 5 document vectors of dimension 3
x <- matrix(rnorm(6), nrow = 2, ncol = 3)
rownames(x) <- c("word1", "word2")
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
rownames(y) <- c("doc1", "doc2", "doc3", "doc4", "doc5")

# Similarities between the words in x and the documents in y
paragraph2vec_similarity(x, y)
paragraph2vec_similarity(x, y, top_n = 1)
paragraph2vec_similarity(x, y, top_n = 2)
paragraph2vec_similarity(x, y, top_n = +Inf)

# Similarities among the documents in y themselves
paragraph2vec_similarity(y, y)
paragraph2vec_similarity(y, y, top_n = 1)
paragraph2vec_similarity(y, y, top_n = 2)
paragraph2vec_similarity(y, y, top_n = +Inf)

doc2vec documentation built on March 28, 2021, 1:09 a.m.