embedding_similarity: Cosine and Inner-Product Based Similarity
In ruimtehol: Learn Text 'Embeddings' with 'Starspace'

embedding_similarity	R Documentation

Cosine and Inner-Product Based Similarity

Description

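Computes cosine or inner-product (dot-product) similarities between the rows of two embedding matrices, for example embeddings of words, n-grams, documents or labels.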

Usage

embedding_similarity(x, y, type = c("cosine", "dot"), top_n = +Inf)

Arguments

`x`	a numeric matrix of embeddings, one row per word/n-gram/document/label, with the corresponding terms given as the row names of the matrix
`y`	a numeric matrix of embeddings in the same format as `x`; it must have the same number of columns (embedding dimension) as `x`
`type`	either 'cosine' (the default) or 'dot'. If 'dot', returns the inner-product (dot-product) similarity; if 'cosine', returns the cosine similarity
`top_n`	an integer: for each row of `x`, return only the `top_n` most similar terms from `y`. If `top_n` is supplied (including `top_n = +Inf`), a data.frame is returned containing only the highest similarities between `x` and `y` instead of the full matrix of all pairwise similarities
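For reference, write x_i for row i of `x`, y_j for row j of `y`, and d for the embedding dimension (the number of columns). The two values of `type` then correspond to the standard definitions below. Cosine similarity is the inner product divided by both Euclidean norms, so it lies in [-1, 1] and ignores vector length, whereas the dot product also grows with the magnitude of the embeddings.

$$
\mathrm{dot}(x_i, y_j) = x_i \cdot y_j = \sum_{k=1}^{d} x_{ik}\, y_{jk},
\qquad
\mathrm{cosine}(x_i, y_j) = \frac{x_i \cdot y_j}{\lVert x_i \rVert_2 \, \lVert y_j \rVert_2}
$$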

Value

By default (when top_n is not supplied), the function returns an nrow(x) by nrow(y) similarity matrix between the rows of x and the rows of y. The similarity between row i of x and row j of y is found in cell [i, j] of the returned matrix.
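The matrix layout can be reproduced in base R. The sketch below uses a hypothetical helper, similarity_sketch(), written only for illustration; it is not part of ruimtehol and may differ from the package's internal code (for example in how rows with zero norm are handled).

```r
# Hypothetical helper, for illustration only (not part of ruimtehol).
similarity_sketch <- function(x, y, type = c("cosine", "dot")) {
  type <- match.arg(type)
  # Both matrices must share the embedding dimension (number of columns)
  stopifnot(ncol(x) == ncol(y))
  # Inner products: cell [i, j] is sum(x[i, ] * y[j, ]);
  # %*% keeps rownames(x) as row names and rownames(y) as column names
  dot <- x %*% t(y)
  if (type == "dot") {
    return(dot)
  }
  # Cosine: divide each inner product by the Euclidean norms of row i and row j
  # (a row with zero norm would produce NaN here)
  norm_x <- sqrt(rowSums(x^2))
  norm_y <- sqrt(rowSums(y^2))
  dot / outer(norm_x, norm_y)
}

x <- rbind(word1 = c(1, 0, 0), word2 = c(1, 1, 0))
y <- rbind(term1 = c(2, 0, 0), term2 = c(0, 3, 0))
d <- similarity_sketch(x, y, type = "dot")      # 2 x 2 matrix of inner products
cs <- similarity_sketch(x, y, type = "cosine")  # same shape, values in [-1, 1]
print(d)
print(cs)
```

Here word1 points in exactly the same direction as term1, so their cosine similarity is 1 even though their dot product is only 2.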
If top_n is supplied, the function instead returns a data.frame with the columns term1 (a row name of x), term2 (a row name of y), similarity and rank. For each term in x, the rows are ordered from highest to lowest similarity, and only the top_n most similar terms of y are kept.
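To illustrate the data.frame layout, the hypothetical helper top_n_sketch() below reshapes a similarity matrix into term1/term2/similarity/rank rows and keeps the top_n rows per term1. It is an assumption-based sketch, not ruimtehol's implementation; the ordering of ties in particular may differ.

```r
# Hypothetical helper, for illustration only (not part of ruimtehol).
top_n_sketch <- function(sim, top_n = Inf) {
  # One row per (row of x, row of y) pair; as.vector() reads the matrix
  # column by column, so term1 cycles fastest and term2 repeats in blocks
  out <- data.frame(
    term1 = rep(rownames(sim), times = ncol(sim)),
    term2 = rep(colnames(sim), each = nrow(sim)),
    similarity = as.vector(sim),
    stringsAsFactors = FALSE
  )
  # Sort by term1, then by decreasing similarity within each term1
  out <- out[order(out$term1, -out$similarity), ]
  # Rank 1 = the most similar term2 for a given term1
  out$rank <- stats::ave(out$similarity, out$term1, FUN = seq_along)
  out <- out[out$rank <= top_n, ]
  rownames(out) <- NULL
  out
}

sim <- matrix(c(0.9, 0.1, 0.2, 0.8, 0.5, 0.3), nrow = 2,
              dimnames = list(c("word1", "word2"),
                              c("term1", "term2", "term3")))
res <- top_n_sketch(sim, top_n = 1)  # best match of y for each row of x
res_all <- top_n_sketch(sim)         # all 2 x 3 pairs, ranked per term1
print(res)
```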

Examples

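library(ruimtehol)
set.seed(123)  # make the random embeddings reproducible

# Two 3-dimensional embeddings, one per row of x
x <- matrix(rnorm(6), nrow = 2, ncol = 3)
rownames(x) <- c("word1", "word2")
# Five 3-dimensional embeddings; y needs the same number of columns as x
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
rownames(y) <- c("term1", "term2", "term3", "term4", "term5")

# Full 2 x 5 similarity matrices
embedding_similarity(x, y, type = "cosine")
embedding_similarity(x, y, type = "dot")
# data.frames keeping only the most similar term of y for each row of x
embedding_similarity(x, y, type = "cosine", top_n = 1)
embedding_similarity(x, y, type = "dot", top_n = 1)
# ... or the two most similar terms
embedding_similarity(x, y, type = "cosine", top_n = 2)
embedding_similarity(x, y, type = "dot", top_n = 2)
# top_n = +Inf: all 2 x 5 pairs, returned as a ranked data.frame
embedding_similarity(x, y, type = "cosine", top_n = +Inf)
embedding_similarity(x, y, type = "dot", top_n = +Inf)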

ruimtehol documentation built on May 29, 2024, 5:26 a.m.