cosineSimilarity: Cosine Similarity


cosineSimilarity    R Documentation

Cosine Similarity

Description

Calculate cosine similarity between every row in matrix1 and every row in matrix2.

Usage

cosineSimilarity(matrix1, matrix2)

Arguments

matrix1

a matrix of type dgCMatrix.

matrix2

a matrix of type dgCMatrix.

Details

Cosine similarity measures how similar two vectors x and y are by the cosine of the angle between them. Because the vectors considered here are non-negative, the value lies between 0 and 1: it is 1 when both vectors point in the same direction (in particular, when they are identical) and 0 when x \cdot y = 0, that is, when the vectors have no non-zero component in common.

The definition is: similarity = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2} \, \sqrt{\sum_i y_i^2}}
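
As a quick check of the definition, consider the hypothetical count vectors x = (1, 1, 0) and y = (1, 0, 0). They are chosen purely for illustration and are not taken from the package:

```latex
\sum_i x_i y_i = 1 \cdot 1 + 1 \cdot 0 + 0 \cdot 0 = 1, \qquad
\sqrt{\sum_i x_i^2} = \sqrt{1^2 + 1^2 + 0^2} = \sqrt{2}, \qquad
\sqrt{\sum_i y_i^2} = \sqrt{1^2 + 0^2 + 0^2} = 1

\text{similarity} = \frac{1}{\sqrt{2} \cdot 1} \approx 0.707
```

The two vectors overlap in only one of the two non-zero components of x, so the result lies strictly between 0 and 1.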

Value

A dgCMatrix where element A[index1, index2] is the cosine similarity between matrix1[index1,] and matrix2[index2,].

See Also

Matrix

Examples

library(occupationCoding)

# Job titles to compare; the last entry contains no term from the vocabulary built below
x <- c("Verkauf von Schreibwaren", "Verkauf", "Schreibwaren", "Industriemechaniker", "NOTINDOCUMENTTERMMATRIX")
(y <- c("Verkauf von B\u00fcchern, Schreibwaren", "Fach\u00e4rzin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen", "Industriemechaniker", "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"))

# Build the vocabulary from the lower-cased strings in y
tok_fun <- text2vec::word_tokenizer
it_train <- text2vec::itoken(tolower(y), tokenizer = tok_fun, progressbar = FALSE)
vocab <- text2vec::create_vocabulary(it_train)
vect.vocab <- text2vec::vocab_vectorizer(vocab)

# Document-term matrices over the same vocabulary, so that their columns line up
matrix1 <- asDocumentTermMatrix(x, vect.vocab = vect.vocab)$dtm
matrix2 <- asDocumentTermMatrix(y, vect.vocab = vect.vocab)$dtm

# Similarity of every x with every x, then of every x with every y
cosineSimilarity(matrix1, matrix1)
cosineSimilarity(matrix1, matrix2)
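
To see what the entries of the result mean without building a document-term matrix, the definition can be applied by hand to two small sparse matrices. This is only a minimal sketch: `cosine_sketch`, `m1` and `m2` are hypothetical names introduced here, the code is not the package's implementation, and it returns a dense base matrix rather than a dgCMatrix.

```r
library(Matrix)

# Hypothetical helper (NOT the package's implementation): cosine similarity
# between every row of m1 and every row of m2, computed from the definition.
cosine_sketch <- function(m1, m2) {
  # Rows can only be compared if both matrices have the same columns
  stopifnot(ncol(m1) == ncol(m2))
  # Euclidean norm of every row
  norm1 <- sqrt(Matrix::rowSums(m1^2))
  norm2 <- sqrt(Matrix::rowSums(m2^2))
  # Dot product of every row of m1 with every row of m2
  dots <- Matrix::tcrossprod(m1, m2)
  # Divide entry [i, j] by norm1[i] * norm2[j]; the result is dense
  as.matrix(dots) / outer(norm1, norm2)
}

# Illustrative data: 2 and 3 "documents" over the same 4 "terms"
m1 <- Matrix::Matrix(rbind(c(1, 1, 0, 0),
                           c(0, 0, 2, 0)), sparse = TRUE)
m2 <- Matrix::Matrix(rbind(c(1, 0, 0, 0),
                           c(0, 0, 5, 0),
                           c(1, 1, 0, 0)), sparse = TRUE)

sim <- cosine_sketch(m1, m2)
dim(sim)  # one row per row of m1, one column per row of m2
sim
```

Entry `sim[i, j]` compares `m1[i, ]` with `m2[j, ]`. Rows that point in the same direction give 1 regardless of their length (here `m1[2, ]` and `m2[2, ]`), and rows without a shared non-zero column give 0. In this sketch an all-zero row yields NaN because of the division by zero; how cosineSimilarity treats such rows is not stated on this page.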

malsch/occupationCoding documentation built on March 14, 2024, 8:09 a.m.