View source: R/cosineSimilarity.R
cosineSimilarity | R Documentation |
Calculate cosine similarity between every row in matrix1 and every row in matrix2.
cosineSimilarity(matrix1, matrix2)
matrix1 |
a matrix of type |
matrix2 |
a matrix of type |
Cosine similarity is a measure of similarity between two vectors x and y, defined as the cosine of the angle between them. Since we consider vectors with non-negative entries, its maximal value is 1, attained when both vectors point in the same direction (for example, when they are identical), and its minimal value is 0, attained when x \times y = 0.
The definition is: similarity = (x \times y) / (||x|| \times ||y||) = (\sum_i x_i \times y_i) / (\sqrt{(\sum_i x_i^2)} \times \sqrt{(\sum_i y_i^2)})
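The definition above can be sketched in a few lines of base R. This is only an illustration of the formula, not the package's implementation: the helper name cosine_sketch and the small matrices a and b are assumptions made for this example, and dense base matrices are used instead of sparse ones to keep it self-contained.

```r
# Hypothetical helper (not part of the package): cosine similarity between
# every row of m1 and every row of m2, following the definition above.
cosine_sketch <- function(m1, m2) {
  m1 <- as.matrix(m1)
  m2 <- as.matrix(m2)
  dots <- tcrossprod(m1, m2)        # numerator: sum_i x_i * y_i for each row pair
  norms1 <- sqrt(rowSums(m1^2))     # ||x|| for each row of m1
  norms2 <- sqrt(rowSums(m2^2))     # ||y|| for each row of m2
  dots / outer(norms1, norms2)      # divide each dot product by ||x|| * ||y||
}

# Made-up example data: 2 rows compared against 3 rows
a <- rbind(c(1, 0, 1), c(0, 2, 0))
b <- rbind(c(1, 0, 1), c(1, 1, 0), c(0, 0, 3))
sim <- cosine_sketch(a, b)
sim
```

The result has one row per row of a and one column per row of b, so sim[1, 2] compares a[1, ] with b[2, ]. Note that in this sketch a row consisting only of zeros produces NaN (0 / 0); how cosineSimilarity itself treats such rows is not stated in this page.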
A dgCMatrix where element A[index1, index2] is the cosine similarity between matrix1[index1,] and matrix2[index2,].
Matrix
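# Strings to compare; the last entry is meant to share no term with the vocabulary
x <- c("Verkauf von Schreibwaren", "Verkauf", "Schreibwaren", "Industriemechaniker", "NOTINDOCUMENTTERMMATRIX")
# Reference strings (non-ASCII letters written as Unicode escapes)
(y <- c("Verkauf von B\u00fcchern, Schreibwaren", "Fach\u00e4rzin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen", "Industriemechaniker", "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"))

# Build the vocabulary and vectorizer from the lower-cased reference strings
tok_fun <- text2vec::word_tokenizer
it_train <- text2vec::itoken(tolower(y), tokenizer = tok_fun, progressbar = FALSE)
vocab <- text2vec::create_vocabulary(it_train)
vect.vocab <- text2vec::vocab_vectorizer(vocab)

# Convert both sets of strings to document-term matrices over the same vocabulary
matrix1 <- asDocumentTermMatrix(x, vect.vocab = vect.vocab)$dtm
matrix2 <- asDocumentTermMatrix(y, vect.vocab = vect.vocab)$dtm

# Similarity of x with itself, then of every row of x with every row of y
cosineSimilarity(matrix1, matrix1)
cosineSimilarity(matrix1, matrix2)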