View source: R/cosineSimilarity.R
| cosineSimilarity | R Documentation | 
Calculate cosine similarity between every row in matrix1 and every row in matrix2.
cosineSimilarity(matrix1, matrix2)
| matrix1 | a matrix of type  | 
| matrix2 | a matrix of type  | 
Cosine similarity measures how similar two vectors x and y are by taking the cosine of the angle between them. Because the vectors considered here have only non-negative entries, the value lies between 0 and 1: it is 1 when both vectors point in the same direction (in particular, when they are identical) and 0 when the dot product x \times y = 0.
The definition is: similarity = \frac{x \times y}{\|x\| \times \|y\|} = \frac{\sum_i x_i \times y_i}{\sqrt{\sum_i x_i^2} \times \sqrt{\sum_i y_i^2}}
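The formula can be checked by hand on small dense matrices. The following is a minimal base R sketch, not the package's implementation: the helper name cosine_similarity_dense and the toy matrices m1 and m2 are made up for illustration, and the sketch works on ordinary dense matrices rather than sparse ones.

```r
# Hypothetical helper (not part of the package): cosine similarity between
# every row of m1 and every row of m2, for ordinary dense numeric matrices.
cosine_similarity_dense <- function(m1, m2) {
  # Dot products of all row pairs: entry [i, j] is sum_k m1[i, k] * m2[j, k].
  dots <- m1 %*% t(m2)
  # Euclidean norm of each row.
  norms1 <- sqrt(rowSums(m1^2))
  norms2 <- sqrt(rowSums(m2^2))
  # Divide each dot product by the product of the two corresponding norms.
  dots / outer(norms1, norms2)
}

# Toy data (assumed for illustration): 2 rows vs. 3 rows, 3 columns each.
m1 <- rbind(c(1, 0, 1), c(0, 2, 0))
m2 <- rbind(c(1, 0, 1), c(1, 1, 0), c(0, 0, 3))
sim <- cosine_similarity_dense(m1, m2)
print(sim)
```

Here sim[1, 1] is 1 because the two rows are identical, sim[1, 2] is 0.5, and sim[2, 1] is 0 because the rows have no non-zero position in common. In this sketch a row consisting only of zeros produces NaN (0 divided by 0); how cosineSimilarity treats such rows is not stated on this page.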
A dgCMatrix where element A[index1, index2] is the cosine similarity between matrix1[index1,] and matrix2[index2,].
Matrix
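# Strings to compare; the last entry is meant to have no match in the document-term matrix
x <- c("Verkauf von Schreibwaren", "Verkauf", "Schreibwaren", "Industriemechaniker", "NOTINDOCUMENTTERMMATRIX")
# Strings used to build the vocabulary (outer parentheses print the vector)
(y <- c("Verkauf von Büchern, Schreibwaren", "Fachärztin für Kinder- und Jugendmedizin im öffentlichen Gesundheitswesen", "Industriemechaniker", "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"))
# Build a vocabulary and a vectorizer from the lower-cased y with text2vec
tok_fun <- text2vec::word_tokenizer
it_train <- text2vec::itoken(tolower(y), tokenizer = tok_fun, progressbar = FALSE)
vocab <- text2vec::create_vocabulary(it_train)
vect.vocab <- text2vec::vocab_vectorizer(vocab)
# Convert both sets of strings to document-term matrices over the same vocabulary
matrix1 <- asDocumentTermMatrix(x, vect.vocab = vect.vocab)$dtm
matrix2 <- asDocumentTermMatrix(y, vect.vocab = vect.vocab)$dtm
# Similarity of every x with every x, then of every x with every y
cosineSimilarity(matrix1, matrix1)
cosineSimilarity(matrix1, matrix2)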