Description
Creates the design matrix of cosine similarities from textual observations and a vector of words.
Usage

cs.matrix(x, words, word_embeddings, method, parallel, n.cluster, sparse)
Arguments

x
    a tibble containing 2 columns: line and word. The line column contains the observation number in which the word from the word column appears. See 'Examples'.

words
    a character vector of words that will form the columns of the resulting matrix.

word_embeddings
    a named list of word embeddings, as produced by formatWordEmbeddings (see 'Examples').

method
    the summary function to apply across each column; options include "max", "sum", and "mean". See 'Details'.

parallel
    logical, indicating whether the matrix should be calculated in parallel.

n.cluster
    integer, the number of clusters to use when parallel = TRUE.

sparse
    logical, indicating whether a sparse matrix should be returned.
Details

A function to create a design matrix of cosine similarities from textual observations and a vector of words. The resulting matrix has dimension length(unique(x$line)) x length(words).
Consider two words with word embedding representations a and b. Their cosine similarity is defined as

    sim_cos(a, b) = (a · b) / (||a||_2 ||b||_2).
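For illustration, this similarity can be computed directly in R; sim_cos() below is a hypothetical helper written for this page, not a function exported by the package:

## Hypothetical helper implementing the definition above
sim_cos <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

a <- c(1, 0, 2)
b <- c(2, 1, 0)
sim_cos(a, b)  # (a · b) / (||a||_2 ||b||_2) = 2 / (sqrt(5) * sqrt(5)) = 0.4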
If method = "max"
, for a given line with m words, each row of the returned matrix is defined as max_{i=1,...,m}(sim_cos(a_j, b_i)).
method = "sum"
or method = "mean"
are defined
in a similar fashion.
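The following sketch makes the row construction concrete for method = "max", using the hypothetical sim_cos() helper above and made-up toy embeddings; it is an illustration, not package code:

## Toy embeddings: 'stats' and 'cat' are the column words,
## 'dog' and 'fluffy' are the m = 2 words appearing in the line.
emb <- list(stats  = c(1, 0),
            cat    = c(0, 1),
            dog    = c(0.2, 0.9),
            fluffy = c(0.4, 0.8))
line_words <- c("dog", "fluffy")

## j-th entry of the row: max over the line's words of sim_cos(a_j, b_i)
row_max <- sapply(c("stats", "cat"), function(w)
  max(sapply(line_words, function(v) sim_cos(emb[[w]], emb[[v]]))))
row_max

Replacing max() with sum() or mean() gives the corresponding row for method = "sum" or method = "mean".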
Value

A (sparse) matrix of cosine similarities.
References

Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. San Rafael, CA: Morgan & Claypool Publishers.
Examples

## Not run:
require(dplyr)
require(tidytext)

## Named list of (normalized) word embeddings
word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE)

sentences <- data.frame(Description = c("Statistics is great!",
                                        "My dog is fluffy.",
                                        "What is your favorite class?"),
                        stringsAsFactors = FALSE)

## Tokenize into one row per (line, word) pair
x <- tibble(line = 1:nrow(sentences), text = sentences$Description) %>%
  unnest_tokens(word, text)

## 3 x 2 matrix: one row per sentence, one column per element of 'words'
cs.matrix(x, words = c("stats", "cat"), word_embeddings)
## End(Not run)
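The remaining arguments can be supplied in the same call; the snippet below is a sketch only, since the default values of method, parallel, n.cluster, and sparse are not shown on this page:

## Not run:
## Hypothetical call illustrating the remaining arguments
cs.matrix(x, words = c("stats", "cat"), word_embeddings,
          method = "mean", parallel = TRUE, n.cluster = 2, sparse = TRUE)
## End(Not run)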