View source: R/get_grouped_similarity.R
get_grouped_similarity | R Documentation |
Get similarity scores between one or more target words and a comparison
vector of one or more candidate words. When two vectors of candidate words
are provided (second_vec is not NULL), the function calculates a composite
index from the cosine similarities to the two vectors. This is
operationalized as the mean similarity of the target word to the first
vector of terms minus the mean similarity to the second vector of terms.
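The arithmetic behind the composite index can be sketched in base R. This is only an illustration of the formula described above, not the package's internal code: the three-dimensional embeddings are made up, and `cosine_sim` and `mean_sim` are hypothetical helper names.

```r
# Toy 3-dimensional embeddings; all values are made up for illustration.
target_emb <- c(2, 0, 1)
first_emb  <- rbind(left = c(1, 0, 0), lefty = c(1, 1, 0))
second_emb <- rbind(right = c(0, 0, 1), rightwinger = c(0, 1, 1))

# Cosine similarity between two numeric vectors (hypothetical helper).
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Mean similarity of the target to each row (term) of a candidate matrix
# (hypothetical helper).
mean_sim <- function(target, candidates) {
  mean(apply(candidates, 1, cosine_sim, b = target))
}

# Composite index: mean similarity to the first vector of terms
# minus mean similarity to the second vector of terms.
composite <- mean_sim(target_emb, first_emb) - mean_sim(target_emb, second_emb)
```

A positive value means the target sits closer, on average, to the first set of terms; a negative value means it sits closer to the second set.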
get_grouped_similarity(
x,
target,
first_vec,
second_vec,
pre_trained,
transform_matrix,
group_var,
window = window,
norm = "l2",
remove_punct = FALSE,
remove_symbols = FALSE,
remove_numbers = FALSE,
remove_separators = FALSE,
valuetype = "fixed",
hard_cut = FALSE,
case_insensitive = TRUE
)
x | a (quanteda) |
target | (character) vector of words |
first_vec | (character) vector of words |
second_vec | (character) vector of words |
pre_trained | (numeric) an F x D matrix corresponding to pretrained embeddings, usually trained on the same corpus as that used for |
transform_matrix | (numeric) a D x D 'a la carte' transformation matrix. D = dimensions of pretrained embeddings. |
group_var | (character) variable name in corpus object defining grouping variable |
window | (numeric) - defines the size of a context (words around the target) |
norm | (character) - "l2" for l2 normalized cosine similarity and "none" for dot product |
remove_punct | (logical) - if |
remove_symbols | (logical) - if |
remove_numbers | (logical) - if |
remove_separators | (logical) - if |
valuetype | the type of pattern matching: |
hard_cut | (logical) - if TRUE then a context must have |
case_insensitive | (logical) - if |
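The difference between the two norm settings can be shown with plain vectors. The sketch below assumes only the definitions given in the norm entry ("l2" normalizes before comparing, "none" leaves the raw dot product); how the function applies this internally is not shown here, and `l2_normalize` is a hypothetical helper name.

```r
# Two made-up vectors, each with Euclidean length 5.
a <- c(3, 4)
b <- c(4, 3)

# norm = "none": the raw dot product, which grows with vector length.
dot_product <- sum(a * b)

# norm = "l2": scale each vector to unit length first, so the dot product
# of the scaled vectors is the cosine similarity, bounded in [-1, 1].
l2_normalize <- function(v) v / sqrt(sum(v^2))
cosine <- sum(l2_normalize(a) * l2_normalize(b))
```

Because the dot product depends on vector magnitude, scores obtained with "none" are not directly comparable to cosine similarities.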
a data.frame with the following columns:
group | the grouping variable specified for the analysis |
val | (numeric) cosine similarity scores |
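Since the result is an ordinary data.frame, it can be handled with base R. The sketch below uses a stand-in data.frame with the two documented columns; the values are placeholders rather than real output, and `sims` and `top_group` are hypothetical names.

```r
# Placeholder values standing in for the function's output; not real results.
sims <- data.frame(
  group = c(2013, 2011, 2014, 2012),
  val   = c(0.05, -0.10, 0.20, 0.00)
)

# Sort by group so the scores read in order.
sims <- sims[order(sims$group), ]

# Group with the largest score.
top_group <- sims$group[which.max(sims$val)]
```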
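# Assumes a conText version that exports get_grouped_similarity(); the cr_*
# objects are example data shipped with that package.
library(conText)

# Assign a year to each document: four years, 50 documents each.
quanteda::docvars(cr_sample_corpus, 'year') <- rep(2011:2014, each = 50)

# Per-year score for "immigration": mean similarity to the first_vec terms
# minus mean similarity to the second_vec terms.
cos_simsdf <- get_grouped_similarity(cr_sample_corpus,
                                     group_var = "year",
                                     target = "immigration",
                                     first_vec = c("left", "lefty"),
                                     second_vec = c("right", "rightwinger"),
                                     pre_trained = cr_glove_subset,
                                     transform_matrix = cr_transform,
                                     window = 12L,
                                     norm = "l2")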