View source: R/get_seq_cos_sim.R
get_seq_cos_sim (R Documentation)
Calculate cosine similarities between a target word and candidate words over a sequenced variable, using the 'a la carte' (ALC) embedding approach.
get_seq_cos_sim( x, seqvar, target, candidates, pre_trained, transform_matrix, window = 6, valuetype = "fixed", case_insensitive = TRUE, hard_cut = FALSE, verbose = TRUE )
x
(character) vector - the set of documents (corpus) of interest
seqvar
ordered variable, such as a list of dates or ordered ideology scores
target
(character) vector - target word
candidates
(character) vector of features of interest, i.e. the words whose cosine similarity with the target is computed
pre_trained
(numeric) an F x D matrix of pre-trained embeddings, where F = number of features and D = embedding dimensions. rownames(pre_trained) = the set of features for which there is a pre-trained embedding.
transform_matrix
(numeric) a D x D 'a la carte' transformation matrix, where D = dimensions of the pre-trained embeddings.
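window
(numeric) - size of the context: the number of tokens on each side of the target.
valuetype
the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching.
case_insensitive
logical; if TRUE, ignore case when matching a pattern or dictionary values.
hard_cut
(logical) - if TRUE then a context must have window x 2 tokens; if FALSE it can have window x 2 or fewer (e.g. if a document begins with the target word, the context will have window tokens rather than window x 2).
verbose
(logical) - if TRUE, report the total number of target instances found.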
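The base-R sketch below illustrates how window, case_insensitive and hard_cut could shape the contexts collected around a target. It is an illustration under stated assumptions, not the conText implementation: extract_contexts is a hypothetical helper, it takes a single already-tokenized document as a character vector, it excludes the target itself from its context, and it only supports exact ("fixed") matching.

```r
# Hypothetical helper: a sketch of context extraction, NOT conText's own code.
# Assumes `tokens` is one tokenized document and only exact ("fixed") matching.
extract_contexts <- function(tokens, target, window = 6,
                             case_insensitive = TRUE, hard_cut = FALSE) {
  cmp_tokens <- if (case_insensitive) tolower(tokens) else tokens
  cmp_target <- if (case_insensitive) tolower(target) else target
  hits <- which(cmp_tokens == cmp_target)  # positions of target instances
  n <- length(tokens)
  contexts <- list()
  for (i in hits) {
    # up to `window` tokens on each side; guards avoid reversed ranges at the edges
    left  <- if (i > 1) tokens[max(1, i - window):(i - 1)] else character(0)
    right <- if (i < n) tokens[(i + 1):min(n, i + window)] else character(0)
    ctx <- c(left, right)
    # hard_cut: keep only contexts with the full window x 2 tokens
    if (hard_cut && length(ctx) < 2 * window) next
    contexts[[length(contexts) + 1]] <- ctx
  }
  contexts
}

toks <- c("We", "want", "equal", "pay", "and", "equal", "rights")
extract_contexts(toks, "equal", window = 2)
# two contexts: "We" "want" "pay" "and", then "pay" "and" "rights"
extract_contexts(toks, "equal", window = 2, hard_cut = TRUE)
# one context: the second match has only 3 of the required 4 tokens
```

The second match sits near the end of the document, so without hard_cut it simply gets a shorter context, which mirrors the behaviour described for hard_cut = FALSE.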
A data.frame with one column per candidate term, holding the corresponding cosine similarity values, and one column for seqvar.
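To make the shape of this output concrete, here is a minimal base-R sketch of the ALC computation, assuming contexts have already been extracted (one character vector per target instance) and aligned element-by-element with seqvar. For each instance it averages the pre-trained vectors of the context words found in rownames(pre_trained) and multiplies that average by transform_matrix; within each value of seqvar it averages the resulting ALC embeddings and computes their cosine similarity with each candidate's pre-trained vector. The helper names (cosine_sim, alc_embed, seq_cos_sim_sketch) and the toy embeddings are hypothetical, and this is not the package's own code. It also assumes every candidate appears in rownames(pre_trained).

```r
# Hypothetical helpers: a sketch of the ALC idea, NOT conText's implementation.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# ALC embedding of one context: average the pre-trained vectors of known
# context words, then apply the D x D transformation matrix.
alc_embed <- function(context, pre_trained, transform_matrix) {
  known <- context[context %in% rownames(pre_trained)]
  if (length(known) == 0) return(NULL)  # no usable words in this context
  u <- colMeans(pre_trained[known, , drop = FALSE])
  as.vector(transform_matrix %*% u)
}

# One row per unique value of seqvar; one column per candidate plus seqvar.
seq_cos_sim_sketch <- function(contexts, seqvar, candidates,
                               pre_trained, transform_matrix) {
  rows <- lapply(sort(unique(seqvar)), function(p) {
    embeds <- lapply(contexts[seqvar == p], alc_embed,
                     pre_trained = pre_trained,
                     transform_matrix = transform_matrix)
    embeds <- do.call(rbind, embeds)  # NULL entries are dropped by rbind
    sims <- if (is.null(embeds)) {
      rep(NA_real_, length(candidates))  # no target instance with known words
    } else {
      target_vec <- colMeans(embeds)  # average ALC embedding for this value
      vapply(candidates,
             function(cand) cosine_sim(target_vec, pre_trained[cand, ]),
             numeric(1))
    }
    out <- as.data.frame(as.list(setNames(sims, candidates)))
    out$seqvar <- p
    out
  })
  do.call(rbind, rows)
}

# Toy 2-dimensional embeddings; identity transform for readability.
pre_trained <- rbind(pay = c(1, 0), rights = c(0, 1),
                     wage = c(1, 0.1), vote = c(0.1, 1))
contexts <- list(c("pay", "wage"), "pay", c("rights", "vote"))
seq_cos_sim_sketch(contexts, c(2011, 2011, 2012), c("wage", "vote"),
                   pre_trained, diag(2))
```

Because the transformation is linear, averaging the context vectors before or after applying transform_matrix yields the same target embedding, so the sketch is free to transform each instance first.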
library(quanteda)
library(conText)  # provides cr_sample_corpus, cr_glove_subset and cr_transform

# generate a sequence variable (here: year)
docvars(cr_sample_corpus, 'year') <- rep(2011:2014, each = 50)

cos_simsdf <- get_seq_cos_sim(x = cr_sample_corpus,
                              seqvar = docvars(cr_sample_corpus, 'year'),
                              target = "equal",
                              candidates = c("immigration", "immigrants"),
                              pre_trained = cr_glove_subset,
                              transform_matrix = cr_transform)