plot_nns_ratio: Plot output of get_nns_ratio()
Source: R/plot_nns_ratio.R
A way of visualizing the top nearest neighbors of a pair of ALC embeddings that captures how "discriminant" each feature is of each embedding (group).
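The ratio behind this plot can be illustrated in base R. This is a minimal sketch, assuming each feature's score is the cosine similarity with the numerator group's embedding divided by its cosine similarity with the other group's embedding. The helpers `cos_sim()` and `cos_ratio()` and the toy data are hypothetical and not part of conText.

```r
# Hypothetical helper (not part of conText): cosine similarity between
# each row of `feats` and a single embedding vector `v`.
cos_sim <- function(feats, v) {
  drop(feats %*% v) / (sqrt(rowSums(feats^2)) * sqrt(sum(v^2)))
}

# Hypothetical helper (assumed definition): ratio of each feature's cosine
# similarity with the numerator embedding to that with the denominator.
cos_ratio <- function(feats, emb_num, emb_den) {
  cos_sim(feats, emb_num) / cos_sim(feats, emb_den)
}

set.seed(2022L)
# Toy data: 5 non-negative feature embeddings and two group (ALC)
# embeddings of dimension 10, so all cosine similarities are positive
feats <- matrix(runif(50), nrow = 5,
                dimnames = list(c("border", "reform", "jobs", "visa", "law"), NULL))
emb_R <- runif(10)
emb_D <- runif(10)

ratios <- cos_ratio(feats, emb_R, emb_D)
# Features with ratio > 1 sit closer to the numerator group's embedding
sort(ratios, decreasing = TRUE)
```

Under this assumption, a feature equally similar to both embeddings has a ratio of 1, so distance from 1 in either direction indicates how discriminant the feature is of one group.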
plot_nns_ratio(x, alpha = 0.01, horizontal = TRUE)
x: output of get_nns_ratio().
alpha: (numeric) between 0 and 1. Significance threshold used to identify significant values. These are denoted by an asterisk (*) on the plot.
horizontal: (logical) defines the type of plot. If TRUE, results are plotted on one dimension. If FALSE, results are plotted on two dimensions, with the second dimension capturing the ranking of the cosine similarity ratios.
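The roles of `alpha` and of the ranking used by the two-dimensional layout can be sketched on a toy results table. This assumes p-values are available (e.g., from permutation) and that significant features are flagged with an asterisk. The column names `feature`, `value` and `p.value` and all numbers are hypothetical, not the documented structure of get_nns_ratio() output.

```r
# Hypothetical results table; column names and values are illustrative only
res <- data.frame(
  feature = c("border", "reform", "jobs", "visa"),
  value   = c(1.35, 1.10, 0.92, 0.80),   # cosine similarity ratios
  p.value = c(0.004, 0.200, 0.030, 0.001),
  stringsAsFactors = FALSE
)
alpha <- 0.01

# Append an asterisk to features whose p-value falls below alpha
res$label <- ifelse(res$p.value < alpha, paste0(res$feature, "*"), res$feature)

# Rank by ratio (1 = highest ratio); a two-dimensional layout
# (horizontal = FALSE) could use this rank as its second axis
res$rank <- rank(-res$value)
res
```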
Returns a ggplot-class object.
library(conText)
library(ggplot2)
library(quanteda)
# tokenize corpus
toks <- tokens(cr_sample_corpus)
# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigration", window = 6L)
# sample 100 instances of the target term, stratifying by party (only for example purposes)
set.seed(2022L)
immig_toks <- tokens_sample(immig_toks, size = 100, by = docvars(immig_toks, 'party'))
# we limit candidates to features in our corpus
feats <- featnames(dfm(immig_toks))
# compute ratio
set.seed(2022L)
immig_nns_ratio <- get_nns_ratio(x = immig_toks,
N = 10,
groups = docvars(immig_toks, 'party'),
numerator = "R",
candidates = feats,
pre_trained = cr_glove_subset,
transform = TRUE,
transform_matrix = cr_transform,
bootstrap = TRUE,
# num_bootstraps should be at least 100;
# smaller values only shorten run time (e.g., under
# CRAN-imposed limits on example execution time)
num_bootstraps = 100,
permute = FALSE,
num_permutations = 10,
verbose = FALSE)
plot_nns_ratio(x = immig_nns_ratio, alpha = 0.01, horizontal = TRUE)