ncs | R Documentation |
Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.
ncs(x, contexts_dem, contexts = NULL, N = 5, as_list = TRUE)
x |
a (quanteda) |
contexts_dem |
a |
contexts |
a (quanteda) |
N |
(numeric) number of nearest contexts to return |
as_list |
(logical) if FALSE, all results are combined into a single data.frame. If TRUE, a list of data.frames is returned, with one data.frame per embedding. |
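The two output shapes controlled by as_list can be converted into one another with base R. The following is a minimal sketch using made-up results; the object names (per_target, combined, per_party) and all values are hypothetical and are not produced by the package.

```r
# Hypothetical per-target results, shaped like the columns documented for ncs()
per_target <- list(
  D = data.frame(target = "D", context = c("doc1", "doc2"),
                 rank = 1:2, value = c(0.90, 0.80)),
  R = data.frame(target = "R", context = c("doc3", "doc2"),
                 rank = 1:2, value = c(0.95, 0.70))
)

# List of data.frames -> single data.frame (the shape described for as_list = FALSE)
combined <- do.call(rbind, per_target)
rownames(combined) <- NULL

# Single data.frame -> list with one data.frame per target
# (the shape described for as_list = TRUE)
per_party <- split(combined, combined$target)
```

The combined form is convenient for plotting several groups together, while the list form makes it easy to inspect one target at a time.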
a data.frame or list of data.frames (one for each target) with the following columns:
target
(character) rownames of x, the labels of the ALC embeddings. NA if is.null(rownames(x)).
context
(character) contexts collapsed into single documents (i.e. untokenized). If contexts is NULL then this variable will show the context (document) ids, which you can use to merge.
rank
(character) rank of context in terms of similarity with x.
value
(numeric) cosine similarity between x and context.
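To make the ranking concrete, here is a minimal base R sketch of the idea: compute the cosine similarity between each target embedding and each context embedding, then keep the top N contexts per target. It is an illustration under stated assumptions, not the package's implementation. The helper names cosine_sim and nearest_contexts are hypothetical, the toy embeddings are made up, and rank is stored as an integer here purely for convenience.

```r
# Toy data: 2 target embeddings and 4 context embeddings in 3 dimensions.
# All values are made up for illustration.
x <- rbind(D = c(1, 0, 0),
           R = c(0, 1, 0))
contexts_emb <- rbind(doc1 = c(1, 0, 0),
                      doc2 = c(1, 1, 0),
                      doc3 = c(0, 1, 0),
                      doc4 = c(0, 0, 1))

# Cosine similarity between every row of a and every row of b
cosine_sim <- function(a, b) {
  a_norm <- a / sqrt(rowSums(a^2))  # scale each row to unit length
  b_norm <- b / sqrt(rowSums(b^2))
  a_norm %*% t(b_norm)
}

# Hypothetical helper (not part of any package): top-N contexts per target
nearest_contexts <- function(x, contexts_emb, N = 2) {
  sims <- cosine_sim(x, contexts_emb)
  targets <- stats::setNames(rownames(sims), rownames(sims))
  lapply(targets, function(target) {
    top <- sort(sims[target, ], decreasing = TRUE)[seq_len(N)]
    data.frame(target = target,
               context = names(top),
               rank = seq_along(top),
               value = unname(top),
               row.names = NULL)
  })
}

res <- nearest_contexts(x, contexts_emb, N = 2)
res$D
```

Each element of res is a data.frame with the four columns described above; when no tokenized contexts are supplied, the context column holds document ids, as it does in this sketch.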
library(quanteda)
library(conText)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*",
                             window = 6L, rm_keyword = FALSE)

# build document-feature matrix
immig_dfm <- dfm(immig_toks)

# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform,
                 verbose = FALSE)

# to get group-specific embeddings, average within party
immig_wv_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)

# find nearest contexts by party
# setting as_list = FALSE combines each group's
# results into a single data.frame (useful for joint plotting)
ncs(x = immig_wv_party,
    contexts_dem = immig_dem,
    contexts = immig_toks,
    N = 5,
    as_list = TRUE)