feature_sim | R Documentation |
Efficient way of comparing two corpora along many features simultaneously.
feature_sim(x, y, features = character(0))
x |
a ( |
y |
a ( |
features |
(character) vector of features for which to compute similarity scores. If not specified, all overlapping features are used. |
a data.frame with the following columns:
feature
(character) overlapping features
value
(numeric) cosine similarity between overlapping features.
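The value column can be illustrated with a minimal base R sketch of row-wise cosine similarity over the features two embedding matrices share. This is not the package's implementation: the helper name `row_cosine_sketch` and the toy matrices are hypothetical, and it assumes both inputs are plain numeric matrices with feature names as row names.

```r
# Hypothetical helper, NOT the package's implementation: row-wise cosine
# similarity between two embedding matrices over their shared row names.
row_cosine_sketch <- function(x, y, features = character(0)) {
  # use all overlapping features unless a subset is requested
  common <- intersect(rownames(x), rownames(y))
  if (length(features) > 0) common <- intersect(features, common)
  xs <- x[common, , drop = FALSE]
  ys <- y[common, , drop = FALSE]
  # cosine = dot product divided by the product of the Euclidean norms
  value <- rowSums(xs * ys) / (sqrt(rowSums(xs^2)) * sqrt(rowSums(ys^2)))
  data.frame(feature = common, value = unname(value), stringsAsFactors = FALSE)
}

# toy 2-dimensional embeddings (made-up numbers, for illustration only)
x <- rbind(tax = c(1, 0), health = c(1, 1), border = c(0, 2))
y <- rbind(tax = c(0, 1), health = c(2, 2), economy = c(3, 1))

# only "tax" and "health" appear in both matrices
res <- row_cosine_sketch(x, y)
print(res)
```

In this toy case the "tax" vectors are orthogonal, so their similarity is 0, while the "health" vectors point in the same direction, so their similarity is approximately 1. Features present in only one matrix ("border", "economy") are dropped.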
library(quanteda)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# create feature co-occurrence matrix for each party (set tri = FALSE to work with fem)
fcm_D <- fcm(toks[docvars(toks, 'party') == "D",], context = "window", window = 6, count = "frequency", tri = FALSE)
fcm_R <- fcm(toks[docvars(toks, 'party') == "R",], context = "window", window = 6, count = "frequency", tri = FALSE)

# compute feature-embedding matrix
fem_D <- fem(fcm_D, pre_trained = cr_glove_subset, transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)
fem_R <- fem(fcm_R, pre_trained = cr_glove_subset, transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# compare "horizontal" cosine similarity
feat_comp <- feature_sim(x = fem_R, y = fem_D)