feature_sim: Given two feature-embedding-matrices, compute "parallel"...
In prodriguezsosa/conText: 'a la Carte' on Text (ConText) Embedding Regression

feature_sim

R Documentation

Given two feature-embedding-matrices, compute "parallel" cosine similarities between overlapping features.

Description

An efficient way to compare two corpora along many features simultaneously.

Usage

feature_sim(x, y, features = character(0))

Arguments

`x`	a (`fem-class`) feature embedding matrix.
`y`	a (`fem-class`) feature embedding matrix.
`features`	(character) vector of features for which to compute similarity scores. If not specified, all overlapping features are used.

Value

A data.frame with the following columns:

feature: (character) overlapping features
value: (numeric) cosine similarity between overlapping features.
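
To make the idea of "parallel" similarity concrete, the following is a minimal base-R sketch, not the package's implementation. It assumes "parallel" means that row i of `x` is compared only with the matching row of `y` (matched by feature name), rather than every row with every other row. The helper name `row_cosine_sketch` and the toy matrices are hypothetical and exist only for illustration.

```r
# Illustrative sketch only: NOT the conText implementation.
# row_cosine_sketch() is a hypothetical helper name.
row_cosine_sketch <- function(x, y, features = character(0)) {
  # features present (as row names) in both matrices
  shared <- intersect(rownames(x), rownames(y))
  # default: use every overlapping feature
  if (length(features) == 0) features <- shared
  stopifnot(all(features %in% shared))
  xs <- x[features, , drop = FALSE]
  ys <- y[features, , drop = FALSE]
  # cosine similarity between row i of xs and row i of ys
  sims <- rowSums(xs * ys) / (sqrt(rowSums(xs^2)) * sqrt(rowSums(ys^2)))
  data.frame(feature = features, value = unname(sims),
             stringsAsFactors = FALSE)
}

# toy "embeddings" with made-up numbers; rows are features
x <- matrix(c(1, 0,
              0, 1,
              1, 1), ncol = 2, byrow = TRUE,
            dimnames = list(c("border", "reform", "tax"), NULL))
y <- matrix(c(2, 0,
              1, 0,
              5, 5), ncol = 2, byrow = TRUE,
            dimnames = list(c("border", "reform", "visa"), NULL))

# only "border" and "reform" appear in both matrices
res <- row_cosine_sketch(x, y)
print(res)
```

In this toy case "border" points in the same direction in both matrices and "reform" is orthogonal, so the two similarities are 1 and 0. Features that appear in only one matrix ("tax", "visa") are dropped.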

Examples


library(quanteda)
library(conText)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# create feature co-occurrence matrix for each party (set tri = FALSE to work with fem)
fcm_D <- fcm(toks[docvars(toks, 'party') == "D",],
             context = "window", window = 6, count = "frequency", tri = FALSE)
fcm_R <- fcm(toks[docvars(toks, 'party') == "R",],
             context = "window", window = 6, count = "frequency", tri = FALSE)

# compute feature-embedding matrix
fem_D <- fem(fcm_D, pre_trained = cr_glove_subset,
             transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)
fem_R <- fem(fcm_R, pre_trained = cr_glove_subset,
             transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# compute "parallel" cosine similarities between overlapping features
feat_comp <- feature_sim(x = fem_R, y = fem_D)
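
One possible next step is to rank features by similarity, for instance to look for terms whose embeddings differ most between the two groups. The sketch below uses a stand-in data.frame with the documented `feature` and `value` columns so that it runs on its own. The name `feat_comp_toy` and its values are made up for illustration and are not output from the example above.

```r
# Stand-in for the output of feature_sim(); values are made up.
feat_comp_toy <- data.frame(
  feature = c("border", "reform", "tax"),
  value = c(0.91, 0.42, 0.77),
  stringsAsFactors = FALSE
)

# sort ascending so the least similar features come first
ranked <- feat_comp_toy[order(feat_comp_toy$value), ]
head(ranked, 2)
```

The same `order()` call should apply to a real result such as `feat_comp`, assuming it has the columns described under Value.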

prodriguezsosa/conText documentation built on April 23, 2024, 7:04 p.m.