conText | R Documentation |
Estimates an embedding regression model, with options to use bootstrapping (to be deprecated) or jackknife debiasing to estimate confidence intervals, and a permutation test for inference (see https://github.com/prodriguezsosa/conText for details).
conText(
formula,
data,
pre_trained,
transform = TRUE,
transform_matrix,
bootstrap = FALSE,
num_bootstraps = 100,
stratify = FALSE,
jackknife = TRUE,
confidence_level = 0.95,
permute = TRUE,
num_permutations = 100,
window = 6L,
valuetype = c("glob", "regex", "fixed"),
case_insensitive = TRUE,
hard_cut = FALSE,
verbose = TRUE
)
formula |
a symbolic description of the model to be fitted with a target word as a DV e.g.
|
data |
a quanteda |
pre_trained |
(numeric) a F x D matrix corresponding to pretrained embeddings. F = number of features and D = embedding dimensions. rownames(pre_trained) = set of features for which there is a pre-trained embedding. |
transform |
(logical) if TRUE (default), apply the 'a la carte' transformation; if FALSE, output untransformed averaged embeddings. |
transform_matrix |
(numeric) a D x D 'a la carte' transformation matrix. D = dimensions of pretrained embeddings. |
bootstrap |
(logical) if TRUE, use bootstrapping – sample from texts with replacement and re-run regression on each sample. |
num_bootstraps |
(numeric) number of bootstraps to use (at least 100). Ignored if bootstrap = FALSE. |
stratify |
(logical) if TRUE, stratify by discrete covariates when bootstrapping. |
jackknife |
(logical) if TRUE (default), jackknife (leave one out) debiasing is implemented. Implies n resamples. |
confidence_level |
(numeric in (0,1)) confidence level e.g. 0.95 |
permute |
(logical) if TRUE, compute empirical p-values using permutation test |
num_permutations |
(numeric) number of permutations to use |
window |
the number of context words to be displayed around the keyword |
valuetype |
the type of pattern matching: |
case_insensitive |
logical; if |
hard_cut |
(logical) - if TRUE then a context must have |
verbose |
(logical) - if TRUE, report the documents that had no overlapping features with the pretrained embeddings provided. |
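As a rough illustration of how pre_trained (F x D) and transform_matrix (D x D) fit together, the sketch below averages the pre-trained embeddings of the words in one context and then applies the transformation. It uses base R only; the toy values, the feature names and the variable names (context, in_vocab, avg_embedding, alc_embedding) are assumptions made for illustration, and this is not the package's internal code.

```r
# Toy pre-trained embeddings: F = 4 features, D = 3 dimensions (made-up values).
pre_trained <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("border", "policy", "reform", "law"), NULL)
)

# Toy D x D transformation matrix (assumed here: a simple scaling by 2).
transform_matrix <- diag(2, 3)

# Words around one instance of the target word (hypothetical context).
context <- c("border", "law", "unknownword")

# Features without a pre-trained embedding cannot contribute to the average.
in_vocab <- intersect(context, rownames(pre_trained))

# Untransformed averaged embedding (what transform = FALSE would correspond to).
avg_embedding <- colMeans(pre_trained[in_vocab, , drop = FALSE])

# Transformed embedding (what transform = TRUE would correspond to).
alc_embedding <- as.vector(transform_matrix %*% avg_embedding)
```

A context with no overlapping features at all would leave in_vocab empty, which is the situation that verbose = TRUE reports.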
a conText-class object: a D x M matrix with D = dimensions of the pre-trained feature embeddings provided and M = number of covariates including the intercept. These represent the estimated regression coefficients.
These can be combined to compute ALC embeddings for different combinations of covariates.
The object also includes various informative attributes, importantly a data.frame with the following columns:
coefficient
(character) name of (covariate) coefficient.
value
(numeric) norm of the corresponding beta coefficient (debiased if jackknife = TRUE).
std.error
(numeric) (if bootstrap = TRUE or jackknife = TRUE) std. error of the (debiased if jackknife = TRUE) norm of the beta coefficient.
lower.ci
(numeric) (if bootstrap = TRUE or jackknife = TRUE) lower bound of the (debiased if jackknife = TRUE) confidence interval.
upper.ci
(numeric) (if bootstrap = TRUE or jackknife = TRUE) upper bound of the (debiased if jackknife = TRUE) confidence interval.
p.value
(numeric) (if permute = TRUE) empirical p-value of the norm of the coefficient.
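To make the value column and the idea of combining coefficients more concrete, here is a minimal base R sketch under stated assumptions: simulated data stands in for the per-context embeddings, the model has an intercept and one binary covariate, and the coefficients are fitted by ordinary least squares. The names (Y, X, party_R, beta, norms, republican_embedding) are hypothetical, and the jackknife debiasing step is not reproduced.

```r
set.seed(1)
n <- 20  # number of contexts (assumed)
D <- 3   # embedding dimensions (assumed)

# One binary covariate plus an intercept, so M = 2.
party_R <- rep(c(0, 1), each = n / 2)
X <- cbind("(Intercept)" = 1, party_R = party_R)

# Simulated stand-in for the n x D matrix of per-context embeddings.
Y <- matrix(rnorm(n * D), nrow = n)

# Least-squares coefficients, arranged as a D x M matrix.
beta <- t(solve(crossprod(X), crossprod(X, Y)))
colnames(beta) <- colnames(X)

# Euclidean norm of each coefficient vector (one value per covariate).
norms <- sqrt(colSums(beta^2))

# Combining coefficients: intercept + party_R gives the fitted embedding
# for the group with party_R = 1.
republican_embedding <- beta[, "(Intercept)"] + beta[, "party_R"]
```

With a single binary covariate, the combined vector equals the average embedding of the contexts in that group, which is a convenient sanity check on the fit.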
library(quanteda)
# tokenize corpus
toks <- tokens(cr_sample_corpus)
## given the target word "immigration"
set.seed(2021L)
model1 <- conText(formula = immigration ~ party + gender,
                  data = toks,
                  pre_trained = cr_glove_subset,
                  transform = TRUE,
                  transform_matrix = cr_transform,
                  bootstrap = FALSE,
                  jackknife = TRUE,
                  confidence_level = 0.95,
                  permute = TRUE,
                  num_permutations = 10,
                  window = 6,
                  case_insensitive = TRUE,
                  verbose = FALSE)
# notice, character/factor covariates are automatically "dummified"
rownames(model1)
# the beta coefficient 'partyR' in this case corresponds to the alc embedding
# of "immigration" for Republican party speeches
# (normed) coefficient table
model1@normed_coefficients
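The empirical p-value in the coefficient table comes from a permutation test. One common way to run such a test is sketched below in base R: shuffle the covariate, recompute the norm of its coefficient, and report the share of permuted norms at least as large as the observed one. The simulated data, the helper coef_norm and the exact permutation scheme are assumptions for illustration and may differ from what the package does internally.

```r
set.seed(42)
n <- 40  # number of contexts (assumed)
D <- 3   # embedding dimensions (assumed)

# Binary covariate and simulated embeddings with a clear group difference.
x <- rep(c(0, 1), each = n / 2)
Y <- matrix(rnorm(n * D), nrow = n)
Y[x == 1, ] <- Y[x == 1, ] + 2

# Hypothetical helper: norm of the least-squares coefficient on x.
coef_norm <- function(x, Y) {
  X <- cbind(1, x)
  beta <- solve(crossprod(X), crossprod(X, Y))  # 2 x D coefficient matrix
  sqrt(sum(beta[2, ]^2))                        # row 2 holds the coefficient on x
}

observed <- coef_norm(x, Y)

# Shuffling x breaks any link between the covariate and the embeddings.
num_permutations <- 100
permuted <- replicate(num_permutations, coef_norm(sample(x), Y))

# Share of permuted norms at least as large as the observed norm.
p_value <- mean(permuted >= observed)
```

The resolution of such a p-value is limited by the number of permutations: with 10 permutations, as in the example above, the smallest nonzero value is 0.1.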