dem | R Documentation |
Given a document-feature-matrix, for each document,
multiply its feature counts (columns) by their
corresponding pretrained word embeddings and average the result
(usually referred to as averaged or additive document embeddings).
If transform = TRUE and a transformation matrix is provided,
multiply the document embeddings by the transformation matrix
to obtain the corresponding a la carte (ALC)
document embeddings
(see eq. 2: https://arxiv.org/pdf/1805.05388.pdf).
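The computation described above can be sketched in base R on toy data. This is a minimal illustration, not the package's implementation: the matrices counts, pre_trained and A are hypothetical stand-ins, and it assumes the transformation matrix acts on each document vector u as A %*% u, which for documents stored as rows becomes a right-multiplication by t(A).

```r
# Toy document-feature counts: 3 documents x 4 features (hypothetical data).
counts <- matrix(
  c(2, 1, 0, 0,
    0, 0, 3, 1,
    0, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(docs = c("d1", "d2", "d3"),
                  features = c("border", "visa", "tax", "zzz"))
)

# Toy pretrained embeddings: F x D with F = 3, D = 2; "zzz" has no embedding.
pre_trained <- matrix(
  c(1, 0,
    0, 1,
    1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("border", "visa", "tax"), NULL)
)

# Toy D x D transformation matrix (stand-in for a learned ALC matrix).
A <- matrix(c(2, 0,
              0, 0.5), nrow = 2, byrow = TRUE)

# Keep only features that have a pretrained embedding.
shared <- intersect(colnames(counts), rownames(pre_trained))
counts_shared <- counts[, shared, drop = FALSE]

# Drop documents left with no overlapping features ("d3" here).
keep <- rowSums(counts_shared) > 0
counts_shared <- counts_shared[keep, , drop = FALSE]

# Count-weighted average of the embeddings, one row per document.
avg_emb <- (counts_shared %*% pre_trained[shared, , drop = FALSE]) /
  rowSums(counts_shared)

# Assumed convention: apply A to each document vector u as A %*% u;
# with documents stored as rows this is avg_emb %*% t(A).
alc_emb <- avg_emb %*% t(A)
```

In this sketch avg_emb plays the role of the output with transform = FALSE and alc_emb the role of the output with transform = TRUE; both have one row per document that could be embedded.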
dem(x, pre_trained, transform = TRUE, transform_matrix, verbose = TRUE)
x |
a quanteda document-feature-matrix (dfm). |
pre_trained |
(numeric) a F x D matrix corresponding to pretrained embeddings. F = number of features and D = embedding dimensions. rownames(pre_trained) = set of features for which there is a pre-trained embedding. |
transform |
(logical) if TRUE (default) apply the 'a la carte' transformation, if FALSE output untransformed averaged embeddings. |
transform_matrix |
(numeric) a D x D 'a la carte' transformation matrix. D = dimensions of pretrained embeddings. |
verbose |
(logical) - if TRUE, report the documents that had no overlapping features with the pretrained embeddings provided. |
an N x D (dem-class) document-embedding-matrix corresponding to the ALC embeddings for each document.
N = number of documents (that could be embedded), D = dimensions of pretrained embeddings. This object
inherits the document variables in x, the dfm used. These can be accessed by calling the attribute: @docvars.
Note, documents with no overlapping features with the pretrained embeddings provided are automatically
dropped. For a list of the documents that were embedded call the attribute: @Dimnames$docs.
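To make the dropping rule concrete, the following base R sketch works out which documents would be embedded and which would be dropped. The count matrix and the vector embedded_features are hypothetical toy data; with an actual dem object, the embedded document names would instead be read from @Dimnames$docs as noted above.

```r
# Hypothetical toy counts: rows are documents, columns are features.
counts <- matrix(c(1, 0, 2,
                   0, 0, 4,
                   3, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("immigration", "reform", "xyz")))

# Features assumed to have a pretrained embedding ("xyz" does not).
embedded_features <- c("immigration", "reform")

# A document can be embedded only if it has a nonzero count
# for at least one feature shared with the pretrained embeddings.
shared <- intersect(colnames(counts), embedded_features)
has_overlap <- rowSums(counts[, shared, drop = FALSE]) > 0

embedded_docs <- rownames(counts)[has_overlap]
dropped_docs  <- rownames(counts)[!has_overlap]
```

Here doc2 contains only the feature without an embedding, so it is the one document that would be dropped.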
library(quanteda)
library(conText)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)

# construct document-feature-matrix
immig_dfm <- dfm(immig_toks)

# construct document-embedding-matrix
immig_dem <- dem(immig_dfm,
                 pre_trained = cr_glove_subset,
                 transform = TRUE,
                 transform_matrix = cr_transform,
                 verbose = FALSE)