Description

Calculates the linear information about the treatment contained in the word counts, apart from the topics.
Usage

project(
  stm_model,
  documents,
  interactions = TRUE,
  type,
  verbose
)
Arguments

stm_model
    the stm content covariate model from which to develop the projection.

documents
    the documents that we want the projection for. Note that these must be aligned with the vocabulary. If they were not the original documents used to fit the model, see alignCorpus.

interactions
    a logical which defaults to TRUE. Determines whether or not the topic-aspect interactions are included.

type
    determines how the topic-covariate interactions are included (see Details below). If interactions=FALSE this has no effect.

verbose
    a logical indicating whether progress should be printed to the screen.
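As a sketch of the intended workflow for held-out documents (not run; the object names new_texts, new_processed, and aligned are illustrative, while textProcessor and alignCorpus are stm functions):

```r
# Sketch (not run): projecting documents that were not used to fit the model.
# They must first be aligned to the model's vocabulary with alignCorpus.
# new_processed <- textProcessor(new_texts)
# aligned <- alignCorpus(new_processed, stm_model$vocab)
# projection_new <- project(stm_model, aligned$documents)
```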
Details

The function returns one loading per document, per level of the factor. Thus in the standard case of two levels (treatment/control) the projection is actually two-dimensional (indicating words that are particularly indicative of treatment and words particularly indicative of control).

When interactions=FALSE, only the content covariate parameters are used (and not the topic-covariate interactions). This may often be a decent approximation to the full calculation because the topic-covariate interactions are typically very sparse.

When interactions=TRUE, information from the interaction of the topic and the content covariate is included in the projections. The software offers three ways to do this, based on the option set for type. In each case the difference is how we reweight the topic-specific components of the interaction. When type="theta" (the option used in the paper), we simply use the theta values estimated under the model. When type="phi", we recompute the token-level topic loadings conditional on the document-topic proportions theta; this allows individual words to have their own topic-specific loadings. When type="refit", we recompute the token-level topic loadings under each level of the content covariate. The option is called "refit" because it essentially refits the tokens under each potential level of the content covariate when calculating the projection to that level.
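The core idea of the projection can be illustrated with a small base-R sketch. This is a conceptual illustration only, with simulated numbers; the actual function uses the content covariate coefficients (kappa) stored in the fitted stm model, and the matrix kappa_cov below is a stand-in for them:

```r
# Conceptual sketch: project word counts onto covariate-level coefficients,
# yielding one loading per document, per level of the factor.
set.seed(1)
V <- 6                                    # vocabulary size
counts <- matrix(rpois(2 * V, 3), 2, V)   # 2 documents x V word counts
# stand-in for the kappa coefficients: one column per covariate level
kappa_cov <- matrix(rnorm(V * 2, sd = 0.5), V, 2,
                    dimnames = list(NULL, c("control", "treatment")))
projection <- counts %*% kappa_cov        # 2 x 2: documents by levels
projection
```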
Value

A list with components:

projection
    the projection of the word count information for each document.

diagnostic
    the sum of the interaction projections for each word type, normalized by the total number of words. This is helpful for assessing which elements of the vocabulary are contributing to the topic-specific elements of the projection. For the non-topic-specific parts, the relative contributions can be read directly off the kappa object in the stm model.
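A quick way to use the diagnostic is to rank the vocabulary by it. The snippet below assumes (an assumption, not documented above) that diagnostic comes back as a named numeric vector with one entry per word type, and uses made-up values:

```r
# Assumed shape: named numeric vector over the vocabulary (values invented).
diagnostic <- c(tax = 0.40, reform = 0.25, war = 0.10, the = 0.01)
# Words contributing most to the topic-specific part of the projection:
head(sort(abs(diagnostic), decreasing = TRUE), 3)
```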
References

Roberts, M., Stewart, B., and Nielsen, R. (2020). "Adjusting for Confounding with Text Matching." American Journal of Political Science.
Examples

data(sim)
projection <- project(sim_topics, sim_documents)