project: Create a projection for text matching



Description

Calculates the linear information about the treatment that is contained in the word counts, apart from what the topics already capture.

Usage

project(
  stm_model,
  documents,
  interactions = TRUE,
  type = c("theta", "phi", "refit"),
  verbose = TRUE
)

Arguments

stm_model

the stm content covariate model from which to develop the projection

documents

the documents that we want the projection for. Note that these must be aligned with the vocabulary. If they were not the original documents used to fit the model, see alignCorpus.

interactions

a logical which defaults to TRUE. Determines whether or not the topic-covariate interactions are included.

type

determines how the topic-covariate interactions are included (see details below). If interactions=FALSE this has no effect.

verbose

a logical indicating if progress should be printed to the screen

Details

The function returns one loading per document, per level of the factor. Thus in the standard case of two levels (treatment/control) the projection is actually two-dimensional (indicating words that are particularly indicative of treatment and words particularly indicative of control).
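To make the shape of the result concrete, the following base R sketch builds a toy projection with one column per level. It is only an illustration under stated assumptions: the counts and the per-level word weights are made-up numbers, the object names are hypothetical, and dividing by document length is an assumed normalization. It is not the package's actual computation.

```r
# Toy word counts: 3 documents x 4 vocabulary terms (made-up numbers).
counts <- matrix(c(2, 0, 1, 0,
                   0, 3, 0, 1,
                   1, 1, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3),
                                 c("tax", "cut", "health", "care")))

# Hypothetical per-level word weights (vocabulary x levels).
# In a real model these would come from the fitted content covariate
# parameters; here they are invented for illustration.
weights <- matrix(c( 0.5, -0.5,
                     0.4, -0.4,
                    -0.3,  0.3,
                    -0.6,  0.6),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(colnames(counts),
                                  c("control", "treatment")))

# One loading per document, per level: a documents x levels matrix.
# Dividing by document length is an assumption made for this sketch.
loadings <- (counts %*% weights) / rowSums(counts)

dim(loadings)
```

With two levels the result has two columns, which is the two-dimensional projection described above.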

When interactions=FALSE only the content covariate parameters are used (and not the topic-covariate interactions). This may often be a decent approximation to the full calculation because the topic-covariate interactions are typically very sparse.

When interactions=TRUE information from the interaction of the topic and the content covariate is included in the projections. The software offers three ways to do this based on the options set for type. In each case the difference is how we reweight the topic-specific components of the interaction.

When type="theta" (the option used in the paper), we simply use the theta values estimated under the model. When type="phi", we recompute the token-level topic loadings conditional on the document-topics theta. This allows individual words to have their own topic-specific loadings. When type="refit", we recompute the token-level topic loadings but under each different level of the content covariate. The option is called "refit" because it is essentially refitting the tokens under each different potential level of the content covariate when calculating the projection to that level.
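The difference between the three options can be sketched in base R. The sketch assumes the usual topic-model relationship in which a token's topic loading is proportional to the document-topic proportion times the topic's word probability; the helper token_topic_loadings, the toy theta, and the toy beta matrices are all hypothetical, and the package's internal computation may differ.

```r
# Assumed toy inputs: K = 2 topics, V = 3 vocabulary terms.
theta <- c(0.7, 0.3)  # document-topic proportions for one document

beta_control <- matrix(c(0.5, 0.3, 0.2,
                         0.1, 0.2, 0.7),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("topic1", "topic2"),
                                       c("w1", "w2", "w3")))
beta_treatment <- matrix(c(0.2, 0.3, 0.5,
                           0.6, 0.3, 0.1),
                         nrow = 2, byrow = TRUE,
                         dimnames = dimnames(beta_control))

# Hypothetical helper: token-level topic loadings conditional on theta,
# assuming phi[k, v] is proportional to theta[k] * beta[k, v].
token_topic_loadings <- function(theta, beta) {
  unnormalized <- theta * beta  # row k is scaled by theta[k]
  sweep(unnormalized, 2, colSums(unnormalized), "/")
}

# type = "theta": every word in the document shares the weights in theta.
# type = "phi": each word gets its own topic-specific loadings.
phi <- token_topic_loadings(theta, beta_control)

# type = "refit": the loadings are recomputed under each level of the
# content covariate, here represented by one beta matrix per level.
phi_by_level <- lapply(list(control = beta_control,
                            treatment = beta_treatment),
                       token_topic_loadings, theta = theta)
```

Each column of phi sums to one, so the topic weights for a given word are a reweighting of theta rather than a new quantity.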

Value

list

projection

the projection of word count information on the document

diagnostic

the sum of interaction projections for each word type normalized by the total number of words. This is helpful for assessing which elements of the vocabulary are contributing to the topic-specific elements of the projection. For non-topic-specific parts, the relative contributions can be read directly off the kappa object in stm.

References

Roberts, M., Stewart, B., and Nielsen, R. (2020). "Adjusting for Confounding with Text Matching." American Journal of Political Science.

Examples
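The package's own example code is not reproduced here. As a minimal sketch of a typical call, the following assumes that the stm, tm, SnowballC, and textmatching packages are installed and uses the gadarian data that ships with stm. The dataset, the choice of K = 3, the short EM run, and the helper name run_project_sketch are illustrative assumptions, not the authors' example.

```r
# Hypothetical helper (not part of textmatching): fits a small stm content
# covariate model and projects the same documents that were used to fit it.
# Returns NULL when the required packages are not installed.
run_project_sketch <- function() {
  needed <- c("stm", "tm", "SnowballC", "textmatching")
  installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    return(NULL)
  }

  # Example survey data distributed with stm.
  data("gadarian", package = "stm", envir = environment())
  processed <- stm::textProcessor(gadarian$open.ended.response,
                                  metadata = gadarian, verbose = FALSE)
  out <- stm::prepDocuments(processed$documents, processed$vocab,
                            processed$meta, verbose = FALSE)

  # Content covariate model; the few EM iterations keep the sketch fast
  # and are not a recommendation for real analyses.
  mod <- stm::stm(out$documents, out$vocab, K = 3,
                  content = ~treatment, data = out$meta,
                  max.em.its = 5, verbose = FALSE)

  # Projection without the topic-covariate interactions.
  textmatching::project(mod, out$documents,
                        interactions = FALSE, verbose = FALSE)
}

result <- run_project_sketch()
```

Because the documents passed to project here are the ones the model was fit on, they are already aligned with the vocabulary and no call to alignCorpus is needed.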


textmatching documentation built on Aug. 19, 2020, 9:06 a.m.