est.topics: Estimate topic effects

Description Usage Arguments Details Value References See Also Examples

View source: R/estimate_topic_effects.R

Description

Given a covariate of interest, measure its relationship with the samples over topics distribution from the STM.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## S3 method for class 'topics'
est(
  object,
  metadata,
  formula,
  refs,
  nsims = 100,
  ui_level = 0.8,
  npoints = 100,
  seed = object$seeds$next_seed,
  verbose = FALSE,
  ...
)

Arguments

object

(required) Ouput of find_topics.

metadata

Matrix or dataframe containing sample information with row or column names corresponding to the otu_table.

formula

New formula for covariates of interest found in metadata, different than the formula used to generate object. Interactions, transformations, splines, and polynomial expansions are permitted.

refs

Character vector of length equal to the number of factors or binary covariates in formula, indicating the reference level.

nsims

Number of simulations to perform for estimating covariate effects. Defaults to 100.

ui_level

Width of uncertainty interval for reporting effects. Defaults to .95.

npoints

Number of posterior predictive samples to draw. Defaults to 100.

seed

Seed for the random number generator to reproduce previous results.

verbose

Logical flag to print progress information. Defaults to FALSE.

...

Additional arguments for methods.

Details

The posterior predictive estimates are calculated depending on the type of covariate. First, all factors are expanded using dummy variables, setting the reference classes as intercepts. For each topic, the topic frequency over samples is regressed against the expanded design matrix. Covariate weights and the variance-covariance matrix is then calculated, which are used to sample new weights using a multivariate normal distribution.

The estimation of a specific covariate effect is performed by calculated y-hat from the posterior predictive distribution by holding all covariates other than the target covariate fixed. This is accomplished by marginalizing over the sample data. This fixed design matrix is then multiplied by the weights simulated from the multivariate normal distribution. For a target binary covariate x (which includes expanded factors), effect estimates are defined as the difference between y-hat when x=1 and y-hat when x=0 is calculated, with the reference covariate designated as 1 (hence negative differences imply a strong effect for the reference class). For continuous covariates, the effect estimates are defined as the regression weight for that covariate of interest. To explore the posterior predictive distribution, y-hat is again calculated, but over a vector of values spanning the range of the continuous covariate, with other covariates held fixed as before. Additional y-hat are then calculated while iteratively setting each binary covariate to 0, to explore their influence on the continuous covariate. Nonlinear covariates (e.g., splines) are treated similarly with respect to y-hat. Their effect estimates, however, are calculated by calculating the Spearman rank correlation coefficient between y-hat and y.

For each covariate, the effect estimate is returned. y-hat vectors are returned as well for continuous and nonlinear covariates. All effect estimates are ranked in terms of weight or correlation coefficient. Values not overlapping 0 given a user designed level of uncertainty or returned as "significant."

Value

An object of class effects containing

topic_effects

List of the effect estimates for the covariates in formula.

topics

Object of class topics containing the original output of find_topics.

modelframe

Original modelframe.

References

Gelman, A. and Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press; 1 edition.

Roberts, M.E., Stewart, B.M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S.K., Albertson, B., & Rand, D.G. (2014). Structural topic models for open-ended survey responses. Am. J. Pol. Sci. 58, 1064–1082.

See Also

estimateEffect

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
formula <- ~DIAGNOSIS
refs <- 'CD'

dat <- prepare_data(otu_table=GEVERS$OTU,rows_are_taxa=FALSE,tax_table=GEVERS$TAX,
                    metadata=GEVERS$META,formula=formula,refs=refs,
                    cn_normalize=TRUE,drop=TRUE)

## Not run: 
topics <- find_topics(dat,K=15)
topic_effects <- est(topics)

## End(Not run)

themetagenomics documentation built on May 1, 2020, 1:06 a.m.