est.functions: Estimate predicted function-topic effects
In EESI/themetagenomics: Exploring Thematic Structure and Predicted Functionality of 16s rRNA Amplicon Data


View source: R/estimate_function_effects.R

Given within-topic functional predictions, estimate the effects at a given gene function category level. The effects correspond to a topic-gene category interaction term after accounting for topic and gene category effects. The model can be fit via either maximum likelihood (ML) or Hamiltonian MC (HMC).

## S3 method for class 'functions'
est(
  object,
  topics_subset,
  level = 2,
  method = c("hmc", "ml"),
  seed = object$seeds$next_seed,
  verbose = FALSE,
  ...
)

## S3 method for class 'hmc'
est(
  object,
  inits,
  prior = c("t", "normal", "laplace"),
  t_df = c(7, 7, 7),
  iters = 300,
  warmup = iters/2,
  chains = 1,
  cores = 1,
  seed = sample.int(.Machine$integer.max, 1),
  return_summary = TRUE,
  verbose = FALSE,
  ...
)

## S3 method for class 'ml'
est(
  object,
  iters = 1000,
  verbose = FALSE,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)

`object`	(required) Output of `predict.topics`.
`topics_subset`	Vector of topic indexes to be evaluated. Recommended to be < 25.
`level`	Gene category level to evaluate. Defaults to 2.
`method`	String indicating either ml or hmc. Defaults to hmc.
`seed`	Seed for the random number generator to reproduce previous results.
`verbose`	Logical flag to print progress information. Defaults to FALSE.
`...`	Additional arguments for methods.
`inits`	List of values for parameter initialization. If omitted, values are generated via `glmer.nb`.
`prior`	Prior to be placed on covariate weights. Choices include student-t, normal, and laplace. Defaults to student-t.
`t_df`	Degrees of freedom for student-t priors. Defaults to c(7, 7, 7).
`iters`	Number of iterations for fitting. Defaults to 300 for HMC and 1000 for ML.
`warmup`	For HMC, number of iterations devoted to warmup. Defaults to iters/2.
`chains`	For HMC, number of independent chains. Defaults to 1.
`cores`	For HMC, number of cores to parallelize chains. Defaults to 1.
`return_summary`	Logical flag to return results summary. Defaults to TRUE.
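
The way these defaults resolve can be sketched in base R. The function `est_sketch` below is a hypothetical stand-in written for illustration, not the package's code, and it assumes the `method` choices are resolved with `match.arg()`, which is the common R idiom for a default written as a vector of strings. It also shows that a default such as `warmup = iters/2` is evaluated lazily, so it follows whatever `iters` the caller supplies.

```r
# Hypothetical stand-in for the est() signature; not the package's implementation.
est_sketch <- function(iters = 300, warmup = iters / 2,
                       method = c("hmc", "ml")) {
  # match.arg() returns the first choice when the argument is left at its default
  # (assumption: the package resolves `method` this way).
  method <- match.arg(method)
  # `warmup` is only evaluated here, so it sees the final value of `iters`.
  list(method = method, iters = iters, warmup = warmup)
}

default_call <- est_sketch()
longer_call  <- est_sketch(iters = 500, method = "ml")

str(default_call)
str(longer_call)
```

Under these assumptions, leaving `method` unset selects the first listed choice, and raising `iters` without setting `warmup` raises the warmup length with it.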

The functional effects are estimated via a multilevel Bayesian negative binomial regression model. Topic and pathway level effects are estimated, as well as topic-pathway interactions. The model has the following form:

θ_{i} = μ + β_{w} + β_{k} + β_{w,k}

y_{i} ~ NB(θ_{i},φ)

where μ is the intercept and each β term represents the weight for pathway level, topic, and pathway level-topic interaction, respectively; φ represents the dispersion parameter.
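
As a rough illustration of the model form above, the following base R sketch simulates counts from it. Everything numeric here is made up for illustration. It assumes a log link (so the mean count is exp(θ)) and treats the `size` argument of `rnbinom` as the dispersion φ; neither assumption is confirmed by this page.

```r
set.seed(1)

n_pathways <- 4  # pathway-level categories w (illustrative size)
n_topics   <- 3  # topics k (illustrative size)

mu      <- 1.5                                  # intercept
beta_w  <- rnorm(n_pathways, 0, 0.5)            # pathway-level weights
beta_k  <- rnorm(n_topics, 0, 0.5)              # topic weights
beta_wk <- matrix(rnorm(n_pathways * n_topics, 0, 0.3),
                  nrow = n_pathways, ncol = n_topics)  # interaction weights
phi     <- 2                                    # dispersion

# One row per pathway-topic pair
grid <- expand.grid(w = seq_len(n_pathways), k = seq_len(n_topics))

# Linear predictor: theta = mu + beta_w + beta_k + beta_{w,k}
grid$theta <- mu + beta_w[grid$w] + beta_k[grid$k] +
  beta_wk[cbind(grid$w, grid$k)]

# Assumption: log link, with rnbinom's `size` playing the role of phi
grid$y <- rnbinom(nrow(grid), mu = exp(grid$theta), size = phi)

head(grid)
```

The interaction weights in `beta_wk` are the quantities this function reports as effects: what remains for a given topic-pathway pair once the pathway and topic main effects are accounted for.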

HMC

Hamiltonian MC is performed via Stan. By default, student-t priors with 7 degrees of freedom are placed on all regression weights, and the variance terms are given half-normal priors. The intercept μ is given a normal prior with fixed variance, and φ is given an exponential(.5) prior. The user can change the priors on the regression weights to normal or, if a sparse solution is desired, laplace (double exponential). With laplace priors, each variance term is given an additional regularization parameter λ, which in turn follows a chi-squared(1) distribution.
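
To give a feel for why the laplace prior favors sparse solutions, the base R sketch below compares the three prior shapes at a few points. It assumes unit scale for all three and location 0, which is an illustrative choice and not the scaling used in the Stan model; `dlaplace` is a helper defined here because base R has no laplace density.

```r
# Laplace (double exponential) density with location 0 and scale b.
# Defined here for illustration; base R does not provide one.
dlaplace <- function(x, b = 1) exp(-abs(x) / b) / (2 * b)

x <- c(0, 1, 4)

# Rows: prior family; columns: point at which the density is evaluated.
prior_density <- rbind(
  t7      = dt(x, df = 7),  # student-t with 7 degrees of freedom
  normal  = dnorm(x),       # standard normal
  laplace = dlaplace(x)     # unit-scale laplace
)
colnames(prior_density) <- paste0("x=", x)

print(signif(prior_density, 3))
```

At these settings the laplace density is the most sharply peaked at zero, which pulls small weights toward zero, while the normal has the lightest tails of the three and so penalizes large weights most strongly.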

Unless the user provides initialization values or selects a random initialization procedure, initial values are set to the maximum likelihood estimates from glmer.nb, fit with far fewer iterations than would be used had ML been chosen as the estimation method.

ML

Maximum likelihood estimation is performed via glmer.nb. For deeper level functional categories, the model may fail to converge, even with a substantial number of iterations. In such a case, the model estimates are still returned so that the user can run HMC initialized at these ML values.

An object of class effects containing

model: List containing the parameters, fit, and summary.
gene_table: Dataframe containing the formatted predicted gene information from predict.topics.

Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Gelman, A. and Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press; 1 edition.

Stan Development Team. 2016. RStan: the R interface to Stan. http://mc-stan.org

Stan Development Team. 2016. Stan Modeling Language Users Guide and Reference Manual, Version 2.14.0. http://mc-stan.org

glmer.nb stan resume

formula <- ~DIAGNOSIS
refs <- 'CD'

dat <- prepare_data(otu_table=GEVERS$OTU,rows_are_taxa=FALSE,tax_table=GEVERS$TAX,
                    metadata=GEVERS$META,formula=formula,refs=refs,
                    cn_normalize=TRUE,drop=TRUE)

## Not run: 
topics <- find_topics(dat,K=15)

functions <- predict(topics,reference_path='/references/ko_13_5_precalculated.tab.gz')
function_effects <- est(functions,level=3,
                        iters=500,method='hmc',
                        prior=c('laplace','t','laplace'))

## End(Not run)