est.functions: Estimate predicted function-topic effects

View source: R/estimate_function_effects.R

Description

Given within-topic functional predictions, estimate the effects at a given gene function category level. The effects correspond to a topic-gene category interaction term after accounting for topic and gene category effects. The model can be fit via either maximum likelihood (ML) or Hamiltonian Monte Carlo (HMC).

Usage

## S3 method for class 'functions'
est(
  object,
  topics_subset,
  level = 2,
  method = c("hmc", "ml"),
  seed = object$seeds$next_seed,
  verbose = FALSE,
  ...
)

## S3 method for class 'hmc'
est(
  object,
  inits,
  prior = c("t", "normal", "laplace"),
  t_df = c(7, 7, 7),
  iters = 300,
  warmup = iters/2,
  chains = 1,
  cores = 1,
  seed = sample.int(.Machine$integer.max, 1),
  return_summary = TRUE,
  verbose = FALSE,
  ...
)

## S3 method for class 'ml'
est(
  object,
  iters = 1000,
  verbose = FALSE,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)

Arguments

object

(required) Output of predict.topics.

topics_subset

Vector of topic indexes to be evaluated. Recommended to be < 25.

level

Gene category level to evaluate. Defaults to 2.

method

String indicating either ml or hmc. Defaults to hmc.

seed

Seed for the random number generator to reproduce previous results.

verbose

Logical flag to print progress information. Defaults to FALSE.

...

Additional arguments for methods.

inits

List of values for parameter initialization. If omitted, values are generated via glmer.nb.

prior

Prior to be placed on covariate weights. Choices include student-t, normal, and laplace. Defaults to student-t.

t_df

Degrees of freedom for student-t priors. Defaults to c(7, 7, 7).

iters

Number of iterations for fitting. Defaults to 300 and 1000 for HMC and ML, respectively.

warmup

For HMC, number of iterations devoted to warmup. Defaults to iters/2, that is, half of the iterations.

chains

For HMC, number of independent chains. Defaults to 1.

cores

For HMC, number of cores to parallelize chains. Defaults to 1.

return_summary

Logical flag to return results summary. Defaults to TRUE.

Details

The functional effects are estimated via a multilevel Bayesian negative binomial regression model. Topic and pathway level effects are estimated, as well as topic-pathway interactions. The model has the following form:

θ_{i} = μ + β_{w} + β_{k} + β_{w,k}

y_{i} ~ NB(θ_{i},φ)

where μ is the intercept and each β term represents the weight for pathway level, topic, and pathway level-topic interaction, respectively; φ represents the dispersion parameter.
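To make the model form concrete, the following is a minimal simulation sketch in base R. It is not the package's code. It assumes a log link, normally distributed effects, and that φ maps onto the `size` argument of `rnbinom()`; none of these are stated in this documentation. All variable names are hypothetical.

```r
# Minimal simulation sketch of the model above (base R only).
# Assumptions, not taken from the package: a log link, normally distributed
# effects, and phi mapped to the `size` argument of rnbinom().
set.seed(42)

n_pathways <- 4   # gene function categories, indexed by w
n_topics   <- 3   # topics, indexed by k

mu      <- 1.5                              # intercept
beta_w  <- rnorm(n_pathways, sd = 0.5)      # pathway-level weights
beta_k  <- rnorm(n_topics, sd = 0.5)        # topic weights
beta_wk <- matrix(rnorm(n_pathways * n_topics, sd = 0.3),
                  nrow = n_pathways, ncol = n_topics)  # interaction weights
phi     <- 2                                # dispersion

# One row per pathway-topic pair
grid <- expand.grid(w = seq_len(n_pathways), k = seq_len(n_topics))

# Linear predictor: theta_i = mu + beta_w + beta_k + beta_{w,k}
grid$theta <- mu + beta_w[grid$w] + beta_k[grid$k] +
  beta_wk[cbind(grid$w, grid$k)]

# Counts: y_i ~ NB(exp(theta_i), phi), under the assumed log link
grid$y <- rnbinom(nrow(grid), mu = exp(grid$theta), size = phi)

head(grid)
```

The interaction weights `beta_wk` are the quantities this function targets: they capture what remains for a given pathway-topic pair once the pathway and topic main effects are accounted for.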

HMC

Hamiltonian Monte Carlo is performed via Stan. By default, student-t priors with 7 degrees of freedom are placed on all regression weights, with half-normal priors on the variance terms. The intercept μ is given a normal prior with fixed variance, and φ is given an exponential(0.5) prior. The user can change the priors on the regression weights to normal, student-t, or, if a sparse solution is desired, laplace (double exponential) priors. With laplace priors, each variance term is given an additional regularization parameter λ, which in turn follows a chi-squared(1) distribution.
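As a rough illustration of how these prior families differ, the sketch below draws from each using base R. It is not the package's Stan code. Unit scales are used purely for illustration, and the exponential prior is assumed to be parameterized by its rate.

```r
# Draws from the prior families named above (base R only).
# Assumptions: unit scales for the weight priors, and exponential(0.5)
# parameterized by rate. Neither is confirmed by the package documentation.
set.seed(1)
n <- 1e5

t_draws      <- rt(n, df = 7)     # student-t with 7 degrees of freedom
normal_draws <- rnorm(n)          # standard normal
# Laplace (double exponential): difference of two independent exponentials
laplace_draws <- rexp(n) - rexp(n)

phi_draws    <- rexp(n, rate = 0.5)   # dispersion prior
lambda_draws <- rchisq(n, df = 1)     # regularization parameter prior

# Share of draws close to zero under each weight prior
near_zero <- function(x) mean(abs(x) < 0.1)
c(t = near_zero(t_draws),
  normal = near_zero(normal_draws),
  laplace = near_zero(laplace_draws))
```

At these unit scales the laplace draws place the most mass near zero, which is the property that encourages sparse solutions.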

Unless the user provides a set of initialization values or selects a random initialization procedure, initial values are set to the maximum likelihood estimates obtained via glmer.nb, using far fewer iterations than when ML is chosen as the estimation method.

ML

Maximum likelihood estimation is performed via glmer.nb. For deeper-level functional categories, the model may fail to converge, even with a substantial number of iterations. In that case, the model estimates are still returned so that the user can run HMC initialized at these ML values.

Value

An object of class effects containing

model

List containing the parameters, fit, and summary.

gene_table

Data frame containing the formatted predicted gene information from predict.topics.

References

Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Gelman, A. and Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 1st edition.

Stan Development Team (2016). RStan: the R interface to Stan. http://mc-stan.org

Stan Development Team (2016). Stan Modeling Language Users Guide and Reference Manual, Version 2.14.0. http://mc-stan.org

See Also

glmer.nb, stan, resume

Examples

formula <- ~DIAGNOSIS
refs <- 'CD'

dat <- prepare_data(otu_table=GEVERS$OTU,rows_are_taxa=FALSE,tax_table=GEVERS$TAX,
                    metadata=GEVERS$META,formula=formula,refs=refs,
                    cn_normalize=TRUE,drop=TRUE)

## Not run: 
topics <- find_topics(dat,K=15)

functions <- predict(topics,reference_path='/references/ko_13_5_precalculated.tab.gz')
function_effects <- est(functions,level=3,
                        iters=500,method='hmc',
                        prior=c('laplace','t','laplace'))

## End(Not run)

EESI/themetagenomics documentation built on May 10, 2020, 1:40 a.m.