find_topics: Perform topic estimation on a themetadata object
In EESI/themetagenomics: Exploring Thematic Structure and Predicted Functionality of 16s rRNA Amplicon Data


Given a themetadata object, this function converts the OTU counts across samples into a document format and then fits a structural topic model by wrapping the stm function from package stm.

find_topics(
  themetadata_object,
  K,
  sigma_prior = 0,
  model = NULL,
  iters = 500,
  tol = 1e-05,
  batches = 1,
  init_type = c("Spectral", "LDA", "Random"),
  seed = themetadata_object$seed,
  verbose = FALSE,
  verbose_n = 5,
  control = list()
)

`themetadata_object`	(required) Output of `prepare_data`.
`K`	(required) A positive integer indicating the number of topics to be estimated.
`sigma_prior`	Scalar between 0 and 1. This sets the strength of regularization towards a diagonalized covariance matrix. Setting the value above 0 can be useful if topics are becoming too highly correlated. Defaults to 0.
`model`	Prefit STM model object to restart an existing model.
`iters`	Maximum number of EM iterations. Defaults to 500.
`tol`	Convergence tolerance. Defaults to 1e-5.
`batches`	Number of groups for memorized inference. Defaults to 1.
`init_type`	Type of initialization procedure: "Spectral", "LDA", or "Random". Defaults to "Spectral".
`seed`	Seed for the random number generator to reproduce previous results. Defaults to the seed stored in `themetadata_object`.
`verbose`	Logical flag to print progress information. Defaults to FALSE.
`verbose_n`	Integer determining the intervals at which labels are printed. Defaults to 5.
`control`	List of additional parameters that control portions of the optimization. See Details.
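The effect of `sigma_prior` can be illustrated with a small base R sketch. This page does not give the formula, so the code assumes the regularization is a convex combination of an estimated covariance matrix and its own diagonal. That form is an assumption made for illustration, and `shrink_to_diagonal` is a hypothetical helper, not a function from themetagenomics or stm.

```r
# Hypothetical helper: shrink a covariance estimate toward its diagonal.
# ASSUMPTION: convex combination of the estimate and its diagonal part.
# sigma_prior = 0 returns the input; sigma_prior = 1 keeps only the variances.
shrink_to_diagonal <- function(sigma_hat, sigma_prior = 0) {
  stopifnot(is.matrix(sigma_hat),
            nrow(sigma_hat) == ncol(sigma_hat),
            sigma_prior >= 0, sigma_prior <= 1)
  diag_part <- diag(diag(sigma_hat), nrow = nrow(sigma_hat))
  sigma_prior * diag_part + (1 - sigma_prior) * sigma_hat
}

# Toy 3 x 3 topic covariance with strong off-diagonal terms
sigma_hat <- matrix(c(1.0, 0.8, 0.6,
                      0.8, 1.0, 0.7,
                      0.6, 0.7, 1.0), nrow = 3, byrow = TRUE)

sigma_none <- shrink_to_diagonal(sigma_hat, 0)    # unchanged
sigma_half <- shrink_to_diagonal(sigma_hat, 0.5)  # off-diagonals halved
sigma_full <- shrink_to_diagonal(sigma_hat, 1)    # fully diagonal
```

Under this assumption, raising `sigma_prior` scales down the off-diagonal entries while leaving the variances untouched, which is why a value above 0 can help when topics become too highly correlated.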

Topics are estimated via stm from the stm package. The focus of the themetagenomics pipeline is leveraging both abundance and predicted functional information of 16S rRNA sequencing; hence, the pipeline calls for the use of only "prevalence" information (to use stm terminology). This wrapper therefore removes any options pertaining to "content." If the user is interested in exploring the content component of the STM, then the stm package itself is the ideal place to start. Given that only the prevalence component can be manipulated using find_topics, the following additional parameters can be passed to control as a list (adapted from stm documentation):

gamma.enet: Scalar between 0 and 1 that controls the mix of L1 and L2 regularization, where 0 corresponds to ridge regression and 1 to lasso regression. Defaults to 1.
gamma.ic.k: Complexity penalty of the information criterion used to select the regularization parameter, where 2 corresponds to AIC and log(n) to BIC. Defaults to 2.
gamma.maxits: Maximum number of iterations for estimating prevalence. Defaults to 1000.
nits: For LDA initialization, the number of Gibbs sampling iterations. Defaults to 50.
burnin: For LDA initialization, the number of burnin iterations. Defaults to 25.
alpha: For LDA initialization, the samples over topics distribution hyperparameter.
eta: For LDA initialization, the topics over words distribution hyperparameter.
rp.s: For spectral initialization, scalar between 0 and 1 that controls the degree of sparsity of random projections. Defaults to 0.05.
rp.p: For spectral initialization, the dimensionality of random projections. Defaults to 3000.
rp.d.group.size: For spectral initialization, the block size. Defaults to 2000.
maxV: For spectral initialization, the maximum number of words used during initialization.
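As a minimal sketch of how these entries might be assembled, the code below builds a `control` list using the names listed above. The values are arbitrary, `n_samples` is a made-up sample count, and `check_control` is a hypothetical helper written for this example (it is not part of themetagenomics or stm) that only enforces the ranges stated above.

```r
# Number of samples (documents); hypothetical value for illustration
n_samples <- 200

# Entry names come from the list above; the values are arbitrary choices
ctrl <- list(
  gamma.enet   = 0.5,             # halfway between ridge (0) and lasso (1)
  gamma.ic.k   = log(n_samples),  # BIC-style penalty instead of the AIC default of 2
  gamma.maxits = 2000             # allow more iterations for prevalence estimation
)

# Hypothetical helper: basic range checks on the entries documented as 0 to 1
check_control <- function(ctrl) {
  in_unit <- function(x) is.numeric(x) && length(x) == 1 && x >= 0 && x <= 1
  if (!is.null(ctrl[["gamma.enet"]]) && !in_unit(ctrl[["gamma.enet"]]))
    stop("gamma.enet must be a scalar between 0 and 1")
  if (!is.null(ctrl[["rp.s"]]) && !in_unit(ctrl[["rp.s"]]))
    stop("rp.s must be a scalar between 0 and 1")
  invisible(TRUE)
}

check_control(ctrl)

# With a themetadata object `dat` from prepare_data(), the list would be passed as:
# topics <- find_topics(dat, K = 15, control = ctrl)
```

Entries left out of the list keep the defaults described above, so only the settings being changed need to be supplied.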

An object of class topics containing

fit: STM object containing topic model fit
docs: Abundance table in document form, a list of length equal to the number of samples. Each element contains a 2-row array, where row 1 contains the vocabulary index of a given taxon and row 2 contains its abundance in that document
vocab: Character vector containing vocabulary of taxa IDs, where their position corresponds to the document indexes
otu_table: Original otu_table
tax_table: Original tax_table
metadata: Original metadata
ref: Original covariate references
modelframe: Original modelframe
splineinfo: Original splineinfo
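The layout of `docs` and `vocab` described above can be reproduced on a toy table in base R. This is only a sketch of the format, not the package's internal conversion code; the OTU IDs and counts are invented, and zero counts are assumed to be omitted from each document.

```r
# Toy OTU table: rows are samples, columns are taxa (hypothetical IDs)
otu <- matrix(c(5, 0, 2,
                0, 3, 0,
                1, 1, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("s1", "s2", "s3"),
                              c("otu_a", "otu_b", "otu_c")))

# Vocabulary: taxa IDs, indexed by their position
vocab <- colnames(otu)

# One 2-row integer matrix per sample:
# row 1 = vocabulary index of each taxon present, row 2 = its count
docs <- lapply(seq_len(nrow(otu)), function(i) {
  present <- which(otu[i, ] > 0)
  rbind(as.integer(present), as.integer(otu[i, present]))
})
names(docs) <- rownames(otu)

# Map the indexes of one document back to taxa IDs
taxa_in_s1 <- vocab[docs[["s1"]][1, ]]
```

Indexing `vocab` with row 1 of any document recovers the taxa IDs for that sample, which is the correspondence the `vocab` entry refers to.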

Roberts, M.E., Stewart, B.M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S.K., Albertson, B., & Rand, D.G. (2014). Structural topic models for open-ended survey responses. Am. J. Pol. Sci. 58, 1064–1082.

glmnet stm

formula <- ~DIAGNOSIS
refs <- 'Not IBD'

dat <- prepare_data(otu_table=GEVERS$OTU,rows_are_taxa=FALSE,tax_table=GEVERS$TAX,
                    metadata=GEVERS$META,formula=formula,refs=refs,
                    cn_normalize=TRUE,drop=TRUE)

## Not run: 
topics <- find_topics(dat,K=15)

## End(Not run)

EESI/themetagenomics documentation built on May 10, 2020, 1:40 a.m.
