Given a themetadata object, this function converts the OTU counts across samples into a document format and then fits a structural topic model by wrapping the stm function from package stm.
find_topics(
themetadata_object,
K,
sigma_prior = 0,
model = NULL,
iters = 500,
tol = 1e-05,
batches = 1,
init_type = c("Spectral", "LDA", "Random"),
seed = themetadata_object$seed,
verbose = FALSE,
verbose_n = 5,
control = list()
)
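A minimal sketch of a call with the defaults from the signature above written out explicitly. The helper `fit_topics_sketch` and its argument `prepared` are hypothetical names introduced for illustration; `prepared` stands for an existing themetadata object, and the themetagenomics package is assumed to be installed.

```r
# Hypothetical helper: wraps find_topics() with the documented defaults spelled out.
# `prepared` is assumed to be an existing themetadata object.
fit_topics_sketch <- function(prepared, K) {
  # K must be a single positive whole number (the number of topics).
  stopifnot(is.numeric(K), length(K) == 1, K == as.integer(K), K > 0)
  if (!requireNamespace("themetagenomics", quietly = TRUE)) {
    stop("The themetagenomics package is required for this sketch.")
  }
  themetagenomics::find_topics(
    prepared,
    K = K,
    sigma_prior = 0,          # no extra regularization of the topic covariance
    iters = 500,              # maximum number of EM iterations
    tol = 1e-5,               # convergence tolerance
    init_type = "Spectral",   # first entry of the documented choices
    verbose = FALSE
  )
}
```

The helper is only defined here, not called, so nothing is fit until it is given a real themetadata object, for example `fit_topics_sketch(prepared, K = 15)`.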
themetadata_object | (required) Output of
K | (required) A positive integer indicating the number of topics to be estimated.
sigma_prior | Scalar between 0 and 1. This sets the strength of regularization towards a diagonalized covariance matrix. Setting the value above 0 can be useful if topics are becoming too highly correlated. Defaults to 0.
model | Prefit STM model object to restart an existing model.
iters | Maximum number of EM iterations. Defaults to 500.
tol | Convergence tolerance. Defaults to 1e-5.
batches | Number of groups for memorized inference. Defaults to 1.
init_type | Type of initialization procedure. Defaults to Spectral.
seed | Seed for the random number generator to reproduce previous results.
verbose | Logical flag to print progress information. Defaults to FALSE.
verbose_n | Integer determining the intervals at which labels are printed.
control | List of additional parameters that control portions of the optimization. See details.
Topics are estimated via stm from the stm package. The focus of the themetagenomics pipeline is leveraging both abundance and predicted functional information of 16S rRNA sequencing; hence, the pipeline calls for the use of only "prevalence" information (to use stm terminology). This wrapper therefore removes any options pertaining to "content." If the user is interested in exploring the content component of the STM, then the stm package itself is the ideal place to start. Given that only the prevalence component can be manipulated using find_topics, the following additional parameters can be passed to control as a list (adapted from stm documentation):
Scalar between 0 and 1 that controls the mix of L1 and L2 regularization, where 0 and 1 correspond to ridge and lasso regression, respectively. Defaults to 1.
Method to select the regularization parameter where 2 corresponds to AIC and log(n) is equivalent to BIC. Defaults to 2.
Maximum number of iterations for estimating prevalence. Defaults to 1000.
For LDA initialization, the number of Gibbs sampling iterations. Defaults to 50.
For LDA initialization, the number of burnin iterations. Defaults to 25.
For LDA initialization, the samples over topics distribution hyperparameter.
For LDA initialization, the topics over words distribution hyperparameter.
For spectral initialization, scalar between 0 and 1 that controls the degree of sparsity of random projections. Defaults to 0.05.
For spectral initialization, the dimensionality of random projections. Defaults to 3000.
For spectral initialization, the block size. Defaults to 2000.
For spectral initialization, the maximum number of words used during initialization.
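The control list described above could be assembled as in the sketch below. The element names (`gamma.enet`, `gamma.ic.k`, `gamma.maxits`, `rp.s`, `rp.p`, `rp.d.group.size`) are taken from the control list in the stm documentation and are assumed to carry over unchanged to this wrapper; `n_samples` is a hypothetical sample count used only to show the BIC setting.

```r
# Hypothetical number of samples (documents) in the data set.
n_samples <- 100

# Names below follow the stm documentation; assumed to apply to find_topics() as well.
control <- list(
  gamma.enet = 1,               # 1 = lasso, 0 = ridge
  gamma.ic.k = log(n_samples),  # log(n) gives BIC; 2 (the default) gives AIC
  gamma.maxits = 1000,          # maximum iterations for estimating prevalence
  rp.s = 0.05,                  # sparsity of random projections (spectral init)
  rp.p = 3000,                  # dimensionality of random projections
  rp.d.group.size = 2000        # block size
)
```

The list would then be supplied through the `control` argument of `find_topics`; any element left out keeps the default stated above.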
An object of class topics containing
STM object containing topic model fit
Abundance table in document form, a list of length equal to the number of samples. Each element contains a 2-row array, where row 1 contains the vocabulary index of a given taxon and row 2 contains its abundance in that document.
Character vector containing vocabulary of taxa IDs, where their position corresponds to the document indexes
Original otu_table
Original tax_table
Original metadata
Original covariate references
Original modelframe
Original splineinfo
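To make the document form concrete, here is a small base-R sketch that turns a toy OTU count table into a list of 2-row arrays plus a vocabulary vector. The data and object names are hypothetical, and dropping zero counts is an assumption based on the usual stm document convention; the package's internal conversion may differ in detail.

```r
# Toy OTU count table: rows are samples, columns are taxa (hypothetical data).
otu <- matrix(c(5L, 0L, 2L,
                0L, 3L, 0L),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("otu_a", "otu_b", "otu_c")))

# Vocabulary: taxa IDs, whose positions serve as the document indexes.
vocab <- colnames(otu)

# One document per sample: row 1 = vocabulary index, row 2 = abundance.
# Assumption: taxa with zero counts are left out of the document.
docs <- lapply(seq_len(nrow(otu)), function(i) {
  present <- which(otu[i, ] > 0)
  rbind(as.integer(present), as.integer(otu[i, present]))
})
names(docs) <- rownames(otu)
```

Indexing the vocabulary with the first row of a document recovers its taxa IDs, for example `vocab[docs$s1[1, ]]`.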
Roberts, M.E., Stewart, B.M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S.K., Albertson, B., & Rand, D.G. (2014). Structural topic models for open-ended survey responses. Am. J. Pol. Sci. 58, 1064–1082.