Description
This function fits a Latent Dirichlet Allocation (LDA) topic model to single-cell RNA-seq data.
Arguments

data: A matrix of (non-negative) RNA-seq expression levels, where each row is a gene and each column is a sequenced cell.

method: LDA inference method to use. Can be any unique prefix of ‘maptpx’, ‘Gibbs’ or ‘VEM’ (defaults to ‘maptpx’).

k.topics: Integer (optional). Number of topics to fit in the model. If method is ‘maptpx’, a vector of topic numbers may be given instead, in which case the best-scoring model is returned (see Details).

log.scale: Boolean (optional). Whether the data should be log-scaled.

sd.filter: Numeric or FALSE (optional). Standard-deviation threshold below which genes are removed from the data before fitting (no filtering if set to FALSE).

tot.iter, tol: Numeric parameters (optional) forwarded to the chosen LDA inference method's control class.
Details

Latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups (topics) that explain why some parts of the data are similar [Blei, 2003]. Each topic is modelled as a (Dirichlet) distribution over observations and each set of observations is also modelled as a (Dirichlet) distribution over topics. In lieu of the traditional NLP context of word occurrence counts in documents, our model uses RNA-seq observation counts in single cells. Three separate LDA inference methods can be used at the moment:
Gibbs uses the Collapsed Gibbs Sampling method (implemented by Xuan-Hieu Phan and co-authors in the topicmodels package [Phan, 2008]) to infer the parameters of the Dirichlet distributions for a given number of topics. It gives high accuracy but is very time-consuming to run on large numbers of cells and genes.
VEM uses Variational Expectation-Maximisation (as described in [Hoffman, 2010]). This method tends to converge faster than collapsed Gibbs sampling, albeit with lower accuracy.
maptpx uses the method described in [Taddy, 2011] and implemented in the maptpx package to estimate the parameters of the topic model for an increasing number of topics (using previous estimates as a starting point for larger topic numbers). The best model (i.e. number of topics) is selected based on the Bayes factor over the null model. Although potentially less accurate, this method provides the fastest way to train and select from a large number of models when the number of topics is not well known.
When in doubt, the function can be run with its default parameter values and should produce a usable LDA model in reasonable time (using the ‘maptpx’ inference method). The model can then be further refined for a specific number of topics with the slower methods. While larger models (with a large number of topics) might fit the data well, there is a high risk of overfitting, and it is recommended to use the smallest number of topics that still explains the observations well. Anecdotally, a typical number of topics for cell differentiation data (from pluripotent to fully specialised) seems to be around 4 or 5.
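As a rough sketch of that workflow (using the example data from the Examples section below; the refined topic number is a placeholder chosen for illustration):

# Fast first pass: let 'maptpx' (the default method) select a model:
lda.rough = compute.lda(HSMM_expr_matrix)
# Suppose the selected model has 4 topics; refine a single 4-topic model
# with the slower but more accurate collapsed Gibbs sampler:
lda.refined = compute.lda(HSMM_expr_matrix, k.topics=4, method="Gibbs")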
Value

An LDA model fitted to data, of class LDA-class (for methods 'Gibbs' or 'VEM') or topics (for 'maptpx').
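The class of the returned object can be checked to see which representation was produced (a sketch, reusing the example data from the Examples section):

lda.gibbs = compute.lda(HSMM_expr_matrix, k.topics=4, method="Gibbs")
class(lda.gibbs)   # an LDA-class object (from the topicmodels package)
lda.maptpx = compute.lda(HSMM_expr_matrix, k.topics=3:6, method="maptpx")
class(lda.maptpx)  # a 'topics' object (from the maptpx package)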
References

Blei, Ng, and Jordan. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3 (2003): 993-1022.

Hoffman, Blei and Bach (2010). “Online Learning for Latent Dirichlet Allocation.” In J Lafferty, CKI Williams, J Shawe-Taylor, R Zemel, A Culotta (eds.), Advances in Neural Information Processing Systems 23, pp. 856-864. MIT Press, Cambridge, MA.

Hornik and Grün. “topicmodels: An R package for fitting topic models.” Journal of Statistical Software 40.13 (2011): 1-30.

Phan, Nguyen and Horiguchi. “Learning to classify short and sparse text & web with hidden topics from large-scale data collections.” Proceedings of the 17th International Conference on World Wide Web. ACM, 2008.

Taddy. “On estimation and selection for topic models.” arXiv preprint arXiv:1109.4518 (2011).
See Also

LDA, topics, LDA_Gibbscontrol-class, CTM_VEMcontrol-class
Examples

# Load skeletal myoblast RNA-Seq data from the HSMMSingleCell package:
library(HSMMSingleCell)
data(HSMM_expr_matrix)

# Run LDA inference using the 'maptpx' method for k = 4 topics:
lda.results = compute.lda(HSMM_expr_matrix, k.topics=4, method="maptpx")

# Run LDA inference using the 'maptpx' method for number of topics k = 3 to 6:
lda.results = compute.lda(HSMM_expr_matrix, k.topics=3:6, method="maptpx")

# Run LDA inference using the 'Gibbs' (collapsed sampling) method for k = 4 topics:
lda.results = compute.lda(HSMM_expr_matrix, k.topics=4, method="Gibbs")