runLDA: Latent Dirichlet Allocation

runLDA R Documentation

Latent Dirichlet Allocation

Description

This function takes a snap object with a bmat or pmat slot as input and runs Latent Dirichlet Allocation (LDA).

Usage

runLDA(obj, input.mat = c("bmat", "pmat"), topic = c(10, 20, 30),
  method = c("Z-score", "Probability"), num.cores = 1, min.cell = 10,
  seed.use = 10, iterations = 500, burnin = 250, alpha = 50,
  alphaByTopic = TRUE, beta = 0.1)

Arguments

obj

A snap object.

input.mat

Input matrix to be used for LDA c("bmat", "pmat").

topic

An integer vector of candidate topic numbers; one LDA model is trained for each c(10, 20, 30).

method

Method used to normalize the cell-by-topic score c("Z-score", "Probability").

num.cores

Number of cores used for computing [1].

min.cell

Minimum cell coverage. Features with coverage less than min.cell will be filtered out [10].

seed.use

A numeric value used as the random seed [10].

iterations

The number of sweeps of Gibbs sampling to make over the entire corpus [500].

burnin

A scalar integer indicating the number of Gibbs sweeps to treat as burn-in (i.e., discard) for 'lda.collapsed.gibbs.sampler' and 'mmsb.collapsed.gibbs.sampler' [250]. If this parameter is non-NULL, it also has the side effect of enabling the document_expects field of the sampler's return value. Note that burn-in iterations do NOT count towards num.iterations (presumably the iterations argument above).

alpha

The scalar value of the Dirichlet hyperparameter for topic proportions [50].

alphaByTopic

Logical; whether to scale alpha by the number of topics [TRUE].

beta

The scalar value of the Dirichlet hyperparameter for topic multinomials [0.1].
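
As a rough illustration of the min.cell argument listed above, the sketch below drops features (columns) with coverage below min.cell from a toy binary cell-by-feature matrix. The matrix toy.bmat, its values, and the exact cutoff (keep features with coverage of at least min.cell) are assumptions made for illustration; the filtering inside runLDA may differ in detail.

```r
# Toy binary cell-by-feature matrix (4 cells x 5 features); values are made up.
toy.bmat <- matrix(
  c(1, 0, 1, 0, 1,
    1, 0, 0, 0, 1,
    1, 1, 0, 0, 1,
    0, 0, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("bin", 1:5))
)
min.cell <- 2

# Coverage of a feature = number of cells in which it is accessible.
coverage <- colSums(toy.bmat)

# Keep features whose coverage is at least min.cell (assumed cutoff).
keep <- coverage >= min.cell
filtered.bmat <- toy.bmat[, keep, drop = FALSE]
colnames(filtered.bmat)
```

With min.cell = 0, as in the example at the end of this page, no feature would be removed by such a filter.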

Details

LDA is a generative statistical model that explains sets of observations through unobserved groups, which account for why some parts of the data are similar. It was first applied to single-cell ATAC-seq data by cisTopic (González-Blas et al., Nature Methods, 2019). LDA iteratively optimizes two probability distributions: (1) the probability of a region belonging to a topic (region–topic distribution) and (2) the contribution of a topic within a cell (topic–cell distribution).
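
To make the topic–cell distribution and the two method options more concrete, here is a minimal sketch on a made-up topic-by-cell count matrix. It assumes that "Probability" rescales each cell's topic assignments so they sum to one, and that "Z-score" centers and scales each cell's topic scores. These definitions follow a common convention and are not taken from the runLDA source; the object names are hypothetical.

```r
# Hypothetical topic-by-cell assignment counts (3 topics x 2 cells).
topic.cell <- matrix(
  c(8, 1,
    1, 1,
    1, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("topic", 1:3), c("cellA", "cellB"))
)

# "Probability" (assumed): divide each cell's column by its total,
# so the topic contributions within a cell sum to 1.
prob <- sweep(topic.cell, 2, colSums(topic.cell), "/")

# "Z-score" (assumed): center each cell's column and scale it to unit variance.
zscore <- scale(topic.cell, center = TRUE, scale = TRUE)

# Transpose to the cell-by-topic orientation named in the method argument.
cell.by.topic.prob <- t(prob)
cell.by.topic.z <- t(zscore)
```

Under these assumptions, cellA is dominated by topic1 and cellB by topic3 in both normalizations; only the scale of the scores differs.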

Multiple LDA models will be trained, one per value of topic, and the optimal one will be selected according to the likelihood.
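
The selection step can be pictured as follows: one model is fitted per candidate topic number, a log-likelihood is recorded for each, and the model with the highest value is kept. In this sketch the log-likelihood values are placeholders invented for illustration, and fit.loglik and best.topic are hypothetical names; this is not how runLDA stores its results.

```r
# Candidate topic numbers, as passed to the topic argument.
topic <- c(10, 20, 30)

# Placeholder log-likelihoods, one per candidate model (made-up values).
fit.loglik <- c(-15200, -14850, -14910)
names(fit.loglik) <- topic

# Pick the topic number whose model has the highest log-likelihood.
best.topic <- topic[which.max(fit.loglik)]
best.topic
```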

Examples

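library(SnapATAC)

# Load the demo snap object
data(demo.sp)
# Binarize the count matrix before running LDA
demo.sp = makeBinary(demo.sp)
# Train one LDA model per candidate topic number on the binarized bmat
demo.sp = runLDA(
  obj = demo.sp,
  input.mat = "bmat",
  topic = c(10, 20, 30),
  method = "Z-score",
  min.cell = 0,
  num.cores = 3
)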


r3fang/SnapATAC documentation built on March 29, 2022, 4:33 p.m.