runLSA: Latent Semantic Analysis

View source: R/runLSA.R

runLSA    R Documentation

Latent Semantic Analysis

Description

This function takes a snap object with a bmat/pmat/gmat slot as input and runs Latent Semantic Analysis (LSA).

Usage

runLSA(obj, input.mat = c("bmat", "pmat"), pc.num = 50, logTF = FALSE,
  scale.factor = 1e+05, min.cell = 10, seed.use = 10)

Arguments

obj

A snap object.

input.mat

Input matrix to be used for LSA, one of c("bmat", "pmat").

pc.num

An integer number of dimensions to return [50].

logTF

A logical variable that indicates whether to log-scale the term frequency [FALSE].

scale.factor

A numeric scaling factor applied when log-scaling the term frequency [100000].

seed.use

A numeric value used as the random seed [10].

Details

The description below is adapted from the 10x Genomics Cell Ranger website.

The cell-by-bin (bmat) or cell-by-peak (pmat) matrix is first normalized via the inverse document frequency (IDF) transform, in which each peak/bin count is scaled by the log of the ratio of the number of barcodes in the matrix to the number of barcodes in which the peak/bin has a non-zero count. This gives greater weight to counts in peaks that occur in fewer barcodes. Singular value decomposition (SVD) is then performed on the normalized matrix using IRLBA, without scaling or centering, to produce the transformed matrix in a lower-dimensional space, along with the components and the singular values that indicate the importance of each component.
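The IDF weighting described above can be illustrated on a toy matrix. This is a minimal sketch in base R and not the package's own code: the matrix `bmat` and the names `n_barcodes`, `n_nonzero`, `idf` and `weighted` are made up for illustration.

```r
# Hypothetical toy cell-by-bin matrix: 4 barcodes (rows) x 3 bins (columns).
# matrix() fills column by column, so bin1 is open in all 4 cells,
# bin2 in 2 cells, and bin3 in 1 cell.
bmat <- matrix(c(1, 1, 1, 1,
                 1, 1, 0, 0,
                 1, 0, 0, 0),
               nrow = 4,
               dimnames = list(paste0("cell", 1:4), paste0("bin", 1:3)))

n_barcodes <- nrow(bmat)            # number of barcodes in the matrix
n_nonzero  <- colSums(bmat > 0)     # barcodes with a non-zero count, per bin

# IDF weight per bin: log of (all barcodes / barcodes where the bin is non-zero).
idf <- log(n_barcodes / n_nonzero)

# Scale every count in a bin (column) by that bin's IDF weight.
weighted <- sweep(bmat, 2, idf, `*`)
```

In this sketch the rarest bin (`bin3`) receives the largest weight, while a bin present in every barcode (`bin1`) receives a weight of zero under this plain log-ratio form.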

LSA has four major steps:

1) Term frequency: TF = t(t(X) / Matrix::colSums(X)). When logTF is TRUE, TF is also log-scaled.
2) Inverse document frequency: IDF = log(1 + ncol(X) / rowSums(X)).
3) TF-IDF: TF * IDF.
4) SVD: run singular value decomposition on the TF-IDF matrix.
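The four steps can be sketched end to end on a small matrix. This is an illustrative sketch under several assumptions, not the package implementation: `X` is assumed to be feature-by-cell (features in rows, cells in columns), which is what the formulas above imply; base R `svd()` stands in for IRLBA so the example has no dependencies; the optional log scaling of TF is not shown; and scaling the right singular vectors by the singular values is only one common way to form a cell embedding. The names `X`, `pc_num`, `sv` and `cell_embedding` are hypothetical.

```r
# Hypothetical toy input: 6 features (rows) x 5 cells (columns), binary.
# Every row and column has at least one non-zero entry, so no division by zero.
X <- matrix(c(1, 0, 1, 1, 0, 1,
              0, 1, 1, 0, 1, 1,
              1, 1, 0, 1, 1, 0,
              1, 0, 0, 1, 0, 1,
              0, 1, 1, 1, 1, 0),
            nrow = 6)

# 1) Term frequency: divide each cell (column) by its total count.
TF <- t(t(X) / colSums(X))

# 2) Inverse document frequency: one weight per feature (row).
IDF <- log(1 + ncol(X) / rowSums(X))

# 3) TF-IDF: IDF has length nrow(X), so R recycles it down each column,
#    multiplying row i of TF by IDF[i].
TFIDF <- TF * IDF

# 4) SVD without centering or scaling (base svd() used here instead of IRLBA).
pc_num <- 3
sv <- svd(TFIDF, nu = pc_num, nv = pc_num)

# Assumed embedding: one row per cell, one column per retained component.
cell_embedding <- sv$v %*% diag(sv$d[seq_len(pc_num)], nrow = pc_num)
```

For large sparse matrices a truncated solver is the practical choice, since a full SVD computes every singular value even when only the leading `pc_num` components are kept.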

Examples

library(SnapATAC)
data(demo.sp)
# Binarize the count matrix before running LSA.
demo.sp <- makeBinary(demo.sp)
demo.sp <- runLSA(obj = demo.sp, input.mat = "bmat", pc.num = 50, logTF = TRUE, min.cell = 0)


r3fang/SnapATAC documentation built on March 29, 2022, 4:33 p.m.