scPLS: A function to remove unwanted variation from target genes...

View source: R/scPLS.R

Description

To infer latent confounding factors in scRNA-seq studies and remove unwanted variation, we developed a statistical method called scPLS. scPLS is based on partial least squares regression and uses both control and target genes to infer hidden confounding effects. It can also model other systematic biological variation and heterogeneity, which are often observed in the target genes. Accounting for this heterogeneity further improves the estimation of the confounding factors and the removal of unwanted variation. To make the method widely applicable, we also developed an efficient estimation algorithm that scales to hundreds of cells and thousands of genes.
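As a rough sketch of the kind of model this describes, and not a formula taken from this page, one can write the control matrix $X$ ($n$ by $p$) and the target matrix $Y$ ($n$ by $q$) as sharing a set of confounding factors, with the target genes carrying an additional structured component. All symbols below other than $n$, $p$, $q$, k1 and k2 are assumed notation:

$$
X = Z \Lambda_x + E_x, \qquad Y = Z \Lambda_y + U \Lambda_u + E_y
$$

Here $Z$ is an $n$ by k1 matrix of confounding (technical) factors, $U$ is an $n$ by k2 matrix of structured biological factors, $\Lambda_x$, $\Lambda_y$ and $\Lambda_u$ are loading matrices, and $E_x$, $E_y$ are residual errors. Under this reading, removing unwanted variation amounts to subtracting the estimated $Z \Lambda_y$ from $Y$.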

Usage

scPLS(Target, Control, k1, k2, iter = 500, alpha = 10, kappa = 2,
  diagH = 1, g = 1, h1 = 1, h2 = 1, c = 1, d = 1, limit = 25,
  method = c("EM", "EMSparse", "EMSparseTraining", "EMSparseNfold", "IBP",
  "PCA"), penalty = seq(0.01, 10, length.out = 20), tol = 0.1,
  givenpenalty = NULL, kfold = NULL, Chunk = TRUE, chunk.size = 1000,
  center = TRUE)

Arguments

Target:

An $n$ by $q$ matrix for $q$ target genes from $n$ samples.

Control:

An $n$ by $p$ matrix for $p$ control genes from $n$ samples. To remove cell cycle effects, supply the expression matrix of cell cycle genes. To remove technical effects, supply the expression matrix of control genes.
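The sketch below shows one way to build inputs with the shapes described above, using simulated values rather than real expression data. It assumes the package is installed under the name Citrus (taken from the repository name) and that scPLS is exported from it. The call is guarded so the snippet still runs when the package is absent, and the structure of the returned object is not documented on this page, so it is only inspected.

```r
set.seed(1)
n <- 50   # cells (samples)
q <- 200  # target genes
p <- 30   # control genes

# Simulated expression values; rows are cells, columns are genes.
Target  <- matrix(rnorm(n * q), nrow = n, ncol = q)
Control <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Both matrices must describe the same n cells.
stopifnot(nrow(Target) == nrow(Control))

# Assumption: the package name is "Citrus" and scPLS is exported.
if (requireNamespace("Citrus", quietly = TRUE)) {
  fit <- Citrus::scPLS(Target, Control, k1 = 1, k2 = 1, method = "EM")
  str(fit, max.level = 1)  # inspect the result; its fields are not listed here
}
```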

k1:

The number of technical factors.

k2:

The number of structured biological factors.

iter:

The number of iterations for the EM algorithm. The default is 500.

alpha:

Hyperparameter used with the IBP (Indian buffet process) prior. Other hyperparameters include kappa, diagH, g, h1, h2, c, and d.

limit:

The maximum number of factors if using the IBP prior. The default is 25.

method:

The method used to infer the latent factors. The default is "EM". Other choices are "EMSparse", where the penalty on the sparsity of the factor matrix is specified; "EMSparseTraining", where the penalty on sparsity is learned from training samples; "EMSparseNfold", where the penalty on sparsity is learned by N-fold cross-validation; "IBP", where the sparse factor matrix is modeled with an IBP prior; and "PCA", the initializer for the EM algorithm, where the latent factors are estimated from a singular value decomposition.
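To illustrate the idea behind the "PCA" option, the following sketch estimates k1 factors from a simulated control matrix with base R's svd(). It is a generic illustration of factor estimation by singular value decomposition, not the package's internal code. The variable names are hypothetical.

```r
set.seed(2)
n <- 40; p <- 25; k1 <- 2
Control <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Center each control gene (column) before decomposing.
Xc <- scale(Control, center = TRUE, scale = FALSE)

# Singular value decomposition Xc = U D V^T, keeping k1 singular vectors.
s <- svd(Xc, nu = k1, nv = k1)

# n x k1 factor scores and p x k1 loadings for the leading k1 components.
factors  <- s$u %*% diag(s$d[1:k1], nrow = k1)
loadings <- s$v

# Rank-k1 approximation of the centered control matrix.
approx_k1 <- factors %*% t(loadings)
```

The squared error of the rank-k1 approximation equals the sum of the squared singular values that were dropped, which gives a quick check on how much variation the retained factors explain.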

penalty:

A sequence of penalty values to be tested in training or cross-validation. The default is seq(0.01, 10, length.out = 20).
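For reference, the default grid from the Usage section can be reproduced directly in base R. The coarser grid at the end is a hypothetical alternative, shown only to illustrate how a custom sequence could be built.

```r
# Default penalty grid from the Usage section: 20 evenly spaced values.
penalty <- seq(0.01, 10, length.out = 20)
length(penalty)   # number of candidate penalty values
range(penalty)    # smallest and largest candidate

# Hypothetical coarser grid over the same range.
penalty_coarse <- seq(0.01, 10, length.out = 5)
```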

tol:

Tolerance for convergence. The default is 0.1.

givenpenalty:

Specified penalty level on sparsity of the factor matrix. The default is NULL.

kfold:

The number of folds for cross-validation. The default is NULL.
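As a generic illustration of N-fold partitioning, the sketch below assigns samples to folds of near-equal size. It assumes folds are formed over samples (rows); the package's own partitioning is not described on this page and may differ.

```r
set.seed(3)
n <- 23      # number of samples
kfold <- 5   # number of folds

# Randomly assign each sample to one of kfold folds of near-equal size.
fold_id <- sample(rep(seq_len(kfold), length.out = n))

# Row indices of the held-out samples for each fold.
folds <- split(seq_len(n), fold_id)

# For fold 1, train on every sample that is not held out.
test_rows  <- folds[[1]]
train_rows <- setdiff(seq_len(n), test_rows)
```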

Chunk:

Whether to use the EM-in-chunks algorithm. The default is TRUE.

chunk.size:

The chunk size (number of genes) for the EM-in-chunks algorithm. The default is 1000.
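To make the chunk size concrete, the sketch below splits a set of gene (column) indices into consecutive chunks of at most chunk.size genes. It only illustrates the partitioning, and how the package processes each chunk is not described here. The value of q is a made-up example.

```r
q <- 2500            # hypothetical number of target genes
chunk.size <- 1000   # default chunk size

# Split column indices into consecutive chunks of at most chunk.size genes.
gene_idx <- seq_len(q)
chunks <- split(gene_idx, ceiling(gene_idx / chunk.size))

lengths(chunks)  # 1000 1000 500
```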


ChenMengjie/Citrus documentation built on April 14, 2020, 4:55 a.m.