scbfa: Perform BFA model on the expression profile

Description Usage Arguments Value Examples

Description

Perform BFA model on the expression profile

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
scBFA(
  scData,
  numFactors,
  X = NULL,
  Q = NULL,
  maxit = 300,
  method = "L-BFGS-B",
  initCellcoef = NULL,
  updateCellcoef = TRUE,
  updateGenecoef = TRUE,
  NUM_CELLS_PER_CHUNK = 5000,
  doChunking = FALSE
)

Arguments

scData

can be a raw count matrix, in which rows are genes and columns are cells; can be a seurat object; can be a SingleCellExperiment object.

numFactors

Numeric value, number of latent dimensions

X

N by C covariate matrix,e.g batch effect, in which rows are cells,columns are number of covariates.Default is NULL

Q

G by T gene-specific covariate matrix(e.g quality control measures), in which rows are genes columns are number of covariates, If no such covariates are available, then Q = NULL

maxit

Numeric value, parameter to control the Maximum number of iterations in the optimization, default is 300.

method

Method of optimization,default is L-BFGS-B(Limited memory BFGS) approach. Conjugate Gradient (CG) is recommended for larger dataset (number of cells > 10k)

initCellcoef

Initialization of C by G gene-specific coefficient matrix as user-defined coefficient β. Such user defined coefficient can be applied to address confounding batch effect

updateCellcoef

Logical value, parameter to decide whether to update C by G gene-specific coefficient matrix. Again, when the cell types are confounded with technical batches or there is no cell level covariate matrix, the user can keep the initialization of coefficients as known estimate.

updateGenecoef

Logical value, parameter to decide whether to update N by T gene-specific coefficient matrix. Again, when there is no gene level covariate matrix, this value should be FALSE by default.

NUM_CELLS_PER_CHUNK

scBFA can run out of memory on large datasets, so we can chunk up computations to avoid this if necessary. NUM_CELLS_PER_CHUNK is the number of cells per 'chunk' computed. Shrink if running out of mem.

doChunking

Use memory-efficient (but slower) chunking. Will do automatically if the chunk size is specified to be smaller than the # of cells in dataset.

Value

A model environment containing all parameter space of a BFA model as well as global variables needed for calculation:

A: G by K compressed feature space matrix

Z: N by K low dimensional embedding matrix

β: C by G cell level coefficient matrix

γ: N by T gene level coefficient matrix

V: G by 1 offset matrix

U: N by 1 offset matrix

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
## Working with Seurat or SingleCellExperiment object

library(Seurat)
library(SingleCellExperiment)


## Input expression profile, 5 genes x 3 cells

GeneExpr = matrix(rpois(15,1),nrow = 5,ncol = 3)
rownames(GeneExpr) = paste0("gene",seq_len(nrow(GeneExpr)))
colnames(GeneExpr) = paste0("cell",seq_len(ncol(GeneExpr)))
celltype = as.factor(sample(c(1,2,3),3,replace = TRUE))

## Create cell level technical batches

batch = sample(c("replicate 1","replicate 2","replicate 2"))
X = matrix(NA,nrow = length(batch),ncol = 1)
X[which(batch =="replicate 1"), ] = 0
X[which(batch =="replicate 2"), ] = 1
rownames(X) = colnames(GeneExpr)

## run BFA with raw count matrix

bfa_model = scBFA(scData = GeneExpr,X = scale(X),numFactors =2)

## Create Seurat object for input to BFA

scData = CreateSeuratObject(counts = GeneExpr,project="sc",min.cells = 0)

## Standardize the covariate matrix should be a default operation

bfa_model = scBFA(scData = scData, X = scale(X), numFactors = 2)

## Build the SingleCellExperiment object for input to BFA

## Set up SingleCellExperiment class

sce <- SingleCellExperiment(assay = list(counts = GeneExpr))

## Standardize the covariate matrix should be a default operation

bfa_model = scBFA(scData = sce, X = scale(X), numFactors = 2)

scBFA documentation built on Nov. 8, 2020, 5:24 p.m.