LocASN: Single-cell RNA sequencing normalization using a local...

View source: R/LocASN.R

LocASN    R Documentation

Single-cell RNA sequencing normalization using a local average technique

Description

Normalizes single-cell RNA-seq gene expression counts.

Usage

LocASN(
  countmatrix,
  conditions = NULL,
  filter = FALSE,
  gene_num_gezero = 3,
  cell_num_gezero = 10,
  numGeneforEst = 2000,
  divideforFast = TRUE,
  numDivide = NULL,
  bw.method = "SJ",
  cutoff = 2
)

Arguments

countmatrix

Input. Unnormalized count matrix (genes by cells).

conditions

Input (Optional). Condition/sample number of each cell. The default is NULL, meaning that all cells are from the same condition/sample.

filter

Input (Optional). A logical value indicating whether to filter the data. If TRUE, the thresholds are set by gene_num_gezero and cell_num_gezero (see below). The default value is FALSE.

gene_num_gezero

Input (Optional). An integer threshold for including a gene. A gene is kept only if it is expressed in at least gene_num_gezero cells. The default value is 3.

cell_num_gezero

Input (Optional). An integer threshold for including a cell. A cell is kept only if it contains at least cell_num_gezero expressed genes. The default value is 10.
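The two filtering thresholds can be illustrated with a minimal base-R sketch. It assumes that "expressed" means a count greater than zero and that genes are screened before cells; both are assumptions, and filter_counts is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper illustrating the two filtering thresholds.
# Assumptions: "expressed" means count > 0; genes are screened first, then cells.
filter_counts <- function(counts, gene_num_gezero = 3, cell_num_gezero = 10) {
  # Keep genes detected in at least gene_num_gezero cells
  keep_genes <- rowSums(counts > 0) >= gene_num_gezero
  # Keep cells with at least cell_num_gezero detected genes (among kept genes)
  keep_cells <- colSums(counts[keep_genes, , drop = FALSE] > 0) >= cell_num_gezero
  list(
    filtered     = counts[keep_genes, keep_cells, drop = FALSE],
    delete_genes = which(!keep_genes),
    delete_cells = which(!keep_cells)
  )
}

# Toy matrix: 4 genes (rows) by 3 cells (columns)
toy <- matrix(c(1, 0, 2,
                0, 0, 0,
                3, 1, 0,
                0, 0, 5), nrow = 4, byrow = TRUE)
out <- filter_counts(toy, gene_num_gezero = 2, cell_num_gezero = 2)
out$delete_genes
out$delete_cells
```

The returned index vectors mirror the delete_genes and delete_cells entries described under Value, although the order in which the package applies the two filters may differ.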

numGeneforEst

Input (Optional). An integer. To speed up computation, the scaling factors are estimated from only the top numGeneforEst genes, ranked by the proportion of cells in which the gene count is > 0. The default value is 2000.
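As a rough illustration of this ranking, the sketch below picks the genes with the highest proportion of nonzero counts. top_detected_genes is a hypothetical helper, and the way ties are broken here (by row order) is an assumption rather than documented behavior.

```r
# Hypothetical helper: indices of the genes with the highest detection rate.
# Assumption: ties are broken by row order, as order() does by default.
top_detected_genes <- function(counts, numGeneforEst = 2000) {
  prop_nonzero <- rowMeans(counts > 0)        # proportion of cells with count > 0
  k <- min(numGeneforEst, nrow(counts))       # cannot select more genes than exist
  order(prop_nonzero, decreasing = TRUE)[seq_len(k)]
}

# Toy matrix: 4 genes (rows) by 3 cells (columns)
toy <- matrix(c(1, 0, 2,
                0, 0, 0,
                3, 1, 4,
                0, 0, 5), nrow = 4, byrow = TRUE)
top2 <- top_detected_genes(toy, numGeneforEst = 2)
top2
```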

divideforFast

Input (Optional). A logical value indicating whether to speed up computation by randomly dividing the cells in each condition into numDivide smaller groups. If divideforFast = TRUE, the number of groups is taken from numDivide (see below). The default value is TRUE.

numDivide

Input (Optional). An integer, used only if divideforFast = TRUE. The cells in each condition are randomly divided into numDivide smaller groups. With the default numDivide = NULL, the number of groups is set automatically to the maximum of 1 and the smallest integer not less than the number of cells in the condition divided by 3K, so conditions with fewer than 3K cells are not divided.
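The default rule can be written out in a few lines of base R. This is a sketch only: it reads "3K" as 3000 cells, and both default_num_divide and split_cells are hypothetical helpers; the package may form its random groups differently.

```r
# Hypothetical helpers illustrating the default choice of numDivide.
# Assumption: "3K" means 3000 cells.
default_num_divide <- function(n_cells, group_size = 3000) {
  max(1, ceiling(n_cells / group_size))
}

# Randomly assign cell indices 1..n_cells to numDivide groups of near-equal size
split_cells <- function(n_cells, numDivide) {
  split(sample(n_cells), rep_len(seq_len(numDivide), n_cells))
}

default_num_divide(600)    # fewer than 3K cells: a single group
default_num_divide(7000)   # ceiling(7000 / 3000) groups

set.seed(1)
groups <- split_cells(7000, default_num_divide(7000))
lengths(groups)
```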

bw.method

Input (Optional). The method used to estimate the bandwidths for kernel weighting. The default is "SJ" (Sheather-Jones bandwidth; Sheather and Jones, 1991). The alternative is "RoT" (rule of thumb; Silverman, 1986).
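For a feel of how the two bandwidth selectors differ, base R ships both: stats::bw.SJ implements the Sheather and Jones (1991) selector and stats::bw.nrd0 implements Silverman's rule of thumb. The sketch below applies them to a simulated vector x; which quantity LocASN actually passes to the selector, and whether it calls these functions at all, is not stated here and is an assumption.

```r
# Compare the two bandwidth selectors on simulated data.
# Assumption: x stands in for whatever per-cell quantity the kernel is applied to.
set.seed(1)
x <- rnorm(200)

bw_sj  <- stats::bw.SJ(x)    # Sheather-Jones selector
bw_rot <- stats::bw.nrd0(x)  # Silverman's rule of thumb

c(SJ = bw_sj, RoT = bw_rot)
```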

cutoff

Input (Optional). For computational efficiency, the weight is set to zero whenever the distance between two cells is larger than cutoff times the bandwidth. The default value is 2.
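The effect of cutoff can be sketched as follows. The Gaussian kernel and the renormalization of the remaining weights are assumptions made for illustration, and truncated_weights is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: kernel weights that are zeroed beyond cutoff * bandwidth.
# Assumptions: Gaussian kernel; remaining weights are rescaled to sum to 1.
truncated_weights <- function(d, bandwidth, cutoff = 2) {
  w <- dnorm(d / bandwidth)              # unnormalized kernel weights
  w[abs(d) > cutoff * bandwidth] <- 0    # drop weights of distant cells
  w / sum(w)
}

d <- c(0, 0.5, 1, 3)                     # distances from a reference cell
w <- truncated_weights(d, bandwidth = 1, cutoff = 2)
w                                        # last weight is 0 because 3 > 2 * 1
```

Zeroing distant weights makes the weight matrix sparse, which is where the computational saving comes from.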

Value

NormalizedData

Matrix (genes by cells). Data matrix after normalization.

scalingFactor

Vector. Cell-specific scaling factors.

delete_genes

Vector. Indices of the deleted genes.

delete_cells

Vector. Indices of the deleted cells.

Examples

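library(Matrix) # provides the "sparseMatrix" class used in as() below
set.seed(12345)
G <- 2000; n <- 600 # G: number of genes, n: number of cells
mu <- rgamma(G, shape = 2, rate = 2) # gene-specific mean expression
NB_cell <- function(j) rnbinom(G, size = 0.1, mu = mu) # simulated counts for one cell
countsimdata <- sapply(1:n, NB_cell) # genes-by-cells count matrix
colnames(countsimdata) <- paste("c", 1:n, sep = "_")
rownames(countsimdata) <- paste("g", 1:G, sep = "_")
# Normalize, treating all cells as one condition (conditions = NULL)
Result <- LocASN(countmatrix = as(countsimdata, "sparseMatrix"))
Result$NormalizedData[1:10, 1:10]; Result$scalingFactor[1:10]
# Optional: two conditions, first half of the cells versus second half
#conditions <- c(rep(1, n/2), rep(2, n/2))
#Result2 <- LocASN(countmatrix = countsimdata, conditions = conditions)
#Result2$NormalizedData[1:10, 1:10]; Result2$scalingFactor[1:10]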

cyhsuTN/scKWARN documentation built on Feb. 11, 2024, 2:21 p.m.