bayNorm: A wrapper function of prior estimation and bayNorm function
In WT215/bayNorm: Single-cell RNA sequencing data normalization

bayNorm

R Documentation

A wrapper function of prior estimation and bayNorm function

Description

This is the main wrapper function for bayNorm. The input is a matrix of raw scRNA-seq data and a vector of capture efficiencies of cells. You can also specify the condition of cells for normalizing multiple groups of cells separately.

Usage

bayNorm(
  Data,
  BETA_vec = NULL,
  Conditions = NULL,
  UMI_sffl = NULL,
  Prior_type = NULL,
  mode_version = FALSE,
  mean_version = FALSE,
  S = 20,
  parallel = TRUE,
  NCores = 5,
  FIX_MU = TRUE,
  GR = FALSE,
  BB_SIZE = TRUE,
  verbose = TRUE,
  out.sparse = FALSE
)

Arguments

`Data`	A matrix of single-cell expression where rows are genes and columns are samples (cells). `Data` can be a `SummarizedExperiment` (whose assays slot contains the expression matrix, named "Counts"), a plain `matrix`, or a sparse matrix.
`BETA_vec`	A vector of capture efficiencies (probabilities) of cells. If it is NULL, library size (total count) normalized to 0.06 will be used as the input `BETA_vec`. Values of `BETA_vec` less than or equal to 0, or greater than or equal to 1, are replaced by the minimum or maximum, respectively, of the `BETA_vec` values that lie in (0,1).
`Conditions`	A vector of condition labels, which should correspond to the columns of `Data`. Default is NULL, which assumes that all cells belong to the same group.
`UMI_sffl`	Scaling factors, required only for non-UMI based data, by which `Data` is divided. If non-null and `Conditions` is non-null, then `UMI_sffl` should be a vector of length equal to the number of groups. Default is `NULL`.
`Prior_type`	Determines which groups of cells are used to estimate the prior when `Conditions` is given. Default is `NULL`. If `Conditions` is `NULL`, priors are estimated based on all cells. If `Conditions` is not `NULL` and `Prior_type` is LL, priors are estimated within each group separately. If `Prior_type` is GG, priors are estimated based on cells from all groups. LL is suitable for DE detection. GG is preferred if reducing batch effects between samples is desired, for example for technical replicates (see the bayNorm paper).
`mode_version`	If TRUE, bayNorm returns the modes of the posterior estimates as normalized data, which is a 2D matrix, rather than samples from the posterior, which form a 3D array. Default is FALSE.
`mean_version`	If TRUE, bayNorm returns the means of the posterior estimates as normalized data, which is a 2D matrix, rather than samples from the posterior, which form a 3D array. Default is FALSE.
`S`	The number of samples to generate from the estimated posterior distribution (the third dimension of the 3D array). Default is 20. S needs to be specified if `mode_version`=FALSE.
`parallel`	If TRUE, `NCores` cores will be used for parallelization. Default is TRUE.
`NCores`	Number of cores to use. Default is 5. This is used to set up a parallel environment with the BiocParallel package, using either MulticoreParam (Linux, Mac) or SnowParam (Windows) with NCores.
`FIX_MU`	Whether to fix mu (the mean parameter of the prior distribution) to its MME estimate when estimating prior parameters by maximizing the marginal distribution. If TRUE, 1D optimization is used; otherwise 2D optimization over both mu and size is used (slow). Default is TRUE.
`GR`	If TRUE, the gradient function will be used in optimization. However, since the gradient function itself is very complicated, it does little to speed up the optimization. Default is FALSE.
`BB_SIZE`	If TRUE, estimate the size parameter of the prior by maximizing the marginal likelihood, and then use it to adjust the MME estimate of size. Default is TRUE.
`verbose`	Whether to print status messages. Default is TRUE.
`out.sparse`	Only valid for the mean version: whether the output is of type dgCMatrix. Default is FALSE.
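
The default and clamping rules described for `BETA_vec` can be sketched in base R. This is a minimal illustration, not bayNorm's internal code: the helper names `default_beta` and `clamp_beta` are hypothetical, and it assumes that "normalized to 0.06" means the library sizes are rescaled so that their mean is 0.06.

```r
# Hypothetical helpers for illustration only; not part of the bayNorm API.

# Assumption: "library size normalized to 0.06" means scaling the
# per-cell total counts so that their mean equals 0.06.
default_beta <- function(counts, target_mean = 0.06) {
  lib_size <- colSums(counts)            # total count per cell (column)
  lib_size / mean(lib_size) * target_mean
}

# Replace values <= 0 with the smallest value inside (0, 1) and
# values >= 1 with the largest value inside (0, 1).
clamp_beta <- function(beta) {
  valid <- beta[beta > 0 & beta < 1]
  beta[beta <= 0] <- min(valid)
  beta[beta >= 1] <- max(valid)
  beta
}

# Toy count matrix: 3 genes (rows) x 3 cells (columns).
counts <- matrix(c(1, 2, 3,
                   4, 5, 6,
                   10, 20, 30), nrow = 3)

beta <- default_beta(counts)
clamped <- clamp_beta(c(0, 0.05, 0.2, 1.3))
```

In practice a vector like `beta` would be passed as `BETA_vec`, although bayNorm applies its own default when `BETA_vec` is NULL.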

Details

A wrapper function of prior estimation and bayNorm function.

Value

A list containing 3D arrays of normalized expression (if mode_version=FALSE) or 2D matrices of normalized expression (if mode_version=TRUE or mean_version=TRUE), a list containing the estimated priors, and a list containing the input parameters used: BETA_vec, Conditions (if specified), UMI_sffl (if specified), Prior_type, FIX_MU, BB_SIZE and GR.
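
To make the two output shapes concrete, the sketch below builds a toy genes x cells x S array in base R and collapses it to a genes x cells matrix. The data are simulated with `rpois` purely for illustration, and averaging over the third dimension is only one way to obtain a 2D summary; it is not claimed to be how bayNorm computes its `mean_version` or `mode_version` output.

```r
# Illustrative only: simulated values stand in for posterior samples.
set.seed(1)
n_genes <- 4
n_cells <- 3
S <- 20  # number of samples per gene and cell (third dimension)

# 3D array laid out as genes x cells x S.
samples_3d <- array(
  rpois(n_genes * n_cells * S, lambda = 5),
  dim = c(n_genes, n_cells, S)
)

# Collapse the third dimension by averaging, giving a genes x cells matrix.
summary_2d <- apply(samples_3d, c(1, 2), mean)

dim(samples_3d)
dim(summary_2d)
```

Downstream tools that expect a single expression matrix need the 2D form, while the 3D form keeps the sample-to-sample variability.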

References

Wenhao Tang, Francois Bertaux, Philipp Thomas, Claire Stefanelli, Malika Saint, Samuel Blaise Marguerat, Vahid Shahrezaei. bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data. Bioinformatics, btz726; doi: 10.1093/bioinformatics/btz726

Examples

library(bayNorm)
data('EXAMPLE_DATA_list')
# Return normalized data as a 3D array:
bayNorm_3D <- bayNorm(
  Data = EXAMPLE_DATA_list$inputdata[, seq(1, 30)],
  BETA_vec = EXAMPLE_DATA_list$inputbeta[seq(1, 30)],
  mode_version = FALSE, parallel = FALSE)

WT215/bayNorm documentation built on Sept. 2, 2022, 1:46 a.m.