scone: Normalize Expression Data and Evaluate Normalization...
In YosefLab/scone: Single Cell Overview of Normalized Expression data

scone

R Documentation

Normalize Expression Data and Evaluate Normalization Performance

Description

This function applies and evaluates a variety of normalization schemes with respect to a specified SconeExperiment containing scRNA-Seq data. Each normalization consists of three main steps:

Impute: Replace observations of zeroes with expected expression values.
Scale: Match sample-specific expression scales or quantiles.
Adjust: Adjust for sample-level batch factors / unwanted variation.

Following completion of each step, the normalized expression matrix is scored based on SCONE's data-driven evaluation criteria.

Usage

scone(x, ...)

## S4 method for signature 'SconeExperiment'
scone(
  x,
  imputation = list(none = impute_null),
  impute_args = NULL,
  zero = c("none", "preadjust", "postadjust", "strong"),
  scaling,
  k_ruv = 5,
  k_qc = 5,
  adjust_bio = c("no", "yes", "force"),
  adjust_batch = c("no", "yes", "force"),
  run = TRUE,
  evaluate = TRUE,
  eval_pcs = 3,
  eval_proj = NULL,
  eval_proj_args = NULL,
  eval_kclust = 2:10,
  verbose = FALSE,
  stratified_pam = FALSE,
  stratified_cor = FALSE,
  stratified_rle = FALSE,
  return_norm = c("no", "in_memory", "hdf5"),
  hdf5file,
  bpparam = BiocParallel::bpparam()
)

Arguments

`x`	a `SconeExperiment` object.
`...`	see specific S4 methods for additional arguments.
`imputation`	list or function. (A list of) function(s) to be used for imputation. By default only scone::impute_null is included.
`impute_args`	arguments passed to all imputation functions.
`zero`	character. Zero-handling option, see Details.
`scaling`	list or function. (A list of) function(s) to be used for scaling normalization step.
`k_ruv`	numeric. The maximum number of factors of unwanted variation. Adjustment step models will include a range of 1 to k_ruv factors of unwanted variation. If 0, RUV adjustment will not be performed.
`k_qc`	numeric. The maximum number of quality metric PCs. Adjustment step models will include a range of 1 to k_qc quality metric PCs. If 0, QC factor adjustment will not be performed.
`adjust_bio`	character. If 'no', bio will not be included in Adjustment step models; if 'yes', both models with and without 'bio' will be run; if 'force', only models with 'bio' will be run.
`adjust_batch`	character. If 'no', batch will not be included in Adjustment step models; if 'yes', both models with and without 'batch' will be run; if 'force', only models with 'batch' will be run.
`run`	logical. If FALSE the normalization and evaluation are not run, but normalization parameters are returned in the output object for inspection by the user.
`evaluate`	logical. If FALSE the normalization methods will not be evaluated.
`eval_pcs`	numeric. The number of principal components to use for evaluation. Ignored if evaluate=FALSE.
`eval_proj`	function. Projection function for evaluation (see `score_matrix` for details). If NULL, PCA is used for projection.
`eval_proj_args`	list. List of arguments passed to projection function as eval_proj_args.
`eval_kclust`	numeric. The number of clusters (> 1) to be used for pam tightness evaluation. If an array of integers, largest average silhouette width (tightness) will be reported. If NULL, tightness will be returned NA.
`verbose`	logical. If TRUE some messagges are printed.
`stratified_pam`	logical. If TRUE then maximum ASW for PAM_SIL is separately computed for each biological-cross-batch stratum (accepting NAs), and a weighted average is returned as PAM_SIL.
`stratified_cor`	logical. If TRUE then cor metrics are separately computed for each biological-cross-batch stratum (accepts NAs), and weighted averages are returned for EXP_QC_COR, EXP_UV_COR, & EXP_WV_COR. Default FALSE.
`stratified_rle`	logical. If TRUE then rle metrics are separately computed for each biological-cross-batch stratum (accepts NAs), and weighted averages are returned for RLE_MED & RLE_IQR. Default FALSE.
`return_norm`	character. If "no" the normalized values will not be returned with the output object. This will create a much smaller object and may be useful for large datasets and/or when many combinations are compared. If "in_memory" the normalized values will be returned as part of the output. If "hdf5" they will be written on file using the `rhdf5` package.
`hdf5file`	character. If `return_norm="hdf5"`, the name of the file onto which to save the normalized matrices.
`bpparam`	object of class `bpparamClass` that specifies the back-end to be used for computations. See `bpparam` for details.

Details

If run=FALSE only the scone_params slot of the output object is populated with a data.frame, each row corresponding to a set of normalization parameters.

If x has a non-empty scone_params slot, only the subset of normalizations specified in scone_params are performed and evaluated.

The zero arguments supports 3 zero-handling options:

none: Default. No special zero-handling.
preadjust: Restore prior zero observations to zero following Impute and Scale steps.
postadjust: Set prior zero observations and all negative expression values to zero following the Adjust Step.
strong: Apply both preadjust and postadjust options.

Evaluation metrics are defined in score_matrix. Each metric is assigned a +/- signature for conversion to scores: Positive- signature metrics increase with improving performance, including BIO_SIL, PAM_SIL, and EXP_WV_COR. Negative-signature metrics decrease with improving performance, including BATCH_SIL, EXP_QC_COR, EXP_UV_COR, RLE_MED, and RLE_IQR. Scores are computed so that higer-performing methods are assigned higher scores.

Note that if one wants to include the unnormalized data in the final comparison of normalized matrices, the identity function must be included in the scaling list argument. Analogously, if one wants to include non-imputed data in the comparison, the scone::impute_null function must be included.

If return_norm="hdf5", the normalized matrices will be written to the hdf5file file. This must be a string specifying (a path to) a new file. If the file already exists, it will return error. In this case, the SconeExperiment object will not contain the normalized counts.

If return_norm="no" the normalized matrices are computed to copmute the scores and then discarded.

In all cases, the normalized matrices can be retrieved via the get_normalized function.

Value

A SconeExperiment object with the log-scaled normalized data matrix as elements of the assays slot, if return_norm is "in_memory", and with the performance metrics and scores.

Examples


mat <- matrix(rpois(1000, lambda = 5), ncol=10)
colnames(mat) <- paste("X", 1:ncol(mat), sep="")
obj <- SconeExperiment(mat)
no_results <- scone(obj, scaling=list(none=identity,
           uq=UQ_FN, deseq=DESEQ_FN),
           run=FALSE, k_ruv=0, k_qc=0, eval_kclust=2)
           
results <- scone(obj, scaling=list(none=identity,
           uq=UQ_FN, deseq=DESEQ_FN),
           run=TRUE, k_ruv=0, k_qc=0, eval_kclust=2,
           bpparam = BiocParallel::SerialParam())
           
results_in_memory <- scone(obj, scaling=list(none=identity,
           uq=UQ_FN, deseq=DESEQ_FN),
           k_ruv=0, k_qc=0, eval_kclust=2,
           return_norm = "in_memory",
           bpparam = BiocParallel::SerialParam())

YosefLab/scone documentation built on Feb. 19, 2025, 4:22 p.m.