scone: Normalize Expression Data and Evaluate Normalization...

sconeR Documentation

Normalize Expression Data and Evaluate Normalization Performance

Description

This function applies and evaluates a variety of normalization schemes with respect to a specified SconeExperiment containing scRNA-Seq data. Each normalization consists of three main steps:

  • Impute: Replace observations of zeroes with expected expression values.

  • Scale: Match sample-specific expression scales or quantiles.

  • Adjust: Adjust for sample-level batch factors / unwanted variation.

Following completion of each step, the normalized expression matrix is scored based on SCONE's data-driven evaluation criteria.

Usage

scone(x, ...)

## S4 method for signature 'SconeExperiment'
scone(
  x,
  imputation = list(none = impute_null),
  impute_args = NULL,
  zero = c("none", "preadjust", "postadjust", "strong"),
  scaling,
  k_ruv = 5,
  k_qc = 5,
  adjust_bio = c("no", "yes", "force"),
  adjust_batch = c("no", "yes", "force"),
  run = TRUE,
  evaluate = TRUE,
  eval_pcs = 3,
  eval_proj = NULL,
  eval_proj_args = NULL,
  eval_kclust = 2:10,
  verbose = FALSE,
  stratified_pam = FALSE,
  stratified_cor = FALSE,
  stratified_rle = FALSE,
  return_norm = c("no", "in_memory", "hdf5"),
  hdf5file,
  bpparam = BiocParallel::bpparam()
)

Arguments

x

a SconeExperiment object.

...

see specific S4 methods for additional arguments.

imputation

list or function. (A list of) function(s) to be used for imputation. By default only scone::impute_null is included.

impute_args

arguments passed to all imputation functions.

zero

character. Zero-handling option, see Details.

scaling

list or function. (A list of) function(s) to be used for scaling normalization step.

k_ruv

numeric. The maximum number of factors of unwanted variation. Adjustment step models will include a range of 1 to k_ruv factors of unwanted variation. If 0, RUV adjustment will not be performed.

k_qc

numeric. The maximum number of quality metric PCs. Adjustment step models will include a range of 1 to k_qc quality metric PCs. If 0, QC factor adjustment will not be performed.

adjust_bio

character. If 'no', bio will not be included in Adjustment step models; if 'yes', both models with and without 'bio' will be run; if 'force', only models with 'bio' will be run.

adjust_batch

character. If 'no', batch will not be included in Adjustment step models; if 'yes', both models with and without 'batch' will be run; if 'force', only models with 'batch' will be run.

run

logical. If FALSE the normalization and evaluation are not run, but normalization parameters are returned in the output object for inspection by the user.

evaluate

logical. If FALSE the normalization methods will not be evaluated.

eval_pcs

numeric. The number of principal components to use for evaluation. Ignored if evaluate=FALSE.

eval_proj

function. Projection function for evaluation (see score_matrix for details). If NULL, PCA is used for projection.

eval_proj_args

list. List of arguments passed to projection function as eval_proj_args.

eval_kclust

numeric. The number of clusters (> 1) to be used for pam tightness evaluation. If an array of integers, largest average silhouette width (tightness) will be reported. If NULL, tightness will be returned NA.

verbose

logical. If TRUE some messagges are printed.

stratified_pam

logical. If TRUE then maximum ASW for PAM_SIL is separately computed for each biological-cross-batch stratum (accepting NAs), and a weighted average is returned as PAM_SIL.

stratified_cor

logical. If TRUE then cor metrics are separately computed for each biological-cross-batch stratum (accepts NAs), and weighted averages are returned for EXP_QC_COR, EXP_UV_COR, & EXP_WV_COR. Default FALSE.

stratified_rle

logical. If TRUE then rle metrics are separately computed for each biological-cross-batch stratum (accepts NAs), and weighted averages are returned for RLE_MED & RLE_IQR. Default FALSE.

return_norm

character. If "no" the normalized values will not be returned with the output object. This will create a much smaller object and may be useful for large datasets and/or when many combinations are compared. If "in_memory" the normalized values will be returned as part of the output. If "hdf5" they will be written on file using the rhdf5 package.

hdf5file

character. If return_norm="hdf5", the name of the file onto which to save the normalized matrices.

bpparam

object of class bpparamClass that specifies the back-end to be used for computations. See bpparam for details.

Details

If run=FALSE only the scone_params slot of the output object is populated with a data.frame, each row corresponding to a set of normalization parameters.

If x has a non-empty scone_params slot, only the subset of normalizations specified in scone_params are performed and evaluated.

The zero arguments supports 3 zero-handling options:

  • none: Default. No special zero-handling.

  • preadjust: Restore prior zero observations to zero following Impute and Scale steps.

  • postadjust: Set prior zero observations and all negative expression values to zero following the Adjust Step.

  • strong: Apply both preadjust and postadjust options.

Evaluation metrics are defined in score_matrix. Each metric is assigned a +/- signature for conversion to scores: Positive- signature metrics increase with improving performance, including BIO_SIL, PAM_SIL, and EXP_WV_COR. Negative-signature metrics decrease with improving performance, including BATCH_SIL, EXP_QC_COR, EXP_UV_COR, RLE_MED, and RLE_IQR. Scores are computed so that higer-performing methods are assigned higher scores.

Note that if one wants to include the unnormalized data in the final comparison of normalized matrices, the identity function must be included in the scaling list argument. Analogously, if one wants to include non-imputed data in the comparison, the scone::impute_null function must be included.

If return_norm="hdf5", the normalized matrices will be written to the hdf5file file. This must be a string specifying (a path to) a new file. If the file already exists, it will return error. In this case, the SconeExperiment object will not contain the normalized counts.

If return_norm="no" the normalized matrices are computed to copmute the scores and then discarded.

In all cases, the normalized matrices can be retrieved via the get_normalized function.

Value

A SconeExperiment object with the log-scaled normalized data matrix as elements of the assays slot, if return_norm is "in_memory", and with the performance metrics and scores.

See Also

get_normalized, get_design

Examples


mat <- matrix(rpois(1000, lambda = 5), ncol=10)
colnames(mat) <- paste("X", 1:ncol(mat), sep="")
obj <- SconeExperiment(mat)
no_results <- scone(obj, scaling=list(none=identity,
           uq=UQ_FN, deseq=DESEQ_FN),
           run=FALSE, k_ruv=0, k_qc=0, eval_kclust=2)
           
results <- scone(obj, scaling=list(none=identity,
           uq=UQ_FN, deseq=DESEQ_FN),
           run=TRUE, k_ruv=0, k_qc=0, eval_kclust=2,
           bpparam = BiocParallel::SerialParam())
           
results_in_memory <- scone(obj, scaling=list(none=identity,
           uq=UQ_FN, deseq=DESEQ_FN),
           k_ruv=0, k_qc=0, eval_kclust=2,
           return_norm = "in_memory",
           bpparam = BiocParallel::SerialParam())


YosefLab/scone documentation built on Oct. 21, 2024, 4:39 p.m.