In ftwkoopmans/msdap: Mass Spectrometry Downstream Analysis Pipeline

normalize_vwmb

R Documentation

Normalize a numerical matrix by the Variation Within, Mode Between (VWMB) algorithm

Description

The normalization algorithm consists of two consecutive steps:

1. Samples are scaled within each group to minimize the overall metric_within among replicates.
2. All samples in a group are summarized by their row mean values (from a row*sample to a row*group matrix). The data are then rescaled at the sample-group level to minimize the overall metric_between.
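
As a rough illustration of the second step, the base-R sketch below collapses a toy row*sample matrix into a row*group matrix of row means, then applies one offset per group to every sample in that group. The toy values, the `group_offset` numbers and the use of `na.rm = TRUE` are assumptions made for illustration only. The actual function derives its scaling by minimizing metric_between, which is not reproduced here.

```r
# Toy log2 intensity matrix: 4 rows x 4 samples (values are made up).
x <- matrix(
  c(10.0, 11.0, 12.0, 13.0,   # sample a1
    10.2, 11.1, 12.3, 12.9,   # sample a2
    11.0, 12.0, 13.0, 14.0,   # sample b1
    11.1, 12.2, 12.9, 14.1),  # sample b2
  nrow = 4,
  dimnames = list(paste0("row", 1:4), c("a1", "a2", "b1", "b2"))
)
groups <- c("A", "A", "B", "B")

# Collapse row*sample into row*group by taking row means per group.
# na.rm = TRUE is an assumption about how missing values would be handled.
group_means <- sapply(unique(groups), function(g) {
  rowMeans(x[, groups == g, drop = FALSE], na.rm = TRUE)
})

# Hypothetical per-group offsets on the log scale (NOT computed by VWMB here).
group_offset <- c(A = 0, B = -1)

# Rescaling at the group level: every sample inherits the offset of its group.
x_rescaled <- sweep(x, 2, group_offset[groups], "+")
```

Because the offset is shared by all samples in a group, the differences between replicates that were established in the first step are left unchanged.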

The metric "var" minimizes the median of the per-row variation. The metric "mode" minimizes the modes of the log-foldchange distributions between all pairs of samples or groups (when set as metric_within or metric_between, respectively).
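
To make the two metrics more concrete, here is a minimal base-R sketch of what each one measures. The helper names `metric_var` and `metric_mode` are hypothetical, and two choices are assumptions rather than the package's documented implementation: using the variance as the per-row "variation", and estimating the mode with a kernel density estimate from `stats::density`.

```r
# Hypothetical helper: median of the per-row variance across the columns of m.
metric_var <- function(m) {
  stats::median(apply(m, 1, stats::var, na.rm = TRUE), na.rm = TRUE)
}

# Hypothetical helper: mode of the log-foldchange distribution between two
# samples, located at the peak of a kernel density estimate (an assumption).
metric_mode <- function(a, b) {
  d <- a - b
  d <- d[is.finite(d)]
  dens <- stats::density(d)
  dens$x[which.max(dens$y)]
}

# Simulated log2 data: two replicates of the same signal, where s2 carries a
# constant offset of 0.5 on top of measurement noise.
set.seed(1)
signal <- rnorm(200, mean = 20, sd = 2)
s1 <- signal + rnorm(200, sd = 0.1)
s2 <- signal + rnorm(200, sd = 0.1) + 0.5

before_var <- metric_var(cbind(s1, s2))
shift <- metric_mode(s2, s1)   # estimate of the offset, expected near 0.5

# Removing the estimated offset from s2 lowers both metrics.
s2_adj <- s2 - shift
after_var <- metric_var(cbind(s1, s2_adj))
after_mode <- metric_mode(s2_adj, s1)
```

In this toy case a single constant shift reduces both metrics at once. With real data the two metrics can disagree, which is why the choice of metric_within and metric_between matters.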

The default setting is Variation Within, Mode Between (VWMB): metric_within="var" and metric_between="mode" normalize such that the median of the per-row variation is minimized within each sample group, and subsequently the modes of the log-foldchange distributions between all pairs of groups are minimized.

Alternatively, one can apply mode normalization within groups as well (Mode Within, Mode Between; MWMB) by setting metric_within="mode". If the dataset has (unknown) covariates and a sufficient number of replicates, this might be beneficial because covariate-specific effects are not averaged out as they might be with metric_within="var".

To ignore groups and simply apply mode normalization across all samples (not recommended!): normalize_vwmb(x, groups=NA, metric_within="mode").

To ignore groups and simply normalize all samples by reducing overall variation (not recommended!): normalize_vwmb(x, groups=NA, metric_within="var").

Note: if you want replicate samples that are flagged as 'exclude' upstream to be treated differently while still including them in the data matrix, you could set the parameter groups=paste(samples$group, samples$exclude) to put them in separate groups.
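
The grouping trick from the note can be sketched as follows. The `samples` table is a made-up stand-in with hypothetical sample names and groups; only the `group` and `exclude` columns are taken from the note above. The sketch also assumes that the rows of `samples` are in the same order as the columns of the data matrix.

```r
# Made-up sample metadata; sample_id and the group labels are hypothetical.
samples <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4", "s5"),
  group     = c("WT", "WT", "WT", "KO", "KO"),
  exclude   = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Combine group and exclude flag so that excluded replicates form their own group.
groups <- paste(samples$group, samples$exclude)
# "WT FALSE" "WT FALSE" "WT TRUE" "KO FALSE" "KO FALSE"

# With the msdap package loaded and a log-transformed matrix x whose columns
# follow the row order of `samples`, the call would then look like:
# x_norm <- normalize_vwmb(x, groups = groups)
```

Here the excluded WT replicate ends up in a group of its own ("WT TRUE"), so it no longer influences the within-group scaling of the remaining WT samples.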

Usage

normalize_vwmb(
  x,
  groups = NA,
  metric_within = "var",
  metric_between = "mode",
  include_attributes = FALSE
)

Arguments

`x`	numerical data matrix to normalize; should be log-transformed
`groups`	array describing the grouping of the columns in x (sample groups). Alternatively, set to NA to indicate there are no groups
`metric_within`	how replicate samples within a group should be normalized. Valid arguments: "var" reduces overall variation (default); "mode" reduces the overall foldchange mode. Pass an empty string to disable
`metric_between`	analogous to the metric_within parameter: how to normalize between groups. Allowed values are "var" and "mode" (default). To disable, set groups=NA
`include_attributes`	optionally, return some additional metrics as attributes of x. The "scaling" attribute describes the increase/decrease of each sample

Value

normalized matrix x

Examples

## Not run: 
# Define a custom normalization function that we'll use in the MS-DAP pipeline later.
# All it does is apply the first part of the VWMB algorithm to reduce variation between replicates.
norm_within_only = function(x_as_log2, group_by_cols, ...) {
  normalize_vwmb(x_as_log2, groups = group_by_cols,
                 metric_within="var", metric_between = "")
}

# Now use the main MS-DAP function as usual, but note that the normalization
# algorithm is set to the function we defined in the lines above.
dataset = analysis_quickstart(
  dataset,
  norm_algorithm = "norm_within_only",
  # <other parameters here>
)

## End(Not run)

ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.