normalizeWithinReplicates: Normalize within replicates
In pkimes/upbm: Tools for the analysis of universal protein binding microarrays


View source: R/normalizeWithinReplicates.R

Universal PBM experiments are often performed with several conditions of interest, e.g. allelic variants, assayed on separate arrays of the same plate with few replicates. Within and across plates, probe intensities can vary for biologically uninteresting reasons, such as concentration differences. To explicitly correct for these differences, normalization is performed in two steps.

First, normalization is performed within replicates (plates) with the assumption that biologically uninteresting differences only affect probe intensities multiplicatively. Normalization factors are estimated for each sample relative to a baseline condition on each plate. The baseline should ideally be a replicate wild type or other natural reference condition included in each replicate (plate). This function includes approaches for performing this step of normalization.

Second, normalization is performed across replicates (plates). More detail on this procedure can be found in the normalizeAcrossReplicates documentation.

The approaches to normalization implemented in this function make a fundamental assumption that lower-tail probe intensities are distributed similarly across the conditions being normalized. This assumption is generally satisfied for allelic variants of the same transcription factor or transcription factors with similar binding affinities. However, this assumption may not always hold, e.g. if comparing proteins of completely different families. In these cases, normalization should be performed with caution, and analyses and plots comparing the distributions of lower-tail probe intensities should be explored.
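One way to explore this assumption is to compare lower-tail quantiles of the log-scale intensities after removing a per-sample shift. The sketch below uses simulated data in base R; the matrix `intens`, the quantile grid `probs`, and the summary `tail_gap` are illustrative names and not part of upbm. With real data, the intensity matrix could instead be pulled from the object with `SummarizedExperiment::assay()`.

```r
set.seed(42)

# Simulated intensities (assumption): independent draws whose log2
# distributions differ only by a constant shift between conditions.
intens <- cbind(
  ref     = 2^rnorm(2000, mean = 10,   sd = 1),
  variant = 2^rnorm(2000, mean = 10.3, sd = 1)
)
logi <- log2(intens)

# Quantiles of the lower tail, up to the default q = 0.6.
probs <- seq(0.05, 0.6, by = 0.05)
lower_q <- apply(logi, 2, quantile, probs = probs)

# Remove each sample's average lower-tail level so that only the
# shape of the lower tail is compared.
centered <- sweep(lower_q, 2, colMeans(lower_q))

# Largest gap between centered quantiles; values near zero suggest
# similarly shaped lower tails.
tail_gap <- max(abs(centered[, "variant"] - centered[, "ref"]))
tail_gap
```

A large gap, or a clear trend in the centered quantiles, would be a reason to treat the normalization factors with caution.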

normalizeWithinReplicates(
  pe,
  assay = SummarizedExperiment::assayNames(pe)[1],
  method = c("tmm", "quantile"),
  q = 0.6,
  qlower = 0,
  qdiff = 0.2,
  group = "id",
  stratify = "condition",
  baseline = NULL,
  verbose = FALSE
)

`pe`	a SummarizedExperiment object containing GPR intensity information.
`assay`	a string name of the assay to normalize. (default = `SummarizedExperiment::assayNames(pe)[1]`)
`method`	a string specifying the method to use for normalization. Must be one of `"tmm"` or `"quantile"`. Details on the methods are provided below. (default = `"tmm"`)
`q`	a percentile between 0 and 1 specifying either the upper quantile of probes to include for normalization when `method = "tmm"` or the quantile to use for aligning samples when `method = "quantile"`. (default = 0.6)
`qlower`	a percentile between 0 and 1-`q` specifying the lower quantile of probes to include for normalization when `method = "tmm"`. This parameter is ignored when `method = "quantile"`. (default = 0)
`qdiff`	a percentile between 0 and 0.5 specifying the additional fraction of lower-tail probes to filter based on the deviation from the baseline condition when `method = "tmm"`. Probes with the `qdiff` smallest (most negative) and the `qdiff` largest (most positive) deviations from baseline condition will be filtered from normalization. This parameter is ignored when `method = "quantile"`. (default = 0.2)
`group`	a character string specifying a column in `colData(pe)` to use for grouping replicates. If scans shouldn't be grouped, specify NULL. (default = `"id"`)
`stratify`	a character string specifying a column in `colData(pe)` to use for determining the unique baseline scan within each `group`. (default = `"condition"`)
`baseline`	a character string specifying the baseline condition in the `stratify` column to normalize other conditions against within each `group`. If not specified and set to NULL, the baseline value is guessed by looking for values in the `stratify` column ending in "ref". If multiple unique matching values are found, one value is chosen arbitrarily. If the baseline condition is missing from any `group` with more than one scan, an error is thrown. (default = NULL)
`verbose`	a logical value whether to print verbose output during analysis. (default = FALSE)
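
The baseline guessing described for `baseline` can be pictured with a small base R sketch. This is not the package's code: `guess_baseline()` and the toy `cd` table are hypothetical, and the case-sensitive match on a trailing "ref" is an assumption.

```r
# Hypothetical helper, not part of upbm: pick a baseline label by
# looking for condition values that end in "ref".
guess_baseline <- function(conditions) {
  hits <- unique(grep("ref$", conditions, value = TRUE))
  if (length(hits) == 0) {
    stop("no condition ending in 'ref' found")
  }
  hits[1]  # arbitrary choice when several distinct values match
}

# Toy stand-in for colData(pe) with the default group/stratify columns.
cd <- data.frame(
  id        = c("p1", "p1", "p2", "p2"),
  condition = c("TF-ref", "TF-mut1", "TF-ref", "TF-mut2"),
  stringsAsFactors = FALSE
)

bl <- guess_baseline(cd$condition)

# Every group with more than one scan should contain the baseline.
has_bl  <- tapply(cd$condition == bl, cd$id, any)
n_scans <- table(cd$id)
multi   <- names(n_scans)[n_scans > 1]
if (!all(has_bl[multi])) {
  stop("baseline condition missing from at least one group")
}
bl
```

Passing `baseline` explicitly avoids relying on the guess when condition names do not follow this pattern.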

The trimmed mean of M-values ("tmm") method implemented in this function for cross-sample normalization within replicates is based on the popular TMM method for RNA-seq data included in the edgeR package. In brief, probes are ordered by their log-scale average intensity across the baseline condition and the sample, and only probes in the lower [`qlower`, `q`] percentile range are retained. The normalization factor is then estimated as the trimmed mean of the probe-level log-scale differences between the baseline condition and the sample, excluding the `qdiff` fraction of probes with the smallest differences and the `qdiff` fraction with the largest differences.
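
As an illustration of this procedure, here is a minimal base R sketch of an unweighted TMM-style estimate for a single baseline/sample pair. It is not the package's implementation: `tmm_log_factor()` is a hypothetical helper, the sign convention (sample minus baseline) and the handling of non-positive intensities are assumptions, and the data are simulated.

```r
# Hypothetical helper, not part of upbm: unweighted TMM-style estimate
# of the log2 scaling between a sample and its baseline.
tmm_log_factor <- function(baseline, sample, q = 0.6, qlower = 0, qdiff = 0.2) {
  # Assumption: drop probes that cannot be log-transformed.
  ok <- is.finite(baseline) & is.finite(sample) & baseline > 0 & sample > 0
  M <- log2(sample[ok]) - log2(baseline[ok])        # log-scale difference
  A <- (log2(sample[ok]) + log2(baseline[ok])) / 2  # log-scale average

  # Keep probes whose average intensity lies in the [qlower, q] range.
  a_lim <- quantile(A, c(qlower, q))
  M_low <- M[A >= a_lim[1] & A <= a_lim[2]]

  # Drop the qdiff most negative and qdiff most positive differences,
  # then average what remains (no precision weights).
  m_lim <- quantile(M_low, c(qdiff, 1 - qdiff))
  mean(M_low[M_low >= m_lim[1] & M_low <= m_lim[2]])
}

set.seed(1)
baseline <- 2^rnorm(5000, mean = 10, sd = 1)
# Simulated sample: baseline scaled by 1.5 with multiplicative noise.
sample_int <- baseline * 1.5 * 2^rnorm(5000, sd = 0.1)

est <- tmm_log_factor(baseline, sample_int)
2^est  # estimated multiplicative factor, expected to be near 1.5
```

Dividing the sample intensities by `2^est` would bring them onto the baseline's scale under the multiplicative assumption described above.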

Unlike RNA-seq expression estimates, PBM data show near-constant variance in log-scale differences as a function of the log-scale mean intensities. Therefore, a simplified variant of the original TMM method is used, where precision weights are not introduced.

The quantile-based ("quantile") method should not be confused with what is commonly referred to as "quantile normalization." Here, quantile-based normalization computes scaling factors across

Original PBMExperiment object with assay containing within-replicate normalized intensities ("normalized") and a new column added to the colData, "withinRepScale", containing the inverse of the scaling factors used to normalize intensities. If an assay with the same name is already included in the object, it will be overwritten.

Dongyuan Song, Patrick Kimes

normalizeAcrossReplicates

pkimes/upbm documentation built on Oct. 17, 2020, 9:10 a.m.