se_normalize: Normalize SummarizedExperiment data
In jmw86069/jamses: Jam SummarizedExperiment Stats

Description

Normalize SummarizedExperiment data

Usage

se_normalize(
  se,
  method = c("quantile", "jammanorm", "limma_batch_adjust", "TMM", "TMMwsp", "RLE"),
  assay_names = NULL,
  output_method_prefix = NULL,
  output_assay_names = NULL,
  genes = NULL,
  samples = NULL,
  params = list(
    quantile = list(ties = TRUE),
    jammanorm = list(controlGenes = NULL, minimum_mean = 0, controlSamples = NULL,
      centerGroups = NULL, useMedian = FALSE, noise_floor = NULL,
      noise_floor_value = NULL),
    limma_batch_adjust = list(batch = NULL, group = NULL),
    TMM = list(refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05,
      doWeighting = TRUE, Acutoff = NULL),
    TMMwsp = list(refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05,
      doWeighting = TRUE, Acutoff = NULL),
    RLE = list(refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05,
      doWeighting = TRUE, Acutoff = NULL)),
  normgroup = NULL,
  floor = 0,
  enforce_norm_floor = TRUE,
  output_sep = "_",
  override = TRUE,
  populate_mcols = TRUE,
  verbose = FALSE,
  ...
)

Arguments

`se`	`SummarizedExperiment` object
`method`	`character` vector indicating which normalization method(s) to apply. `"quantile"`: quantile normalization via `limma::normalizeQuantiles()`; `"jammanorm"`: log-ratio normalization via `jamma::jammanorm()`; `"limma_batch_adjust"`: batch adjustment via `limma::removeBatchEffect()`, recommended for data visualization but not for downstream statistical comparisons; `"TMM"`: trimmed mean of M-values via `edgeR::calcNormFactors()`; `"TMMwsp"`: TMM with singleton pairing via `edgeR::calcNormFactors()`; `"RLE"`: relative log expression via `edgeR::calcNormFactors()`.
`assay_names`	`character` vector of one or more `names(assays(se))` that indicates which numeric matrix to use during normalization. When multiple values are provided, each matrix is normalized independently by each `method`.
`output_method_prefix`	`character` vector (optional) with custom method prefix values to use when creating the new assay name for each normalization. It must have length equal to `length(method)`, and its values are applied to each method in order. Note that `output_assay_names` takes priority; when it is defined, the `output_method_prefix` entries are ignored. For example, with `assay_names="counts"`, `method="limma_batch_adjust"`, and `output_method_prefix="lba"`, the assay name created during normalization will be `"lba_counts"`.
`output_assay_names`	`character` vector (optional) which overrides the default method for defining assay names for normalized data. This vector length must equal `length(method) * length(assay_names)`, and will be applied in the order data is normalized: `assay_names` are iterated, and for each value in `assay_names`, each normalization in `method` is applied. Therefore `output_assay_names` should follow this order: `method1_assay1`, `method2_assay1`, `method1_assay2`, `method2_assay2`.
`genes`	`character` vector (optional) used to define a subset of gene rows in `se` to use for normalization. Values must match `rownames(se)`.
`samples`	`character` vector (optional) used to define a subset of sample columns in `se` to use for normalization. Values must match `colnames(se)`.
`params`	`list` (optional) parameters specific to each normalization method, passed to `matrix_normalize()`. Any value which is not defined in the `params` provided will use the default value in `matrix_normalize()`, for example `params=list(jammanorm=list(minimum_mean=2))` will use `minimum_mean=2` then use other default values relevant to the `jammanorm` normalization method.
`normgroup`	`character` or equivalent vector that defines subgroups of `samples`, each of which is normalized independently of the others. When `NULL`, all data are normalized together, which is the default. The `normgroup` vector is expected to be in the same order as `samples`, or `names(normgroup)` must contain all `samples`.
`output_sep`	`character` string used as a delimiter between the `method` and the `assay_names` to define the output assay name. For example, when `assay_names="counts"`, `method="quantile"`, and `output_sep="_"`, the new assay name will be `"quantile_counts"`.
`override`	`logical` indicating whether to override any pre-existing matrix values with the same output assay name. When `override=FALSE` and the output assay name already exists, the normalization will not be performed.
`populate_mcols`	`logical` indicating whether to populate normalization details into `mcols(assays(se))`, including the normalization `method`, the source `assay_name` used during normalization, and values from `params`.
`verbose`	`logical` indicating whether to print verbose output.
`...`	additional arguments are passed to `matrix_normalize()`.
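
The effect of `normgroup` can be illustrated with a minimal base R sketch. It is not the package implementation: `quantile_norm()` is a hypothetical stand-in for `limma::normalizeQuantiles()`, and the matrix and grouping are made up for illustration. Each subgroup of samples is normalized on its own, so samples share a distribution within a group but not across groups.

```r
# Hypothetical stand-in for quantile normalization, base R only
# (assumes no missing values and equal column lengths).
quantile_norm <- function(x) {
   # mean of the sorted values across columns defines the target distribution
   target <- rowMeans(apply(x, 2, sort))
   # map the ranks within each column onto the target distribution
   out <- apply(x, 2, function(v) target[rank(v, ties.method = "first")])
   dimnames(out) <- dimnames(x)
   out
}

set.seed(1)
mat <- matrix(rnorm(40, mean = 8), nrow = 10,
   dimnames = list(paste0("gene", 1:10), paste0("s", 1:4)))

# made-up grouping: samples s1,s2 form group "A"; s3,s4 form group "B"
normgroup <- c(s1 = "A", s2 = "A", s3 = "B", s4 = "B")

# normalize each subgroup independently, writing results back in place
normed <- mat
for (cols in split(names(normgroup), normgroup)) {
   normed[, cols] <- quantile_norm(mat[, cols, drop = FALSE])
}
```

After this loop, `s1` and `s2` contain the same set of values (in different row order), as do `s3` and `s4`, while the two groups keep distinct distributions.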

Details

This function applies one or more data normalization methods to an input SummarizedExperiment object. The normalization is applied to one or more matrices stored in assays(se), and each normalization is run independently.

Note that supplying genes and samples will apply normalization to only those genes and samples. The normalized data is stored in the full SummarizedExperiment object se, with NA values used to fill any rows or columns not present in genes or samples.
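
The NA-filling behavior described above can be sketched in base R. This is an illustration under assumptions, not the package code: the matrix is made up, and median-centering is used as a stand-in for an actual normalization method.

```r
mat <- matrix(as.numeric(1:20), nrow = 5,
   dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))
genes <- c("gene1", "gene2", "gene3")
samples <- c("s1", "s2")

# stand-in normalization: center each selected sample on its median,
# using only the selected genes
sub <- mat[genes, samples, drop = FALSE]
sub_norm <- sweep(sub, 2, apply(sub, 2, median))

# full-size matrix filled with NA, then populated with the normalized subset
full <- matrix(NA_real_, nrow = nrow(mat), ncol = ncol(mat),
   dimnames = dimnames(mat))
full[genes, samples] <- sub_norm
full
```

Rows and columns outside `genes` and `samples` remain `NA`, so the result keeps the dimensions of the original matrix and can be stored alongside the other assays.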

For example, if assay_names contains two assay names and method contains two methods, the output will include four normalizations, where each assay name is normalized two ways. The output assay names will be something like "method1_assay1", "method2_assay1", "method1_assay2", "method2_assay2". It is not always necessary to normalize data by multiple different methods. However, when two methods are similar and need to be compared, the SummarizedExperiment object is a convenient place to store different normalization results for downstream comparison. Further, the function se_contrast_stats() is able to apply equivalent statistical contrasts to each normalization, and returns an array of statistical hits which is convenient for direct comparison of results.
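
As a rough sketch of how the default output names combine, assuming the method_assayname format given under Value and the iteration order described for `output_assay_names` (assay names in the outer loop, methods in the inner loop), the names can be built in base R. The assay and method names here are placeholders.

```r
assay_names <- c("assay1", "assay2")
method <- c("method1", "method2")
output_sep <- "_"

# iterate assay names, and within each assay name iterate the methods
output_names <- unlist(lapply(assay_names, function(a) {
   paste(method, a, sep = output_sep)
}))
output_names
# "method1_assay1" "method2_assay1" "method1_assay2" "method2_assay2"
```

A custom `output_assay_names` vector would need the same length, `length(method) * length(assay_names)`, and the same ordering.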

This function calls matrix_normalize() to perform each normalization step. See the documentation for that function for details on each method.
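
Since method-specific params are passed through to matrix_normalize(), a partial params list only needs to name the values being changed. The sketch below shows one way such a fallback to defaults can work, using `utils::modifyList()`. This is an assumption for illustration; the merging logic inside matrix_normalize() may differ, and the default list is abbreviated.

```r
# abbreviated defaults, taken from a subset of the documented params
defaults <- list(
   quantile = list(ties = TRUE),
   jammanorm = list(minimum_mean = 0, useMedian = FALSE))

# user supplies only the value to change
user_params <- list(jammanorm = list(minimum_mean = 2))

# nested lists are merged recursively; unspecified entries keep defaults
merged <- utils::modifyList(defaults, user_params)
str(merged)
```

Here `minimum_mean` becomes 2, while `useMedian` and the `quantile` settings keep their default values.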

Value

SummarizedExperiment object where the normalized output is added to assays(se) using the naming format method_assayname.

Examples

if (jamba::check_pkg_installed("farrisdata")) {

   # se_normalize
   # suppressPackageStartupMessages(library(SummarizedExperiment))
   GeneSE <- farrisdata::farrisGeneSE;
   samples <- colnames(GeneSE);
   genes <- rownames(GeneSE);

   GeneSE <- se_normalize(GeneSE,
      genes=genes,
      samples=samples,
      assay_names=c("raw_counts", "counts"),
      method="jammanorm",
      params=list(jammanorm=list(minimum_mean=5)))
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))
   names(SummarizedExperiment::assays(GeneSE))

   # review normalization factor values
   round(digits=3, attr(
      SummarizedExperiment::assays(GeneSE)$jammanorm_raw_counts, "nf"))

   # the data in "counts" was already normalized
   # so the normalization factors are very near 0 as expected
   round(digits=3,
      attr(SummarizedExperiment::assays(GeneSE)$jammanorm_counts, "nf"))


   # note that housekeeper genes are supplied in params
   # also this demonstrates output_method_prefix
   set.seed(123);
   hkgenes <- sample(rownames(GeneSE), 1000)
   GeneSE <- se_normalize(GeneSE,
      genes=genes,
      samples=samples,
      assay_names=c("raw_counts"),
      method="jammanorm",
      output_method_prefix="hkjammanorm",
      params=list(jammanorm=list(minimum_mean=5,
         controlGenes=hkgenes)))
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))

   # example showing quantile normalization
   GeneSE <- se_normalize(GeneSE,
      assay_names=c("raw_counts"),
      method="quantile")
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))

   # example showing quantile normalization with custom output_assay_names
   GeneSE <- se_normalize(GeneSE,
      assay_names=c("raw_counts"),
      method="quantile",
      output_assay_names="newquantile_raw_counts")
   SummarizedExperiment::mcols(SummarizedExperiment::assays(GeneSE))
}

jmw86069/jamses documentation built on Nov. 4, 2024, 9:25 p.m.