normalize: Data normalization methods

normalize    R Documentation



Normalization of raw (usually log-transformed) data. Normalization brings the samples to the same scale. Currently the following normalization functions are implemented:

  1. Quantile normalization: 'quantile_normalize_dm()'. Quantile normalization of the data.

  2. Median normalization: 'normalize_sample_medians_dm()'. Normalization by centering the sample medians to the global median of the data.

Alternatively, one can call either normalization function through the 'normalize_data_dm()' wrapper.
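To make the two methods concrete, here is a minimal base-R sketch of what quantile normalization and median centering do to a features-by-samples matrix. The function names 'quantile_normalize_sketch()' and 'median_center_sketch()' and the toy matrix are hypothetical and written for illustration only; they are not the proBatch implementations, and they assume a numeric matrix without missing values.

```r
# Illustrative base-R sketches; NOT the proBatch implementations.
# Input: numeric matrix, features in rows, samples in columns, no NAs.

quantile_normalize_sketch <- function(m) {
  # Rank of every value within its own sample (column);
  # ties are broken by order of appearance, a simplification.
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values across samples
  reference <- rowMeans(apply(m, 2, sort))
  # Replace each value by the reference value of the same rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

median_center_sketch <- function(m) {
  sample_medians <- apply(m, 2, median)
  global_median <- median(m)
  # Shift every sample so that its median equals the global median
  sweep(m, 2, sample_medians - global_median, FUN = "-")
}

# Toy data: 4 peptides measured in 3 runs (values are made up)
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("pep", 1:4), paste0("run", 1:3)))

qn <- quantile_normalize_sketch(toy)
mc <- median_center_sketch(toy)
```

After quantile normalization every column of 'qn' contains the same set of values, only in a different order; after median centering every column of 'mc' has the same median as the whole matrix.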



quantile_normalize_df(df_long, feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  no_fit_imputed = TRUE, qual_col = NULL, qual_value = 2,
  keep_all = "default")


  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  no_fit_imputed = FALSE, qual_col = NULL, qual_value = 2,
  keep_all = "default")

normalize_data_dm(data_matrix, normalize_func = c("quantile",
  "medianCentering"), log_base = NULL, offset = 1)

normalize_data_df(df_long, normalize_func = c("quantile",
  "medianCentering"), log_base = NULL, offset = 1,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  no_fit_imputed = TRUE, qual_col = NULL, qual_value = 2,
  keep_all = "default")



data_matrix: features (in rows) vs samples (in columns) matrix, with feature IDs in rownames and file/sample names as colnames. See "example_proteome_matrix" for more details (to call the description, use help("example_proteome_matrix")).

df_long: data frame where each row is a single feature in a single sample. It minimally has a sample_id_col, a feature_id_col and a measure_col, but usually also an m_score (in OpenSWATH output result file). See help("example_proteome") for more details.

feature_id_col: name of the column with feature/gene/peptide/protein ID used in the long format representation df_long. In the wide formatted representation data_matrix this corresponds to the row names.

sample_id_col: name of the column in the sample_annotation table where the filenames (colnames of the data_matrix) are found.

measure_col: if df_long is among the parameters, it is the column with expression/abundance/intensity; otherwise, it is used internally for consistency.

no_fit_imputed: (logical) whether to use imputed (requant) values, as flagged in qual_col by qual_value, for data transformation.

qual_col: column used to color points by the value given in qual_value. Designed for inferred/requant values in OpenSWATH output data, for which this argument has to be set to m_score.

qual_value: value in qual_col to color. For OpenSWATH data, this argument has to be set to 2 (the m_score value that marks imputed, i.e. requant, values).

keep_all: when transforming the data (normalize, correct), which set of columns should be kept. Acceptable values: all/default/minimal.

normalize_func: global batch normalization method ('quantile' or 'medianCentering').

log_base: whether to log transform the data matrix before normalization, and with which base (e.g. 'NULL', '2' or '10').

offset: small positive number to prevent conversion of 0 to -Inf during log transformation.
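The interplay of the last two arguments can be sketched in a few lines of base R. The helper 'log_transform_sketch()' below is hypothetical; it assumes that NULL means "no log transformation" and that the offset is added to the data before the logarithm is taken, which is the usual way to keep zeros finite but is not confirmed by this page.

```r
# Hypothetical helper, not part of proBatch.
# Assumption: the offset is added before taking the logarithm.
log_transform_sketch <- function(m, log_base = NULL, offset = 1) {
  if (is.null(log_base)) {
    return(m)  # NULL: leave the data untouched
  }
  log(m + offset, base = log_base)
}

raw <- matrix(c(0, 1, 3, 7), nrow = 2)

# Without an offset the zero intensity becomes -Inf
no_offset <- log_transform_sketch(raw, log_base = 2, offset = 0)

# With offset = 1 every value stays finite
with_offset <- log_transform_sketch(raw, log_base = 2, offset = 1)
```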


the data in the same format as the input (data_matrix or df_long). For df_long, the data frame stores the original values of measure_col in another column, called "preNorm_intensity" if measure_col is "intensity", and the normalized values in the measure_col column.
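As a rough illustration of the long-format return value, the base-R sketch below median-centers a toy data frame while keeping the original intensities in a separate column. The toy data are made up, and naming the backup column by prefixing measure_col with "preNorm_" is an assumption inferred from the description above, not verified against the package source.

```r
# Toy long-format data: one row per feature per sample (values are made up)
df_long <- data.frame(
  peptide_group_label = rep(c("pep1", "pep2", "pep3"), times = 2),
  FullRunName = rep(c("run1", "run2"), each = 3),
  Intensity = c(10, 12, 14, 20, 22, 27)
)

measure_col <- "Intensity"
# Assumed naming scheme for the backup column
pre_norm_col <- paste0("preNorm_", measure_col)

# Keep the original values before overwriting measure_col
df_long[[pre_norm_col]] <- df_long[[measure_col]]

# Median centering: shift each sample so its median matches the global median
sample_medians <- ave(df_long[[measure_col]], df_long$FullRunName, FUN = median)
global_median <- median(df_long[[pre_norm_col]])
df_long[[measure_col]] <- df_long[[measure_col]] - sample_medians + global_median
```

The result has the same rows as the input, with the normalized values in 'Intensity' and the untouched values in the backup column.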


library(proBatch)

# Quantile normalization:
quantile_normalized_matrix <- quantile_normalize_dm(example_proteome_matrix)

# Median centering:
median_normalized_df <- normalize_sample_medians_df(example_proteome)

# Transform the data in one go:
quantile_normalized_matrix <- normalize_data_dm(example_proteome_matrix,
  normalize_func = "quantile", log_base = 2, offset = 1)

symbioticMe/proBatch documentation built on April 9, 2023, 11:59 a.m.