normalize: Data normalization methods

normalize    R Documentation

Data normalization methods

Description

Normalization of raw (usually log-transformed) data. Normalization brings the samples to the same scale. The following normalization functions are currently implemented:

  1. Quantile normalization: 'quantile_normalize_dm()'. Makes the distribution of values identical across samples.

  2. Median normalization: 'normalize_sample_medians_dm()'. Normalization by centering the sample medians on the global median of the data.

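Alternatively, either normalization function can be called through the 'normalize_data_dm()' wrapper. Each function also has a '_df' counterpart that takes a long-format data frame instead of a matrix.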
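To make the two methods concrete, here is a minimal base-R sketch on a toy matrix. It is not proBatch's implementation: the names `toy`, `median_center` and `quantile_norm` are hypothetical, the sketch assumes a complete matrix (no NA values), and ties are broken by order of appearance.

```r
# Toy log-scale matrix: 4 features (rows) x 3 samples (columns).
toy <- matrix(c(2, 4, 6, 8,
                3, 5, 7, 9,
                1, 5, 9, 13),
              nrow = 4,
              dimnames = list(paste0("pep", 1:4), paste0("run", 1:3)))

# Median centering: shift each sample so that its median equals the
# global median of the whole matrix.
median_center <- function(m) {
  sample_medians <- apply(m, 2, median, na.rm = TRUE)
  global_median <- median(m, na.rm = TRUE)
  sweep(m, 2, sample_medians - global_median, FUN = "-")
}

# Quantile normalization (assumes no NA): replace the k-th smallest value
# of every sample by the mean of the k-th smallest values across samples.
quantile_norm <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(m, 2, sort))
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

centered <- median_center(toy)
quantiled <- quantile_norm(toy)
```

After `median_center()` every column has the same median; after `quantile_norm()` every column holds the same set of values, only assigned to features according to their original ranks.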

Usage

quantile_normalize_dm(data_matrix)

quantile_normalize_df(df_long, feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  no_fit_imputed = TRUE, qual_col = NULL, qual_value = 2,
  keep_all = "default")

normalize_sample_medians_dm(data_matrix)

normalize_sample_medians_df(df_long,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  no_fit_imputed = FALSE, qual_col = NULL, qual_value = 2,
  keep_all = "default")

normalize_data_dm(data_matrix, normalize_func = c("quantile",
  "medianCentering"), log_base = NULL, offset = 1)

normalize_data_df(df_long, normalize_func = c("quantile",
  "medianCentering"), log_base = NULL, offset = 1,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  no_fit_imputed = TRUE, qual_col = NULL, qual_value = 2,
  keep_all = "default")

Arguments

data_matrix

matrix of features (in rows) by samples (in columns), with feature IDs as rownames and file/sample names as colnames. See help("example_proteome_matrix") for more details.

df_long

data frame where each row is a single feature in a single sample. It has at least a sample_id_col, a feature_id_col and a measure_col, and usually also an m_score column (as in the OpenSWATH output result file). See help("example_proteome") for more details.

feature_id_col

name of the column with feature/gene/peptide/protein ID used in the long format representation df_long. In the wide formatted representation data_matrix this corresponds to the row names.

sample_id_col

name of the column in the sample_annotation table where the file names (the colnames of data_matrix) are found.

measure_col

if df_long is among the parameters, it is the column with expression/abundance/intensity; otherwise, it is used internally for consistency.

no_fit_imputed

(logical) whether to exclude imputed (requant) values, as flagged in qual_col by qual_value, when fitting the data transformation

qual_col

column in which certain values are flagged by qual_value. Designed for inferred/requant values in OpenSWATH output data, in which case this argument has to be set to "m_score".

qual_value

value in qual_col that flags the values of interest. For OpenSWATH data, this argument has to be set to 2 (the m_score value assigned to imputed, i.e. requant, values).

keep_all

which set of columns to keep when transforming the data (normalize, correct). Acceptable values: "all", "default", "minimal".

normalize_func

global batch normalization method ('quantile' or 'medianCentering')

log_base

base of the logarithm used to transform the data matrix before normalization (e.g. 2 or 10); with NULL, no log transformation is applied

offset

small positive number added to the data before log transformation, to prevent conversion of 0 to -Inf
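The interplay of log_base and offset can be sketched as follows. This is a hedged illustration, not the package's code: the helper name `log_transform_sketch` is hypothetical, and the sketch assumes that the offset is added to the data before the logarithm is taken.

```r
# Hypothetical helper: optional log transformation with an offset.
log_transform_sketch <- function(m, log_base = NULL, offset = 1) {
  if (is.null(log_base)) {
    return(m)  # NULL: leave the data on its current scale
  }
  log(m + offset, base = log_base)
}

raw <- matrix(c(0, 1, 3, 7), nrow = 2)

# With the offset, the zero entry maps to log2(0 + 1) = 0.
logged <- log_transform_sketch(raw, log_base = 2, offset = 1)

# Without an offset, the zero entry becomes -Inf.
unshifted <- log2(raw)
```

Here `logged` holds the values 0, 1, 2 and 3, while `unshifted[1, 1]` is -Inf, which is the case the offset is meant to avoid.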

Value

the data in the same format as the input (data_matrix or df_long). For df_long, the returned data frame stores the normalized values in the measure_col column and the original values in another column named with the "preNorm_" prefix (e.g. "preNorm_intensity" if measure_col is "intensity").
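For the long format, the return convention can be illustrated with a base-R sketch. The data frame `toy_long` is made up, and the median-centering step is a stand-in for the package's internals rather than its actual code; the sketch assumes the original values are kept under the measure_col name prefixed with "preNorm_".

```r
# Hypothetical long-format data: one row per feature per sample.
toy_long <- data.frame(
  peptide_group_label = rep(c("pep1", "pep2", "pep3"), times = 2),
  FullRunName = rep(c("run1", "run2"), each = 3),
  Intensity = c(10, 12, 14, 11, 15, 19),
  stringsAsFactors = FALSE
)

measure_col <- "Intensity"
sample_id_col <- "FullRunName"
old_col <- paste("preNorm", measure_col, sep = "_")

# Keep the original values in a separate column.
toy_long[[old_col]] <- toy_long[[measure_col]]

# Median-center each sample and overwrite measure_col with the result.
global_median <- median(toy_long[[old_col]], na.rm = TRUE)
sample_median <- ave(toy_long[[old_col]], toy_long[[sample_id_col]],
                     FUN = function(x) median(x, na.rm = TRUE))
toy_long[[measure_col]] <- toy_long[[old_col]] - sample_median + global_median
```

The result has the same rows as the input, with the normalized values in `Intensity` and the untouched originals in `preNorm_Intensity`.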

Examples


# Load the package that provides the functions and the example data
library(proBatch)

# Quantile normalization of a feature-by-sample matrix:
quantile_normalized_matrix <- quantile_normalize_dm(example_proteome_matrix)

# Median centering of a long-format data frame:
median_normalized_df <- normalize_sample_medians_df(example_proteome)

# Log-transform and normalize the data in one go:
quantile_normalized_matrix <- normalize_data_dm(example_proteome_matrix,
  normalize_func = "quantile", log_base = 2, offset = 1)


symbioticMe/proBatch documentation built on April 9, 2023, 11:59 a.m.