normalize: Data normalization methods


Description

Normalization of raw (usually log-transformed) data. Normalization brings the samples to the same scale. Currently the following normalization functions are implemented:

  1. Quantile normalization: 'quantile_normalize_dm()'. Gives all samples the same distribution of values.

  2. Median normalization: 'normalize_sample_medians_dm()'. Normalization by centering sample medians to the global median of the data.

Alternatively, either normalization function can be called through the 'normalize_data_dm()' wrapper. Each function also has a '_df' variant that takes long-format data.
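The two methods listed above can be illustrated with a minimal base R sketch. This is not proBatch's implementation: the helper names quantile_sketch() and median_center_sketch() are hypothetical, the toy matrix is made up, the quantile sketch assumes there are no missing values, and the "global median" is assumed to be the median over all values in the matrix.

```r
# Toy log-scale matrix: 4 features (rows) x 3 samples (columns); values are made up.
m <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("pep", 1:4), paste0("run", 1:3)))

# Quantile normalization sketch (hypothetical helper, assumes no NAs):
# every value is replaced by the mean, across samples, of the values
# that hold the same rank within their sample.
quantile_sketch <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank within each sample
  sorted <- apply(unname(x), 2, sort)                # each sample sorted ascending
  reference <- rowMeans(sorted)                      # shared reference distribution
  out <- apply(ranks, 2, function(r) reference[r])   # map ranks back to reference values
  dimnames(out) <- dimnames(x)
  out
}

# Median centering sketch (hypothetical helper):
# shift each sample so that its median equals the global median.
median_center_sketch <- function(x) {
  sample_medians <- apply(x, 2, median, na.rm = TRUE)
  global_median <- median(x, na.rm = TRUE)  # assumption: median over all values
  sweep(x, 2, sample_medians - global_median)
}

qn <- quantile_sketch(m)
mc <- median_center_sketch(m)
```

After quantile_sketch(), every column holds the same set of values, only in a different order; after median_center_sketch(), every column has the same median while the spread within each column is unchanged.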

Usage

quantile_normalize_dm(data_matrix)

quantile_normalize_df(
  df_long,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName",
  measure_col = "Intensity",
  no_fit_imputed = TRUE,
  qual_col = NULL,
  qual_value = 2,
  keep_all = "default"
)

normalize_sample_medians_dm(data_matrix)

normalize_sample_medians_df(
  df_long,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName",
  measure_col = "Intensity",
  no_fit_imputed = FALSE,
  qual_col = NULL,
  qual_value = 2,
  keep_all = "default"
)

normalize_data_dm(
  data_matrix,
  normalize_func = c("quantile", "medianCentering"),
  log_base = NULL,
  offset = 1
)

normalize_data_df(
  df_long,
  normalize_func = c("quantile", "medianCentering"),
  log_base = NULL,
  offset = 1,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName",
  measure_col = "Intensity",
  no_fit_imputed = TRUE,
  qual_col = NULL,
  qual_value = 2,
  keep_all = "default"
)

Arguments

data_matrix

features (in rows) vs samples (in columns) matrix, with feature IDs in rownames and file/sample names as colnames. See "example_proteome_matrix" for more details (to call the description, use help("example_proteome_matrix"))

df_long

data frame where each row is a single feature in a single sample. It minimally has a sample_id_col, a feature_id_col and a measure_col, but usually also an m_score (in OpenSWATH output result file). See help("example_proteome") for more details.

feature_id_col

name of the column with feature/gene/peptide/protein ID used in the long format representation df_long. In the wide formatted representation data_matrix this corresponds to the row names.

sample_id_col

name of the column in the sample_annotation table where the filenames (colnames of the data_matrix) are found.

measure_col

if df_long is among the parameters, it is the column with expression/abundance/intensity; otherwise, it is used internally for consistency.

no_fit_imputed

(logical) whether imputed (requant) values, as flagged in qual_col by qual_value, are used for data transformation. Judging by the argument name, TRUE leaves them out of the fit.

qual_col

column holding the quality flag that marks imputed values (see qual_value). Designed for inferred/requant values in OpenSWATH output data, in which case this argument has to be set to m_score.

qual_value

value in qual_col that marks imputed values. For OpenSWATH data, this argument has to be set to 2 (the m_score value assigned to imputed, i.e. requant, values).

keep_all

which set of columns to keep when transforming the data (normalize, correct). Acceptable values: all/default/minimal.

normalize_func

global batch normalization method ('quantile' or 'medianCentering')

log_base

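whether to log-transform the data matrix before normalization and, if so, with which base (e.g. 'NULL', '2' or '10'). With the default 'NULL', presumably no log transformation is applied.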

offset

small positive number to prevent 0 conversion to -Inf
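The role of log_base and offset can be sketched in base R as follows. This is an illustration with made-up values, assuming the offset is added to the raw intensities before the logarithm is taken; the exact order of operations inside proBatch is not shown in this page.

```r
# Toy raw intensities, including a zero (values are made up).
raw <- matrix(c(0, 3, 7, 1023), nrow = 2,
              dimnames = list(c("pep1", "pep2"), c("run1", "run2")))

log_base <- 2
offset <- 1

# Without an offset, a zero intensity turns into -Inf on the log scale.
no_offset <- log(raw, base = log_base)

# Assumption: the offset is added first, so log(0 + 1) is 0 instead of -Inf.
logged <- log(raw + offset, base = log_base)
```

Here no_offset contains -Inf for the zero entry, while every entry of logged is finite, so downstream medians and quantiles remain well defined.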

Value

the data in the same format as the input (data_matrix or df_long). For df_long, the returned data frame stores the original values of measure_col in another column prefixed with "preNorm_" (e.g. "preNorm_intensity" if measure_col is "intensity"), and the normalized values in the measure_col column.
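The long-format return value described above can be pictured with a small base R sketch. It is not the package's code: the toy data frame is made up, median centering stands in for whichever normalization is used, and the "preNorm_" plus measure_col naming is an assumption based on the description of the returned columns.

```r
# Toy long-format data: one row per feature per sample (values are made up).
df_long <- data.frame(
  peptide_group_label = rep(paste0("pep", 1:3), times = 2),
  FullRunName = rep(c("run1", "run2"), each = 3),
  Intensity = c(10, 12, 14, 20, 21, 25)
)

measure_col <- "Intensity"
pre_col <- paste0("preNorm_", measure_col)  # assumed naming of the backup column

# Keep the original values in a separate column before overwriting measure_col.
df_long[[pre_col]] <- df_long[[measure_col]]

# Median centering per sample: subtract the sample median, add the global median.
sample_medians <- ave(df_long[[measure_col]], df_long$FullRunName, FUN = median)
global_median <- median(df_long[[measure_col]])
df_long[[measure_col]] <- df_long[[measure_col]] - sample_medians + global_median
```

The resulting data frame has the same rows as the input, with normalized values in Intensity and the untouched originals in preNorm_Intensity.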

Examples

#Quantile normalization:
quantile_normalized_matrix <- quantile_normalize_dm(example_proteome_matrix)

#Median centering:
median_normalized_df <- normalize_sample_medians_df(example_proteome)

#Transform the data in one go:
quantile_normalized_matrix <- normalize_data_dm(example_proteome_matrix,
  normalize_func = "quantile", log_base = 2, offset = 1)

proBatch documentation built on Nov. 8, 2020, 4:55 p.m.