correct_batch_effects: Batch correction of normalized data
In symbioticMe/proBatch: Tools for Diagnostics and Corrections of Batch Effects in Proteomics


Description

Batch correction of normalized data. Batch correction brings each feature in each batch to a comparable shape. Currently the following batch correction functions are implemented:

Per-feature median centering: center_feature_batch_medians_df(). Median centering of each feature within each batch (per-batch median).
Per-feature mean centering: center_feature_batch_means_df(). Mean centering of each feature within each batch (per-batch mean).
Correction with ComBat: correct_with_ComBat_df(). Adjusts for discrete batch effects using ComBat, described in Johnson et al. 2007, which uses either a parametric or a non-parametric empirical Bayes framework to adjust data for batch effects. ComBat returns an expression matrix that has been corrected for batch effects. The input data are assumed to be normalized and free of missing values before batch effect removal. Note that missing values are common in proteomics, which is why in some cases corrections like center_feature_batch_medians_df() are more appropriate.
Continuous drift correction: adjust_batch_trend_df(). Adjusts the within-batch signal trend with a custom (continuous) fit. It should be followed by a discrete correction, e.g. center_feature_batch_medians_df() or correct_with_ComBat_df().
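To illustrate per-feature median centering, the following base-R sketch centers a toy long-format data frame. It assumes that each feature's values in a batch are shifted so that the batch median matches the feature's median across all batches; the helper `center_medians_sketch()` and the toy data are hypothetical, and proBatch's own implementation (for example its handling of imputed values via `no_fit_imputed`) may differ.

```r
# Hypothetical helper, not part of proBatch: per-feature, per-batch median centering.
# Assumption: each batch is shifted so its median equals the feature's global median.
center_medians_sketch <- function(df_long, feature_id_col = "peptide_group_label",
                                  batch_col = "MS_batch", measure_col = "Intensity") {
  x <- df_long[[measure_col]]
  # Median of each feature within each batch, aligned to the rows of df_long
  batch_median <- ave(x, df_long[[feature_id_col]], df_long[[batch_col]],
                      FUN = function(v) median(v, na.rm = TRUE))
  # Median of each feature across all batches
  global_median <- ave(x, df_long[[feature_id_col]],
                       FUN = function(v) median(v, na.rm = TRUE))
  # Keep the original values, mirroring the "preBatchCorr_" naming in the Value section
  df_long[[paste0("preBatchCorr_", measure_col)]] <- x
  df_long[[measure_col]] <- x - batch_median + global_median
  df_long
}

# Toy data: pepA has a batch offset, pepB does not
toy <- data.frame(
  peptide_group_label = rep(c("pepA", "pepB"), each = 6),
  MS_batch = rep(rep(c("b1", "b2"), each = 3), times = 2),
  FullRunName = rep(paste0("run", 1:6), times = 2),
  Intensity = c(10, 11, 12, 20, 21, 22, 5, 6, 7, 5, 6, 7)
)
centered <- center_medians_sketch(toy)
# pepA becomes 15 16 17 15 16 17 (both batches now have median 16); pepB is unchanged
centered$Intensity
```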

Alternatively, the corrections can be called through the correct_batch_effects_df() wrapper, which corrects continuous signal drift within each batch (if required) and then adjusts for discrete differences across batches.
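The continuous step can be sketched in base R as follows. This is not proBatch's code: the helper `adjust_trend_sketch()` is hypothetical, and it assumes that, for each feature in each batch, a `stats::loess()` curve of the measurement against `order` is subtracted and its mean added back, so that the batch level is kept for the subsequent discrete correction. Feature/batch groups with fewer than `min_measurements` non-missing values are left unchanged, mirroring the meaning of that argument.

```r
# Hypothetical sketch, not proBatch's implementation: per feature and batch,
# fit a loess curve of the measurement against sample order and remove the trend.
adjust_trend_sketch <- function(df_long, feature_id_col = "peptide_group_label",
                                batch_col = "MS_batch", measure_col = "Intensity",
                                order_col = "order", min_measurements = 8,
                                span = 0.75) {
  # Keep the uncorrected values (cf. the 'preTrendFit_Intensity' column in the Examples)
  df_long[[paste0("preTrendFit_", measure_col)]] <- df_long[[measure_col]]
  df_long$fit <- NA_real_
  # Row indices of each feature/batch combination
  groups <- split(seq_len(nrow(df_long)),
                  list(df_long[[feature_id_col]], df_long[[batch_col]]),
                  drop = TRUE)
  for (idx in groups) {
    y <- df_long[[measure_col]][idx]
    x <- df_long[[order_col]][idx]
    ok <- !is.na(y)
    # Too few measurements for curve fitting: leave this feature/batch unchanged
    if (sum(ok) < min_measurements) next
    model <- loess(y ~ x, data = data.frame(x = x[ok], y = y[ok]), span = span,
                   control = loess.control(surface = "direct"))
    fitted_vals <- predict(model, newdata = data.frame(x = x))
    # Subtract the curve and add back its mean, so the batch level is preserved
    df_long[[measure_col]][idx] <- y - fitted_vals + mean(fitted_vals[ok])
    df_long$fit[idx] <- fitted_vals
  }
  df_long
}

# Toy data: a linear drift in batch b1 (10 samples), a short batch b2 (3 samples)
toy <- data.frame(
  peptide_group_label = "pepA",
  MS_batch = c(rep("b1", 10), rep("b2", 3)),
  order = 1:13,
  Intensity = c(10 + 0.5 * (1:10), 30, 31, 32)
)
res <- adjust_trend_sketch(toy, min_measurements = 8)
round(res$Intensity, 6)
```

With these toy values, the linear drift in batch `b1` is removed (all ten values become 12.75), while batch `b2` has only three measurements and is left as is. A discrete correction such as median centering would then be applied to the result.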

Usage

center_feature_batch_medians_df(df_long, sample_annotation = NULL,
  sample_id_col = "FullRunName", batch_col = "MS_batch",
  feature_id_col = "peptide_group_label", measure_col = "Intensity",
  keep_all = "default", no_fit_imputed = TRUE, qual_col = NULL,
  qual_value = NULL)

center_feature_batch_medians_dm(data_matrix, sample_annotation,
  sample_id_col = "FullRunName", batch_col = "MS_batch",
  feature_id_col = "peptide_group_label", measure_col = "Intensity")

center_feature_batch_means_df(df_long, sample_annotation = NULL,
  sample_id_col = "FullRunName", batch_col = "MS_batch",
  feature_id_col = "peptide_group_label", measure_col = "Intensity",
  keep_all = "default", no_fit_imputed = TRUE, qual_col = NULL,
  qual_value = NULL)

center_feature_batch_means_dm(data_matrix, sample_annotation,
  sample_id_col = "FullRunName", batch_col = "MS_batch",
  feature_id_col = "peptide_group_label", measure_col = "Intensity")

adjust_batch_trend_df(df_long, sample_annotation = NULL,
  batch_col = "MS_batch", feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  order_col = "order", keep_all = "default",
  fit_func = "loess_regression", no_fit_imputed = TRUE,
  qual_col = NULL, qual_value = NULL, min_measurements = 8, ...)

adjust_batch_trend_dm(data_matrix, sample_annotation,
  batch_col = "MS_batch", feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  order_col = "order", fit_func = "loess_regression",
  return_fit_df = TRUE, min_measurements = 8, ...)

correct_with_ComBat_df(df_long, sample_annotation = NULL,
  feature_id_col = "peptide_group_label", measure_col = "Intensity",
  sample_id_col = "FullRunName", batch_col = "MS_batch",
  par.prior = TRUE, no_fit_imputed = TRUE, qual_col = NULL,
  qual_value = NULL, keep_all = "default")

correct_with_ComBat_dm(data_matrix, sample_annotation = NULL,
  feature_id_col = "peptide_group_label", measure_col = "Intensity",
  sample_id_col = "FullRunName", batch_col = "MS_batch",
  par.prior = TRUE)

correct_batch_effects_df(df_long, sample_annotation,
  continuous_func = NULL, discrete_func = c("MedianCentering",
  "MeanCentering", "ComBat"), batch_col = "MS_batch",
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  order_col = "order", keep_all = "default", no_fit_imputed = TRUE,
  qual_col = NULL, qual_value = NULL, min_measurements = 8, ...)

correct_batch_effects_dm(data_matrix, sample_annotation,
  continuous_func = NULL, discrete_func = c("MedianCentering",
  "ComBat"), batch_col = "MS_batch",
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName", measure_col = "Intensity",
  order_col = "order", min_measurements = 8, ...)

Arguments

`df_long`	data frame where each row is a single feature in a single sample. It minimally has a `sample_id_col`, a `feature_id_col` and a `measure_col`, and usually also an `m_score` column (in OpenSWATH output files). See `help("example_proteome")` for more details.
`sample_annotation`	data frame with: `sample_id_col` (this can be repeated as row names); biological covariates; technical covariates (batches etc.). See `help("example_sample_annotation")`.
`sample_id_col`	name of the column in the `sample_annotation` table where the filenames (colnames of the `data_matrix`) are found.
`batch_col`	column in `sample_annotation` that should be used for batch comparison (or other, non-batch factor to be mapped to color in plots).
`feature_id_col`	name of the column with feature/gene/peptide/protein ID used in the long format representation `df_long`. In the wide formatted representation `data_matrix` this corresponds to the row names.
`measure_col`	if `df_long` is among the parameters, it is the column with expression/abundance/intensity; otherwise, it is used internally for consistency.
`keep_all`	when transforming the data (normalize, correct), which set of columns should be kept; acceptable values: all/default/minimal.
`no_fit_imputed`	(logical) whether to exclude imputed (requant) values, as flagged in `qual_col` by `qual_value`, from fitting the data transformation.
`qual_col`	column used to flag (and, in plots, color) points with a certain value denoted by `qual_value`. Designed for inferred/requant values in OpenSWATH output data, in which case this argument has to be set to `m_score`.
`qual_value`	value in `qual_col` to flag (and color). For OpenSWATH data, this argument has to be set to `2` (the `m_score` value of imputed (requant) values).
`data_matrix`	features (in rows) vs samples (in columns) matrix, with feature IDs in rownames and file/sample names as colnames. See "example_proteome_matrix" for more details (to call the description, use `help("example_proteome_matrix")`)
`order_col`	column in `sample_annotation` that determines sample order. It is used in initial assessment plots (plot_sample_mean_or_boxplot) and feature-level diagnostics (feature_level_diagnostics). Can be 'NULL' if sample order is irrelevant (e.g. in genomic experiments). For details on order definition/inference, see define_sample_order and date_to_sample_order.
`fit_func`	function to fit the (non)-linear trend
`min_measurements`	the minimum number of samples in a batch required for curve fitting.
`...`	other parameters, usually passed to `adjust_batch_trend` and `fit_func` (e.g. `span` in the Examples).
`return_fit_df`	(logical) whether to return the `fit_df` from `adjust_batch_trend_dm` or only the data matrix
`par.prior`	(logical) whether ComBat uses a parametric (`TRUE`) or non-parametric (`FALSE`) prior.
`continuous_func`	function to use for the fit (currently only `loess_regression` is available); if no order-associated correction is required, it should be `NULL`.
`discrete_func`	function to use for adjustment of discrete batch effects (`MedianCentering` or `ComBat`; `correct_batch_effects_df()` also accepts `MeanCentering`).
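Functions ending in `_df` take `df_long`, while those ending in `_dm` take `data_matrix`. As a rough illustration of how the two representations relate, the base-R sketch below reshapes a toy long data frame into a features-by-samples matrix and then keeps only features without missing values, since ComBat assumes complete input. The helper `long_to_matrix_sketch()` and the toy data are hypothetical; proBatch may provide its own conversion functions.

```r
# Hypothetical helper, not proBatch's API: reshape df_long into a features x samples
# data_matrix (feature IDs as row names, sample IDs as column names).
long_to_matrix_sketch <- function(df_long, feature_id_col = "peptide_group_label",
                                  sample_id_col = "FullRunName",
                                  measure_col = "Intensity") {
  # tapply leaves NA in cells for feature/sample pairs that were not measured
  tapply(df_long[[measure_col]],
         list(df_long[[feature_id_col]], df_long[[sample_id_col]]),
         FUN = function(v) v[1])
}

toy <- data.frame(
  peptide_group_label = c("pepA", "pepA", "pepB"),
  FullRunName = c("run1", "run2", "run1"),
  Intensity = c(10, 12, 7)
)
m <- long_to_matrix_sketch(toy)
m  # pepB was not measured in run2, so that cell is NA

# ComBat expects no missing values: keep only complete features (here, pepA)
complete <- m[complete.cases(m), , drop = FALSE]
```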

Value

The data in the same format as the input (data_matrix or df_long). For df_long, the data frame stores the original values of measure_col in an additional column called "preBatchCorr_[measure_col]", and the corrected values in the measure_col column.
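Because the original values are kept next to the corrected ones, the effect of a correction can be checked directly on the returned data frame. The base-R sketch below does this on a small, hand-made data frame in the documented format (the values are illustrative, not output of proBatch), computing the per-row shift and the per-batch medians before and after correction.

```r
# Hand-made toy data in the documented df_long output format (illustrative values only)
corrected <- data.frame(
  peptide_group_label = "pepA",
  MS_batch = rep(c("b1", "b2"), each = 3),
  preBatchCorr_Intensity = c(10, 11, 12, 20, 21, 22),
  Intensity = c(15, 16, 17, 15, 16, 17)
)
# Shift applied to each row by the correction
corrected$shift <- corrected$Intensity - corrected$preBatchCorr_Intensity
# Per-batch medians before and after correction
meds <- aggregate(cbind(preBatchCorr_Intensity, Intensity) ~ peptide_group_label + MS_batch,
                  data = corrected, FUN = median)
meds
```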

If return_fit_df is TRUE, adjust_batch_trend_dm() returns a list of two items:

data_matrix
fit_df, used to examine the fitting curves

Examples


library(proBatch)
data("example_proteome", "example_sample_annotation", package = "proBatch")

# Median centering per feature per batch:
median_centered_df <- center_feature_batch_medians_df(
  example_proteome, example_sample_annotation)

# Correct with ComBat:
combat_corrected_df <- correct_with_ComBat_df(example_proteome,
  example_sample_annotation)

# Adjust the MS signal drift (here on the first three peptides only):
test_peptides <- unique(example_proteome$peptide_group_label)[1:3]
test_peptide_filter <- example_proteome$peptide_group_label %in% test_peptides
test_proteome <- example_proteome[test_peptide_filter, ]
adjusted_df <- adjust_batch_trend_df(test_proteome,
  example_sample_annotation, span = 0.7,
  min_measurements = 8)
plot_fit <- plot_with_fitting_curve(unique(adjusted_df$peptide_group_label),
  df_long = adjusted_df, measure_col = 'preTrendFit_Intensity',
  fit_df = adjusted_df, sample_annotation = example_sample_annotation)

# Correct the data in one go (drift correction followed by median centering):
batch_corrected_df <- correct_batch_effects_df(example_proteome,
  example_sample_annotation,
  continuous_func = 'loess_regression',
  discrete_func = 'MedianCentering',
  batch_col = 'MS_batch',
  span = 0.7, min_measurements = 8)

symbioticMe/proBatch documentation built on April 9, 2023, 11:59 a.m.
