proBatch: A package for diagnostics and correction of batch...
In proBatch: Tools for Diagnostics and Corrections of Batch Effects in Proteomics


The proBatch package contains functions for analyzing and correcting batch effects (unwanted technical variation) in high-throughput experiments. Although the package has primarily been developed for mass spectrometry proteomics (DIA/SWATH), it has been designed to be applicable to most omic data with minor adaptations. It addresses the following needs:

Prepare the data for analysis;
Visualize batch effects at the sample-wide and feature level;
Normalize and correct for batch effects.

`df_long`	data frame where each row is a single feature in a single sample. It minimally has a `sample_id_col`, a `feature_id_col` and a `measure_col`, but usually also an `m_score` (in OpenSWATH output result file). See `help("example_proteome")` for more details.
`data_matrix`	features (in rows) vs samples (in columns) matrix, with feature IDs as row names and file/sample names as column names. See `help("example_proteome_matrix")` for more details.
`sample_annotation`	data frame with: `sample_id_col` (this can be repeated as row names), biological covariates, and technical covariates (batches etc.). See `help("example_sample_annotation")`.
`sample_id_col`	name of the column in the `sample_annotation` table where the filenames (colnames of the `data_matrix`) are found.
`measure_col`	if `df_long` is among the parameters, it is the column with expression/abundance/intensity; otherwise, it is used internally for consistency.
`feature_id_col`	name of the column with feature/gene/peptide/protein ID used in the long format representation `df_long`. In the wide formatted representation `data_matrix` this corresponds to the row names.
`batch_col`	column in `sample_annotation` that should be used for batch comparison (or other, non-batch factor to be mapped to color in plots).
`order_col`	column in `sample_annotation` that determines sample order. It is used in initial assessment plots (plot_sample_mean_or_boxplot) and feature-level diagnostics (feature_level_diagnostics). Can be `NULL` if sample order is irrelevant (e.g. in genomic experiments). For more details on order definition/inference, see define_sample_order and date_to_sample_order.
`facet_col`	column in `sample_annotation` with a batch factor to separate plots into facets; usually secondary to `batch_col`. Most meaningful for multi-instrument MS experiments (where each instrument has its own order-associated effects, see `order_col`) or for simultaneous examination of two batch factors (e.g. preparation day and measurement day). For the single-instrument case, it should be set to `NULL`.
`color_by_batch`	(logical) whether to color points and connecting lines by batch factor as defined by `batch_col`.
`peptide_annotation`	long format data frame with peptide ID and their corresponding protein and/or gene annotations. See `help("example_peptide_annotation")`.
`color_scheme`	a named vector of colors to map to `batch_col`, names corresponding to the levels of the factor. For continuous variables, vector doesn't need to be named.
`color_list`	list, as returned by `sample_annotation_to_colors`, where each item contains a color vector for each factor to be mapped to the color.
`factors_to_plot`	vector of technical and biological covariates to be plotted in this diagnostic plot (assumed to be present in `sample_annotation`)
`protein_col`	column where protein names are specified
`no_fit_imputed`	(logical) whether to use imputed (requant) values, as flagged in `qual_col` by `qual_value` for data transformation
`qual_col`	column used to color points by a certain value, denoted by `color_by_qual_value`. Designed for inferred/requant values in OpenSWATH output data, in which case the argument value has to be set to `m_score`.
`qual_value`	value in `qual_col` to color. For OpenSWATH data, this argument value has to be set to `2` (this is the `m_score` value for imputed (requant) values).
`plot_title`	title of the plot (e.g., processing step + representation level (fragments, transitions, proteins) + purpose (meanplot/corrplot etc))
`keep_all`	when transforming the data (normalize, correct), which set of columns should be kept. Acceptable values: all/default/minimal.
`theme`	ggplot theme, by default `classic`. Can be easily overridden.
`filename`	path where the results are saved. If `NULL`, the object is returned to the active window; otherwise, the object is saved to the file. Currently only the pdf and png formats are supported.
`width`	option determining the output image width
`height`	option determining the output image height
`units`	units: 'cm', 'in' or 'mm'
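
The relationship between the data arguments above can be illustrated with a small base R sketch. It uses no proBatch functions. The column names (`FullRunName`, `peptide_group_label`, `Intensity`, `MS_batch`), the sample and peptide IDs, and all values are made up for illustration. It only shows how `df_long`, `data_matrix`, `sample_annotation` and `color_scheme` are expected to line up, under the assumption of one measurement per feature per sample.

```r
# Toy data only: all IDs, column names and values are hypothetical.
sample_id_col  <- "FullRunName"
feature_id_col <- "peptide_group_label"
measure_col    <- "Intensity"
batch_col      <- "MS_batch"

# df_long: one row per feature per sample.
df_long <- data.frame(
  FullRunName         = rep(c("run_1", "run_2", "run_3"), each = 2),
  peptide_group_label = rep(c("pep_A", "pep_B"), times = 3),
  Intensity           = c(10, 20, 11, 21, 15, 25),
  stringsAsFactors    = FALSE
)

# data_matrix: features in rows, samples in columns.
# tapply() is used here as a plain base R way to pivot long to wide.
data_matrix <- tapply(
  df_long[[measure_col]],
  list(df_long[[feature_id_col]], df_long[[sample_id_col]]),
  FUN = mean
)

# sample_annotation: one row per sample, with technical covariates.
sample_annotation <- data.frame(
  FullRunName      = c("run_1", "run_2", "run_3"),
  MS_batch         = c("Batch_1", "Batch_1", "Batch_2"),
  order            = 1:3,
  stringsAsFactors = FALSE
)

# Every column of data_matrix should be described in sample_annotation.
stopifnot(all(colnames(data_matrix) %in% sample_annotation[[sample_id_col]]))

# color_scheme: named vector, names matching the levels of batch_col.
color_scheme <- c(Batch_1 = "steelblue", Batch_2 = "firebrick")
stopifnot(setequal(names(color_scheme),
                   unique(sample_annotation[[batch_col]])))
```

Checking that sample IDs and batch levels agree across these objects before plotting or correcting is a cheap way to catch mismatched file names early.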