preprocess: Data preprocessing for protein quantification

View source: R/iq.R

preprocess    R Documentation

Data preprocessing for protein quantification

Description

Prepares a long-format input table for protein quantification by removing low-intensity ions and, optionally, performing median normalization.

Usage

preprocess(quant_table,
           primary_id = "PG.ProteinGroups",
           secondary_id = c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge"),
           sample_id = "R.Condition",
           intensity_col = "F.PeakArea",
           median_normalization = TRUE,
           log2_intensity_cutoff = 0,
           pdf_out = "qc-plots.pdf",
           pdf_width = 12,
           pdf_height = 8,
           intensity_col_sep = NULL,
           intensity_col_id = NULL,
           na_string = "0",
           show_boxplot = TRUE)

Arguments

quant_table

A long-format table with a primary column of protein identification, secondary columns of fragment ions, a column of sample names, and a column for quantitative intensities.

primary_id

Unique values in this column form the list of proteins to be quantified.

secondary_id

A concatenation of these columns determines the fragment ions used for quantification.

sample_id

Unique values in this column form the list of samples.

intensity_col

The column for intensities.

median_normalization

A logical value. If TRUE (the default), median normalization is performed.

log2_intensity_cutoff

Entries whose log2 intensity is lower than this value are ignored. Plotting a histogram of all intensities helps in choosing this parameter.

pdf_out

A character string specifying the name of the PDF output. A NULL value will suppress the PDF output.

pdf_width

Width of the PDF output in inches.

pdf_height

Height of the PDF output in inches.

intensity_col_sep

A separator character when entries in the intensity column contain multiple values.

intensity_col_id

The column for identities of multiple quantitative values.

na_string

The value considered as NA.

show_boxplot

A logical value. If TRUE (the default), boxplots of fragment intensities are created for each sample.
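The arguments above can be pictured with a small base-R sketch. It does not call the package and is not its implementation: the toy values, the "_" separator used to join the secondary_id columns, and the particular form of median normalization (shifting every sample so that its median equals the mean of the sample medians) are all assumptions made for illustration.

```r
# Toy long-format table. Column names mirror the defaults shown in Usage;
# the values are invented for illustration only.
toy <- data.frame(
  PG.ProteinGroups    = c("P1", "P1", "P1", "P1", "P2", "P2"),
  EG.ModifiedSequence = c("PEPA", "PEPA", "PEPB", "PEPB", "PEPC", "PEPC"),
  FG.Charge           = c(2, 2, 2, 2, 3, 3),
  F.FrgIon            = c("y4", "y4", "b3", "b3", "y5", "y5"),
  F.Charge            = c(1, 1, 1, 1, 1, 1),
  R.Condition         = c("S1", "S2", "S1", "S2", "S1", "S2"),
  F.PeakArea          = c(1024, 4096, 0.5, 2048, 256, 1024),
  stringsAsFactors    = FALSE
)

secondary_id <- c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge")

# Fragment-ion identifier: a concatenation of the secondary_id columns.
# The "_" separator is an assumption of this sketch.
id <- do.call(paste, c(toy[secondary_id], sep = "_"))

# Work in log2 space and ignore entries below the cutoff.
# hist(quant, 100) would show the distribution used to pick the cutoff.
log2_intensity_cutoff <- 0
quant <- log2(toy$F.PeakArea)
keep <- !is.na(quant) & quant >= log2_intensity_cutoff

long <- data.frame(
  protein_list = toy$PG.ProteinGroups[keep],
  sample_list  = toy$R.Condition[keep],
  id           = id[keep],
  quant        = quant[keep],
  stringsAsFactors = FALSE
)

# Median normalization (assumed form): shift each sample so that all
# sample medians coincide with the mean of the original medians.
m <- vapply(split(long$quant, long$sample_list), median, numeric(1))
shift <- mean(m) - m
long$quant_norm <- long$quant + as.numeric(shift[long$sample_list])

print(long)
```

In this toy table the entry with peak area 0.5 falls below the cutoff and is dropped, and after the shift both samples share the same median.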

Details

When entries in the intensity column contain multiple values, this function replicates the entries in the other columns once per value. The secondary_id is then appended with the corresponding entries in intensity_col_id when that column is provided; otherwise, the integer values 1, 2, 3, etc. are used.
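The expansion described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's code: the helper expand_multi, the toy column names, the ";" separator, and the "_" used when appending the suffix are all hypothetical.

```r
# Toy table in which each intensity entry packs several values (invented data).
toy <- data.frame(
  protein   = c("P1", "P2"),
  ion       = c("PEPA_2", "PEPC_3"),
  sample    = c("S1", "S1"),
  intensity = c("100;200;300", "50;60"),
  ion_ids   = c("y4;y5;b3", "y6;b2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per packed intensity value.
expand_multi <- function(tab, intensity_col, sep, secondary_col, id_col = NULL) {
  values <- strsplit(tab[[intensity_col]], sep, fixed = TRUE)
  n <- lengths(values)

  # Replicate every row once per packed value.
  out <- tab[rep(seq_len(nrow(tab)), times = n), , drop = FALSE]

  # Suffix: entries of id_col when given, otherwise 1, 2, 3, ...
  suffix <- if (is.null(id_col)) {
    unlist(lapply(n, seq_len))
  } else {
    unlist(strsplit(tab[[id_col]], sep, fixed = TRUE))
  }

  out[[secondary_col]] <- paste(out[[secondary_col]], suffix, sep = "_")
  out[[intensity_col]] <- as.numeric(unlist(values))
  rownames(out) <- NULL
  out
}

with_ids <- expand_multi(toy, "intensity", ";", "ion", id_col = "ion_ids")
no_ids   <- expand_multi(toy, "intensity", ";", "ion")

print(with_ids[, c("protein", "ion", "sample", "intensity")])
print(no_ids[, c("protein", "ion", "sample", "intensity")])
```

With the identity column supplied, the first ion becomes PEPA_2_y4; without it, the same row is labelled PEPA_2_1.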

Value

A data frame is returned with the following components:

protein_list

A vector of proteins.

sample_list

A vector of samples.

id

A vector of fragment ions to be used for quantification.

quant

A vector of log2 intensities.

Author(s)

Thang V. Pham

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.

Examples


data("spikeins")
head(spikeins)
# This example set of spike-in proteins has already been median-normalized,
# so median normalization is switched off below.
norm_data <- iq::preprocess(spikeins, median_normalization = FALSE, pdf_out = NULL)


iq documentation built on April 4, 2025, 2:15 a.m.