preprocess: Data preprocessing for protein quantification

View source: R/iq.R

preprocessR Documentation

Data preprocessing for protein quantification

Description

Prepares a long-format input including removing low-intensity ions and performing median normalization.

Usage

preprocess(quant_table,
           primary_id = "PG.ProteinGroups",
           secondary_id = c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge"),
           sample_id = "R.Condition",
           intensity_col = "F.PeakArea",
           median_normalization = TRUE,
           log2_intensity_cutoff = 0,
           pdf_out = "qc-plots.pdf",
           pdf_width = 12,
           pdf_height = 8,
           intensity_col_sep = NULL,
           intensity_col_id = NULL,
           na_string = "0")

Arguments

quant_table

A long-format table with a primary column of protein identification, secondary columns of fragment ions, a column of sample names, and a column for quantitative intensities.

primary_id

Unique values in this column form the list of proteins to be quantified.

secondary_id

A concatenation of these columns determines the fragment ions used for quantification.

sample_id

Unique values in this column form the list of samples.

intensity_col

The column for intensities.

median_normalization

A logical value. The default TRUE value is to perform median normalization.

log2_intensity_cutoff

Entries lower than this value in log2 space are ignored. Plot a histogram of all intensities to set this parameter.

pdf_out

A character string specifying the name of the PDF output. A NULL value will suppress the PDF output.

pdf_width

Width of the pdf output in inches.

pdf_height

Height of the pdf output in inches.

intensity_col_sep

A separator character when entries in the intensity column contain multiple values.

intensity_col_id

The column for identities of multiple quantitative values.

na_string

The value considered as NA.

Details

When entries in the intensity column contain multiple values, this function will replicate entries in other column and the secondary_id will be appended with corresponding entries in intensity_col_id when it is provided. Otherwise, integer values 1, 2, 3, etc... will be used.

Value

A data frame is returned with following components

protein_list

A vector of proteins.

sample_list

A vector of samples.

id

A vector of fragment ions to be used for quantification.

quant

A vector of log2 intensities.

Author(s)

Thang V. Pham

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.

Examples


data("spikeins")
head(spikeins)
# This example set of spike-in proteins has been 'median-normalized'.
norm_data <- iq::preprocess(spikeins, median_normalization = FALSE, pdf_out = NULL)


iq documentation built on May 29, 2024, 8:40 a.m.