fast_read: Reading data from an input file
In iq: Protein Quantification in Mass Spectrometry-Based Proteomics

Description

Efficiently reads a tab-separated text file for iq processing.

Usage

fast_read(filename,
          sample_id = "R.Condition",
          primary_id = "PG.ProteinGroups",
          secondary_id = c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge"),
          intensity_col = "F.PeakArea",
          annotation_col = c("PG.Genes", "PG.ProteinNames"),
          filter_string_equal = c("F.ExcludedFromQuantification" = "False"),
          filter_string_not_equal = NULL,
          filter_double_less = c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01"),
          filter_double_greater = NULL,
          intensity_col_sep = NULL,
          intensity_col_id = NULL,
          na_string = "0")
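
A minimal sketch of a call, assuming a long-format export with the default column names. The path `report.tsv` is hypothetical, and `R.FileName` is only an illustration of overriding `sample_id`; it assumes the input has such a column. The call is guarded so that nothing runs unless both the package and the file are available.

```r
# Hypothetical input path; replace with an actual long-format export.
input_file <- "report.tsv"

if (requireNamespace("iq", quietly = TRUE) && file.exists(input_file)) {
  raw <- iq::fast_read(
    input_file,
    # Assumption: samples are identified by this column instead of the default.
    sample_id = "R.FileName",
    # Same thresholds as the defaults, written out for clarity.
    filter_double_less = c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01")
  )

  # Top-level components: protein, sample, ion, quant_table.
  str(raw, max.level = 1)
}
```

All other arguments keep the defaults shown in the usage above.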

Arguments

`filename`	A long-format tab-separated text file with a primary column of protein identification, secondary columns of fragment ions, a column of sample names, a column for quantitative intensities, and extra columns for annotation.
`primary_id`	Unique values in this column form the list of proteins to be quantified.
`secondary_id`	A concatenation of these columns determines the fragment ions used for quantification.
`sample_id`	Unique values in this column form the list of samples.
`intensity_col`	The column for intensities.
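`annotation_col`	Annotation columns to include alongside the protein identifiers.
`filter_string_equal`	A named vector of strings, with column names as names. Only rows whose value in the named column equals the given string are kept.
`filter_string_not_equal`	A named vector of strings, with column names as names. Only rows whose value in the named column differs from the given string are kept.
`filter_double_less`	A named vector of strings, with column names as names. Only rows whose numeric value in the named column is less than the given value are kept. Default: PG.Qvalue < 0.01 and EG.Qvalue < 0.01.
`filter_double_greater`	A named vector of strings, with column names as names. Only rows whose numeric value in the named column is greater than the given value are kept.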
`intensity_col_sep`	A separator character when entries in the intensity column contain multiple values.
`intensity_col_id`	The column for identities of multiple quantitative values.
`na_string`	The value considered as NA.
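
The default filters can be read as a row selection. The following is a sketch in base R on a toy data frame, not the package's implementation: the data are invented, and it assumes that all conditions must hold together for a row to be kept.

```r
# Toy long-format table; all values are invented for illustration.
tab <- data.frame(
  PG.ProteinGroups = c("P1", "P1", "P2", "P2"),
  F.ExcludedFromQuantification = c("False", "True", "False", "False"),
  PG.Qvalue = c(0.001, 0.001, 0.02, 0.005),
  EG.Qvalue = c(0.002, 0.002, 0.001, 0.003),
  F.PeakArea = c(1200, 800, 560, 910),
  stringsAsFactors = FALSE
)

# Same shape as the default arguments of fast_read.
filter_string_equal <- c("F.ExcludedFromQuantification" = "False")
filter_double_less <- c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01")

# Assumption: conditions are combined with a logical AND.
keep <- rep(TRUE, nrow(tab))
for (col in names(filter_string_equal)) {
  keep <- keep & tab[[col]] == filter_string_equal[[col]]
}
for (col in names(filter_double_less)) {
  keep <- keep & tab[[col]] < as.numeric(filter_double_less[[col]])
}

# Rows that pass every filter.
kept <- tab[keep, ]
```

In this toy table the second row fails the string filter and the third row fails the PG.Qvalue threshold, so two rows remain.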

Details

When entries in the intensity column contain multiple values, this function replicates the entries in the other columns once per value. If intensity_col_id is provided, the corresponding entries from that column are appended to the secondary_id; otherwise, the integers 1, 2, 3, and so on are appended.
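
The expansion can be sketched in base R as follows. This is an illustration only: the toy data, the column names `ion`, `intensity` and `ids`, and the underscore used to join the appended part are assumptions, not the format produced by fast_read.

```r
# Toy rows; each intensity entry holds several values separated by ";".
tab <- data.frame(
  ion = c("PEPTIDEK_2", "SAMPLER_3"),
  intensity = c("100;200;300", "40;50"),
  ids = c("y3;y4;y5", "b2;y6"),
  stringsAsFactors = FALSE
)
sep <- ";"  # plays the role of intensity_col_sep

# Split each entry and count how many values it holds.
values <- strsplit(tab$intensity, sep, fixed = TRUE)
n <- lengths(values)

# Replicate the other columns once per value.
expanded <- data.frame(
  ion = rep(tab$ion, times = n),
  quant = as.numeric(unlist(values)),
  stringsAsFactors = FALSE
)

# With an id column: append the matching id to each replicated ion.
id_parts <- unlist(strsplit(tab$ids, sep, fixed = TRUE))
with_ids <- paste(expanded$ion, id_parts, sep = "_")

# Without an id column: append 1, 2, 3, ... within each original row.
with_counter <- paste(expanded$ion, sequence(n), sep = "_")
```

Here two input rows become five rows, and each replicated row gets a distinct ion identifier either way.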

Value

A list with the following components:

`protein`	A table of proteins in the first column followed by annotation columns.
`sample`	A vector of samples.
`ion`	A vector of fragment ions to be used for quantification.
`quant_table`	A list of four components: protein_list (index pointing to `protein`), sample_list (index pointing to `sample`), id (index pointing to `ion`), and quant (intensities).
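
To show how the index vectors relate to the other components, the sketch below builds a mock object with the documented shape and resolves the indices back to names. The values are invented, and the sketch assumes the indices are 1-based R positions, which this page does not state.

```r
# Mock object shaped like the documented return value; all values invented.
res <- list(
  protein = data.frame(
    PG.ProteinGroups = c("P1", "P2"),
    PG.Genes = c("G1", "G2"),
    stringsAsFactors = FALSE
  ),
  sample = c("S1", "S2"),
  ion = c("ionA", "ionB", "ionC"),
  quant_table = list(
    protein_list = c(1L, 1L, 2L),  # assumption: 1-based rows of `protein`
    sample_list = c(1L, 2L, 1L),   # assumption: 1-based positions in `sample`
    id = c(1L, 2L, 3L),            # assumption: 1-based positions in `ion`
    quant = c(1200, 800, 910)
  )
)

# Resolve each index vector to obtain a readable long table.
qt <- res$quant_table
long <- data.frame(
  protein = res$protein[qt$protein_list, 1],
  sample = res$sample[qt$sample_list],
  ion = res$ion[qt$id],
  quant = qt$quant,
  stringsAsFactors = FALSE
)
```

Each row of `long` corresponds to one entry of quant, with the protein, sample, and ion looked up through the matching index.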

Author(s)

Thang V. Pham

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.

iq documentation built on April 4, 2025, 2:15 a.m.