fast_read: Reading data from an input file

fast_read    R Documentation

Description

Efficiently reads a long-format tab-separated text file for iq processing.

Usage

fast_read(filename,
          sample_id = "R.Condition",
          primary_id = "PG.ProteinGroups",
          secondary_id = c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge"),
          intensity_col = "F.PeakArea",
          annotation_col = c("PG.Genes", "PG.ProteinNames"),
          filter_string_equal = c("F.ExcludedFromQuantification" = "False"),
          filter_string_not_equal = NULL,
          filter_double_less = c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01"),
          filter_double_greater = NULL,
          intensity_col_sep = NULL,
          intensity_col_id = NULL,
          na_string = "0")
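
As a minimal sketch of a call, assuming the iq package is installed and that a long-format export exists at the hypothetical path report.tsv (this path is an illustration and not part of the original documentation), the documented defaults can be spelled out explicitly. The call is guarded, so the snippet does nothing when either assumption fails.

```r
# Hypothetical input path; replace with a real long-format export.
input_file <- "report.tsv"

# Filters written out explicitly; these mirror the documented defaults.
keep_equal <- c("F.ExcludedFromQuantification" = "False")
keep_less  <- c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01")

raw <- NULL
if (file.exists(input_file) && requireNamespace("iq", quietly = TRUE)) {
  raw <- iq::fast_read(input_file,
                       sample_id = "R.Condition",
                       primary_id = "PG.ProteinGroups",
                       intensity_col = "F.PeakArea",
                       filter_string_equal = keep_equal,
                       filter_double_less = keep_less)
  str(raw, max.level = 1)  # inspect the top-level components of the result
}
```

Arguments left out of the call keep the default values shown in Usage.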

Arguments

filename

The name of a long-format tab-separated text file with a primary column of protein identifiers, secondary columns identifying fragment ions, a column of sample names, a column of quantitative intensities, and extra columns for annotation.

primary_id

Unique values in this column form the list of proteins to be quantified.

secondary_id

A concatenation of these columns determines the fragment ions used for quantification.

sample_id

Unique values in this column form the list of samples.

intensity_col

The column for intensities.

annotation_col

Annotation columns. These follow the protein column in the returned protein table.

filter_string_equal

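A named vector of strings: each name is a column and each value is the string that column must equal. Only rows satisfying the condition are kept. Default: F.ExcludedFromQuantification equal to "False".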

filter_string_not_equal

A named vector of strings: each name is a column and each value is the string that column must not equal. Only rows satisfying the condition are kept.

filter_double_less

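A named vector of strings: each name is a column and each value is a numeric threshold written as a string. Only rows whose column value is less than the threshold are kept. Default: PG.Qvalue < 0.01 and EG.Qvalue < 0.01.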

filter_double_greater

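A named vector of strings: each name is a column and each value is a numeric threshold written as a string. Only rows whose column value is greater than the threshold are kept.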

intensity_col_sep

The separator character used when entries in the intensity column contain multiple values.

intensity_col_id

The column holding identifiers for the multiple quantitative values. Its entries are appended to secondary_id (see Details).

na_string

The string value treated as NA (missing). The default is "0".
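
The filter arguments above can be illustrated on a small data frame. This is only a base R sketch of the documented semantics, not the package's implementation: the toy values are made up, and combining the string filter and the numeric filters with a logical AND is an assumption.

```r
# Toy rows using the default column names; all values are made up.
rows <- data.frame(
  F.ExcludedFromQuantification = c("False", "True", "False", "False"),
  PG.Qvalue = c(0.001, 0.001, 0.05, 0.002),
  EG.Qvalue = c(0.005, 0.005, 0.005, 0.02),
  stringsAsFactors = FALSE
)

filter_string_equal <- c("F.ExcludedFromQuantification" = "False")
filter_double_less  <- c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01")

# Start by keeping every row, then apply each condition in turn (assumed AND).
keep <- rep(TRUE, nrow(rows))
for (col in names(filter_string_equal)) {
  keep <- keep & rows[[col]] == filter_string_equal[[col]]
}
for (col in names(filter_double_less)) {
  # Thresholds are given as strings, so convert them before comparing.
  keep <- keep & rows[[col]] < as.numeric(filter_double_less[[col]])
}
kept_rows <- rows[keep, ]
```

In this toy table only the first row passes all three conditions.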

Details

When entries in the intensity column contain multiple values, this function replicates the entries in the other columns, once per value. If intensity_col_id is provided, the corresponding entries from that column are appended to the secondary_id; otherwise, the integers 1, 2, 3, and so on are appended.
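
The expansion described above might be pictured with the following base R sketch, for the case where intensity_col_id is not provided. It is an illustration and not the package's code: the toy data, the ion column standing in for the concatenated secondary_id, and the "_" used to join the appended integer are all assumptions.

```r
# Toy long-format rows; "F.PeakArea" holds several values separated by ";".
rows <- data.frame(
  PG.ProteinGroups = c("P1", "P2"),
  ion = c("PEPTIDEA_2", "PEPTIDEB_3"),  # stands in for the concatenated secondary_id
  F.PeakArea = c("10;20;30", "5;7"),
  stringsAsFactors = FALSE
)
intensity_col_sep <- ";"

# Split each intensity entry and count how many values it holds.
values <- strsplit(rows$F.PeakArea, intensity_col_sep, fixed = TRUE)
n_values <- lengths(values)

# Replicate the other columns once per value and append 1, 2, 3, ... to the ion id.
expanded <- data.frame(
  PG.ProteinGroups = rep(rows$PG.ProteinGroups, times = n_values),
  ion = paste(rep(rows$ion, times = n_values), sequence(n_values), sep = "_"),
  F.PeakArea = as.numeric(unlist(values)),
  stringsAsFactors = FALSE
)
```

The two input rows become five rows, one per intensity value, each with its own ion identifier.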

Value

A list with the following components:

protein

A table with the proteins in the first column, followed by the annotation columns.

sample

A vector of samples.

ion

A vector of fragment ions to be used for quantification.

quant_table

A list of four components: protein_list (index pointing to protein), sample_list (index pointing to sample), id (index pointing to ion), and quant (intensities).
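
To show how the index vectors in quant_table relate to the other components, the sketch below builds a mock object with the documented shape and resolves the indices back to names. Every value is made up, and the indices are assumed to be 1-based so that they can subset the lookup tables directly; the documentation above does not state the index base.

```r
# Mock object with the documented shape; every value here is made up.
raw <- list(
  protein = data.frame(PG.ProteinGroups = c("P1", "P2"),
                       PG.Genes = c("GENE1", "GENE2"),
                       stringsAsFactors = FALSE),
  sample = c("S1", "S2"),
  ion = c("ionA", "ionB", "ionC"),
  quant_table = list(
    protein_list = c(1L, 1L, 2L, 2L),
    sample_list  = c(1L, 2L, 1L, 2L),
    id           = c(1L, 2L, 3L, 3L),
    quant        = c(100, 120, 80, 95)
  )
)

# Assumption: indices are 1-based, so they subset the lookup tables directly.
long <- with(raw$quant_table, data.frame(
  protein = raw$protein[[1]][protein_list],
  sample = raw$sample[sample_list],
  ion = raw$ion[id],
  intensity = quant,
  stringsAsFactors = FALSE
))
```

Each row of long then describes one measurement: which protein, sample, and ion it belongs to, and its intensity.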

Author(s)

Thang V. Pham

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.


iq documentation built on May 29, 2024, 8:40 a.m.