SRMRawData: Example dataset from a SRM experiment with stable isotope...
In MSstats: Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments


This is a partial data set obtained from a published study (Picotti et al., 2009). The experiment targeted 45 proteins in the glycolysis/gluconeogenesis/TCA cycle/glyoxylate cycle network, which spans the range of protein abundance from less than 128 to 10E6 copies per cell. Three biological replicates were analyzed at ten time points (T1-T10), while yeast transited through exponential growth in a glucose-rich medium (T1-T4), a diauxic shift (T5-T6), a post-diauxic phase (T7-T9), and a stationary phase (T10). Prior to trypsinization, the samples were mixed with an equal amount of proteins from the same N15-labeled yeast sample, which was used as a reference. Each sample was profiled in a single mass spectrometry run, where each protein was represented by up to two peptides and each peptide by up to three transitions.

The goal of this study is to detect significant changes in protein abundance across time points. Transcriptional activity under the same experimental conditions was previously investigated by DeRisi et al. (1997). Genes coding for 29 of the proteins are differentially expressed between conditions similar to those represented by T7 and T1, and can be treated as an external source for validating the proteomics analysis.

In this example data set, two of the targeted proteins are selected and validated against the gene expression study: protein IDHC (gene name IDP2) is differentially expressed between time point 1 and time point 7, whereas protein PMG2 (gene name GPM2) is not. The protein names are Swiss-Prot names.

SRMRawData

data.frame

The raw data (the input data for MSstats) must contain the variables ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, and Intensity. The variable names are fixed and must be used exactly as listed.
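A minimal sketch of checking a data frame for these column names, using base R only. The helper `missing_columns()` is hypothetical and not part of MSstats; the two rows are copied from the example output on this page.

```r
# Column names required by the MSstats input format, as listed above.
required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
                   "FragmentIon", "ProductCharge", "IsotopeLabelType",
                   "Condition", "BioReplicate", "Run", "Intensity")

# Hypothetical helper (not an MSstats function): returns the required
# column names that are absent from a data frame.
missing_columns <- function(df) setdiff(required_cols, names(df))

# Two rows in the required layout, taken from head(SRMRawData).
raw <- data.frame(
  ProteinName      = "IDHC",
  PeptideSequence  = "ATDVIVPEEGELR",
  PrecursorCharge  = 2,
  FragmentIon      = c("y7", "y7"),
  ProductCharge    = NA,
  IsotopeLabelType = c("H", "L"),
  Condition        = 1,
  BioReplicate     = "ReplA",
  Run              = 1,
  Intensity        = c(84361.08350, 215.13526)
)

missing_columns(raw)                    # nothing missing
missing_columns(raw[, c("Run", "Intensity")])  # lists the other eight names
```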

If the information for one or more columns is not available in the original raw data, retain those columns and fill them with a fixed value. For example, if the original raw data does not contain ProductCharge information, keep the ProductCharge column and enter NA for all transitions in the raw data.
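As a sketch of this rule in base R, assume a hypothetical export `raw_export` that lacks ProductCharge; the column is added and filled with NA for every transition.

```r
# Hypothetical export without a ProductCharge column (values are illustrative,
# taken from the example rows on this page).
raw_export <- data.frame(
  ProteinName      = "IDHC",
  PeptideSequence  = "ATDVIVPEEGELR",
  PrecursorCharge  = 2,
  FragmentIon      = c("y7", "y8", "y9"),
  IsotopeLabelType = "H",
  Condition        = 1,
  BioReplicate     = "ReplA",
  Run              = 1,
  Intensity        = c(84361.08350, 29778.10188, 17921.29255)
)

# Retain the column and fill it with a fixed value (NA) for all rows.
if (!"ProductCharge" %in% names(raw_export)) {
  raw_export$ProductCharge <- NA
}

str(raw_export$ProductCharge)
```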

The BioReplicate column should identify each biological replicate with a unique ID, such as a patient ID (i.e., all measurements from the same patient should carry the same ID).

The Intensity variable must contain the original signal without any log transformation. It can be specified as either the peak height or the peak area under the curve.
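If an upstream tool has already stored intensities on a log scale, they would need to be converted back before use. The sketch below assumes, purely for illustration, that a hypothetical vector `log2_intensity` holds base-2 logarithms of the original signal; the base actually used by a given tool must be checked.

```r
# Original-scale intensities from the example rows on this page.
original <- c(84361.08350, 215.13526, 29778.10188)

# Assumption: a tool exported log2-transformed values.
log2_intensity <- log2(original)

# Back-transform to the original signal scale expected in the Intensity column.
intensity <- 2^log2_intensity

all.equal(intensity, original)
```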

A data.frame in the format required by MSstats.

Meena Choi, Olga Vitek.

Maintainer: Meena Choi (mnchoi67@gmail.com)

Ching-Yun Chang, Paola Picotti, Ruth Huttenhain, Viola Heinzelmann-Schwarz, Marko Jovanovic, Ruedi Aebersold, Olga Vitek. Protein significance analysis in selected reaction monitoring (SRM) measurements. Molecular & Cellular Proteomics, 11:M111.014662, 2012.

head(SRMRawData)

    ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
243        IDHC   ATDVIVPEEGELR               2          y7            NA
244        IDHC   ATDVIVPEEGELR               2          y7            NA
245        IDHC   ATDVIVPEEGELR               2          y8            NA
246        IDHC   ATDVIVPEEGELR               2          y8            NA
247        IDHC   ATDVIVPEEGELR               2          y9            NA
248        IDHC   ATDVIVPEEGELR               2          y9            NA
    IsotopeLabelType Condition BioReplicate Run   Intensity
243                H         1        ReplA   1 84361.08350
244                L         1        ReplA   1   215.13526
245                H         1        ReplA   1 29778.10188
246                L         1        ReplA   1    98.02134
247                H         1        ReplA   1 17921.29255
248                L         1        ReplA   1    60.47029
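Each transition above appears twice in the same run, once with label H and once with label L. As an illustration of how the paired rows relate, the sketch below rebuilds those six rows in base R and computes a per-transition log2 ratio of L to H intensity. This is only a way to inspect the data layout, not the model MSstats fits, and the name `paired` is hypothetical.

```r
# The six rows shown above, reduced to the columns needed here.
ex <- data.frame(
  FragmentIon      = c("y7", "y7", "y8", "y8", "y9", "y9"),
  IsotopeLabelType = c("H", "L", "H", "L", "H", "L"),
  Intensity        = c(84361.08350, 215.13526, 29778.10188,
                       98.02134, 17921.29255, 60.47029)
)

# Split by label and pair the two measurements of each transition.
heavy  <- ex[ex$IsotopeLabelType == "H", ]
light  <- ex[ex$IsotopeLabelType == "L", ]
paired <- merge(light, heavy, by = "FragmentIon", suffixes = c(".L", ".H"))

# log2 ratio of L to H intensity for each transition.
paired$log2_ratio <- log2(paired$Intensity.L / paired$Intensity.H)

paired[, c("FragmentIon", "log2_ratio")]
```

In these rows every L intensity is far below its H counterpart, so all three ratios are negative.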