Example R code showing the usage of the SWATH2stats package. The data processed is the publicly available dataset of S.pyogenes (Röst et al. 2014) (http://www.peptideatlas.org/PASS/PASS00289). The results file 'rawOpenSwathResults_1pcnt_only.tsv' can be found on PeptideAtlas (ftp://PASS00289@ftp.peptideatlas.org/../Spyogenes/results/). This is a R Markdown file, showing the result of processing this data. The lines shaded in grey represent the R code executed during this analysis.

The SWATH2stats package can be directly installed from Bioconductor using the commands below (http://bioconductor.org/packages/devel/bioc/html/SWATH2stats.html).

if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("SWATH2stats")

Part 1: Loading and annotation

Load the SWATH-MS example data from the package, this is a reduced file in order to limit the file size of the package.

library(SWATH2stats)
library(data.table)
data('Spyogenes', package = 'SWATH2stats')

Alternatively the original file downloaded from the Peptide Atlas can be loaded from the working directory.

data <- data.frame(fread('rawOpenSwathResults_1pcnt_only.tsv', sep='\t', header=TRUE))

Extract the study design information from the file names. Alternatively, the study design table can be provided as an external table.

Study_design <- data.frame(Filename = unique(data$align_origfilename))
Study_design$Filename <- gsub('.*strep_align/(.*)_all_peakgroups.*', '\\1', Study_design$Filename)
Study_design$Condition <- gsub('(Strep.*)_Repl.*', '\\1', Study_design$Filename)
Study_design$BioReplicate <- gsub('.*Repl([[:digit:]])_.*', '\\1', Study_design$Filename)
Study_design$Run <- seq_len(nrow(Study_design))
head(Study_design)

The SWATH-MS data is annotated using the study design table.

data.annotated <- sample_annotation(data, Study_design, column_file = "align_origfilename")

Remove the decoy peptides for a subsequent inspection of the data.

data.annotated.nodecoy <- subset(data.annotated, decoy==FALSE)

\newpage

Part 2: Analyze correlation, variation and signal

Count the different analytes for the different injections.

count_analytes(data.annotated.nodecoy)

Plot the correlation of the signal intensity.

correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'Intensity')

Plot the correlation of the delta_rt, which is the deviation of the retention time from the expected retention time.

correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'delta_rt')

\newpage

Plot the variation of the signal across replicates.

variation <- plot_variation(data.annotated.nodecoy)
variation[[2]]

Plot the total variation versus variation within replicates.

variation_total <- plot_variation_vs_total(data.annotated.nodecoy)
variation_total[[2]]

Calculate the summed signal per peptide and protein across samples.

peptide_signal <- write_matrix_peptides(data.annotated.nodecoy)
protein_signal <- write_matrix_proteins(data.annotated.nodecoy)
head(protein_signal)

\newpage

Part 3: FDR estimation

Estimate the overall FDR across runs using a target decoy strategy.

par(mfrow = c(1, 3))
fdr_target_decoy <- assess_fdr_overall(data.annotated, n_range = 10, 
                                       FFT = 0.25, output = 'Rconsole')

According to this FDR estimation one would need to filter the data with a lower mscore threshold to reach an overall protein FDR of 5%.

mscore4protfdr(data, FFT = 0.25, fdr_target = 0.05)

Part 4: Filtering

Filter data for values that pass the 0.001 mscore criteria in at least two replicates of one condition.

data.filtered <- filter_mscore_condition(data.annotated, 0.001, n_replica = 2)

Select only the 10 peptides showing strongest signal per protein.

data.filtered2 <- filter_on_max_peptides(data.filtered, n_peptides = 10)

\newpage

Filter for proteins that are supported by at least two peptides.

data.filtered3 <- filter_on_min_peptides(data.filtered2, n_peptides = 2)

Part 5: Conversion

Convert the data into a transition-level format (one row per transition measured).

data.transition <- disaggregate(data.filtered3)

Convert the data into the format required by MSstats.

MSstats.input <- convert4MSstats(data.transition)
head(MSstats.input)

Convert the data into the format required by mapDIA.

mapDIA.input <- convert4mapDIA(data.transition)
head(mapDIA.input)

Convert the data into the format required by aLFQ.

aLFQ.input <- convert4aLFQ(data.transition)
head(aLFQ.input)

Session info on the R version and packages used.

sessionInfo()


peterblattmann/SWATH2stats documentation built on July 2, 2023, 9:42 p.m.