A complete workflow for the identification of true interaction proteins based on AP-MS data, embedding the scoring method SAINT into a pre- and postprocessing framework.
file_baittable |
a character string specifying the pathname of the baittable. See Details. |
file_inttable |
a character string specifying the pathname of the interaction table. See Details. |
prottable |
a character string specifying the pathname of the protein table. See Details. |
norm |
method to normalize the data. If |
Filter |
logical value, whether filtering of the data is applied (Default |
filter.method |
method to use for filtering, must be one of |
var.cutoff |
percentile (between 0 and 1) or |
limit |
minimal number of expected true interaction proteins in the data. |
intern.norm |
logical value. If |
saint.options |
parameters set for SAINT. |
The input files correspond to the input formats used by SAINT: the baittable, prey- and interaction table in the form of tab-delimited files.
The baittable consists of three columns: IP name, bait or control name, indicator for bait and control experiment (T=bait purification, C=control).
The interaction table consists of four columns: IP name, bait or control name, protein name, spectral count (note: a protein which was not detected in one of the samples receives a zero count).
The protein table corresponds to the prey file; it consists of three columns: protein names, protein length, protein names or associated gene names (if available).
A more detailed description of the generation of these files is given in Choi et al. (Current Protocols in Bioinformatics 2012).
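As an illustration of the column layouts described above, the following minimal sketch builds three toy tables and writes them as tab-delimited files. All IP, bait and protein names are made up, the helper write_tab is hypothetical, and writing the files without a header line is an assumption about what SAINT expects rather than something stated here.

```r
# Toy input tables following the column layout described above.
# All names and counts are invented for illustration.
baittable <- data.frame(
  ip   = c("IP1", "IP2", "IP3", "IP4"),
  bait = c("BaitA", "BaitA", "Ctrl", "Ctrl"),
  type = c("T", "T", "C", "C"),          # T = bait purification, C = control
  stringsAsFactors = FALSE
)

inttable <- data.frame(
  ip      = rep(baittable$ip, each = 2),
  bait    = rep(baittable$bait, each = 2),
  protein = rep(c("P1", "P2"), times = 4),
  count   = c(12, 3, 9, 4, 0, 2, 1, 3),  # undetected proteins get a zero count
  stringsAsFactors = FALSE
)

prottable <- data.frame(
  protein = c("P1", "P2"),
  length  = c(350, 720),
  gene    = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: write a table as tab-delimited text.
# Assumption: no header line and no quoting.
out_dir <- tempdir()
write_tab <- function(x, name) {
  path <- file.path(out_dir, name)
  write.table(x, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

bait_path <- write_tab(baittable, "baittab_toy.txt")
int_path  <- write_tab(inttable,  "inttable_toy.txt")
prot_path <- write_tab(prottable, "prottable_toy.txt")
```

The resulting paths could then be passed as file_baittable, file_inttable and prottable.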
Pre-processing comprises normalization and filtering of the data:
One of five normalization methods, adapted from microarray and RNA-seq analysis to AP-MS data, can be chosen. For further details see norm.inttable.
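To give an idea of what such a normalization does, here is a minimal base-R sketch of quantile normalization on a protein-by-sample count matrix. It is a simplified illustration, not the implementation used by norm.inttable: the function name quantile_normalize and the toy matrix are hypothetical, ties are broken by order of appearance, and at least two proteins are assumed.

```r
# Simplified quantile normalization (illustration only).
quantile_normalize <- function(counts) {
  # Sort every sample (column), then average across samples rank by rank.
  sorted <- apply(counts, 2, sort)
  reference <- rowMeans(sorted)
  # Replace each value by the reference value of its rank within the sample.
  normalized <- apply(counts, 2, function(col) {
    reference[rank(col, ties.method = "first")]
  })
  dimnames(normalized) <- dimnames(counts)
  normalized
}

# Toy spectral counts: 4 proteins (rows) x 3 purifications (columns).
counts <- matrix(c(5, 2, 3, 4,
                   1, 4, 3, 4,
                   6, 8, 9, 5),
                 nrow = 4,
                 dimnames = list(paste0("P", 1:4), paste0("IP", 1:3)))

normalized <- quantile_normalize(counts)
```

After this step every sample shares the same set of values, so differences in overall count depth between purifications no longer dominate the comparison.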
The filter consists of a biological filter and a statistical variance filter and aims to remove obvious contaminants from further analysis.
If filter.method="noVar", only the biological filter is applied. Both filters are applied if filter.method="IQR", where the variance is measured by the inter-quartile range, or if filter.method="overallVar", where the variance is calculated across all samples.
The var.cutoff defines the fraction of proteins with the lowest overall variance, which are considered contaminants and are removed. var.cutoff=NA (default) refers to a cutoff defined by the mean of the shortest interval containing 50% of the data. Alternatively, a quantile can be set as cutoff, e.g. a cutoff of 0.5 filters out the 50% of the data showing the smallest overall variance or IQR. See also varFilter.
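The quantile-based variant of the variance filter can be sketched in base R as follows. This is only an illustration under stated assumptions: the function name filter_low_iqr and the toy matrix are hypothetical, the biological filter and the var.cutoff=NA default are not reproduced, and the package itself refers to varFilter for the actual filtering.

```r
# Illustrative IQR filter: drop the proteins with the smallest spread.
filter_low_iqr <- function(counts, var.cutoff = 0.5) {
  # Inter-quartile range of each protein (row) across all samples.
  spread <- apply(counts, 1, IQR)
  # Proteins at or below the chosen quantile of the spread are removed.
  threshold <- quantile(spread, probs = var.cutoff)
  counts[spread > threshold, , drop = FALSE]
}

# Toy counts: 4 proteins (rows) x 4 samples (columns).
counts <- matrix(c(10, 10, 10, 10,   # constant across samples
                    1,  2,  1,  2,   # nearly constant
                    0,  0, 20, 25,   # enriched in some samples
                    3,  8,  4,  9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("P", 1:4), paste0("IP", 1:4)))

filtered <- filter_low_iqr(counts, var.cutoff = 0.5)
```

Proteins with nearly constant counts across bait and control samples are the ones removed, which matches the idea that such proteins are likely contaminants.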
The parameter limit ensures that filtering leaves a number of proteins above the number of expected true interaction proteins.
The corresponding parameters in SAINT, [nburn] [niter] [lowMode] [minFold] [normalize], are set as recommended by SAINT. Further details on the parameter settings can be found in Choi et al. (Current Protocols in Bioinformatics 2012).
The overall result is reported in the file WY_Result.csv:
It is based on the original SAINT output ‘unique_interactions’, but additionally assigns Westfall & Young adjusted p-values to each interaction candidate. These p-values control the FWER (family-wise error rate), allowing the user to estimate the proportion of false-positive interactions.
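The general idea behind permutation-based adjusted p-values can be sketched with a generic single-step max-statistic procedure. This is an assumption-laden illustration and not necessarily the exact procedure implemented in saint_permF: the function name wy_adjust is hypothetical, and the observed and permuted scores below are simulated rather than produced by SAINT.

```r
# Generic single-step max-statistic adjustment (illustration only).
# observed:    named vector of scores, one per interaction candidate
# perm_scores: matrix of scores from permuted data, one row per permutation
wy_adjust <- function(observed, perm_scores) {
  # Largest score found in each permuted data set.
  perm_max <- apply(perm_scores, 1, max)
  # Adjusted p-value: share of permutations whose maximum score
  # is at least as large as the observed score.
  vapply(observed, function(s) mean(perm_max >= s), numeric(1))
}

# Simulated example: three candidates, 200 permutations.
observed <- c(P1 = 0.99, P2 = 0.60, P3 = 0.20)
set.seed(1)
perm_scores <- matrix(runif(200 * 3, min = 0, max = 0.8), nrow = 200)

adj_p <- wy_adjust(observed, perm_scores)
```

Because each observed score is compared against the maximum over all candidates in every permutation, the resulting p-values account for the number of candidates tested at once.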
Different .txt and .xls files are generated, enabling the user to follow the different intermediate results:
In case of normalization: normalized count data in form of the interaction table (txt file), named after the normalization method and the bait protein (e.g. quantile_bait_IntSaint.txt).
In case of filtering: the filtered (and normalized) interaction table (Inttable_filtered.txt).
The SAINT output ‘unique_interactions’, reporting the interaction candidates with SAINT scores, calculated on normalized data (file name ending _orig) and on filtered data (file name ending _orgF).
Permutation data: scores calculated for each permutation data set (permutation matrices stored as perm.avgp.Rdata and perm.maxp.Rdata).
SAINT is run as part of the workflow. It is important to note that the function saint_permF requires a Linux environment and was tested with SAINT version 2.3.4.
Martina Fischer
Choi H, Larsen B, Lin Z-Y, et al. SAINT: probabilistic scoring of affinity purification-mass spectrometry data. Nature Methods 2011.
Choi H, Liu G, Mellacheruvu D, et al. Analyzing Protein-Protein Interactions from Affinity Purification-Mass Spectrometry Data with SAINT. Current Protocols in Bioinformatics 2012.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biology 2010.
Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 2010.
Bolstad BM, Irizarry RA, Astrand M, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003.
Westfall PH, Young SS. Resampling-based multiple testing: examples and methods for p-value adjustment. 1993.
Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences 2010.
# Input data
baitfile <- system.file("extdata", "baittab.txt", package="apmsWAPP")
intfile <- system.file("extdata", "inttable.txt", package="apmsWAPP")
protfile <- system.file("extdata", "prottable.txt", package="apmsWAPP")
# To run this example, a Linux environment is required and SAINT needs
# to be installed!
# Important: Define a working directory for storage of the resulting
# files
# Pre-processing: quantile normalization and filtering
# Workflow call:
# saint_permF(baitfile,intfile,protfile, norm="quantile", Filter=TRUE,
# filter.method="overallVar", var.cutoff=0.3, intern.norm=FALSE)