tspm_apms: Workflow for AP-MS data analysis using TSPM


Description

A complete workflow for the analysis of AP-MS data, combining a two-stage Poisson model (TSPM) with a pre- and postprocessing framework.

Usage

tspm_apms(counts, baittab, 
   norm = c("none", "sumtotal", "upperquartile", 
            "DESeq", "TMM", "quantile"), 
   Filter = TRUE, 
   filter.method = c("IQR", "overallVar", "noVar"), 
   var.cutoff = NA, limit = 0, 
   adj.method = c("BH", "WY"))

Arguments

counts

matrix of spectral counts, proteins in rows and samples in columns.

baittab

a character string specifying the path to the bait table file. See Details.

norm

method to normalize the data. If norm="none", no normalization of the data is performed.

Filter

logical value, whether filtering of the data is applied (Default TRUE).

filter.method

method to use for filtering, must be one of "IQR", "overallVar" or "noVar", only used when Filter=TRUE.

var.cutoff

a quantile (between 0 and 1) or NA. Cutoff for filtering the data: either a quantile or, if NA (default), the shortest-interval cutoff described in Details. Only used when Filter=TRUE.

limit

minimum number of true interaction proteins expected in the data.

adj.method

method to adjust p-values for multiple testing, either "BH" (Benjamini-Hochberg) or "WY" (Westfall & Young). See Details.

Details

The bait table is a tab- or space-delimited file in the format required by SAINT. It consists of three columns: IP name, bait or control name, and an indicator for bait or control experiment (T = bait purification, C = control).
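As an illustration of the three-column layout described above, the following base-R sketch writes a small bait table to a temporary file and reads it back. The IP and bait names are hypothetical, and the file is written without a header line, which is an assumption about the expected format.

```r
# Hypothetical IP and bait names, for illustration only.
bait_table <- data.frame(
  ip   = c("IP1", "IP2", "IP3", "IP4"),
  bait = c("BaitA", "BaitA", "Ctrl", "Ctrl"),
  type = c("T", "T", "C", "C"),   # T = bait purification, C = control
  stringsAsFactors = FALSE
)

# Write a tab-delimited file without header or row names (assumed layout).
bait_path <- tempfile(fileext = ".txt")
write.table(bait_table, file = bait_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read it back as character columns so that "T" is not converted to TRUE.
check <- read.table(bait_path, sep = "\t", colClasses = "character")
print(check)
```

The resulting path (here `bait_path`) is what would be passed as the baittab argument.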

Pre-processing comprises normalization and filtering of the data:
One of five normalization methods, adapted from microarray and RNA-seq analysis to AP-MS data, can be chosen. For further details see norm.inttable.
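To give a feel for what count normalization does, here is a minimal base-R sketch of one common form of total-count scaling, in which every sample is rescaled to the same total. It is an illustration only: the function name scale_to_total and the toy data are hypothetical, and the actual "sumtotal" method in norm.inttable may be defined differently.

```r
# Toy spectral count matrix: proteins in rows, samples in columns.
counts <- matrix(c(10, 0, 5,  20, 2, 8,  30, 4, 6,  40, 1, 9), nrow = 3,
                 dimnames = list(c("P1", "P2", "P3"),
                                 c("IP1", "IP2", "IP3", "IP4")))

# Hypothetical helper: rescale each sample (column) so that all samples
# share the same total count, namely the mean of the original totals.
scale_to_total <- function(m) {
  totals <- colSums(m)
  sweep(m, 2, totals / mean(totals), "/")
}

norm_counts <- scale_to_total(counts)
colSums(norm_counts)   # all columns now have the same total
```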
The filter consists of a biological filter and a statistical variance filter and aims to remove obvious contaminants from further analysis.
If filter.method="noVar", only the biological filter is applied. Both filters are applied if filter.method="IQR", where the variance is measured by the interquartile range, or if filter.method="overallVar", where the variance is calculated across all samples.
The var.cutoff defines the fraction of proteins with the lowest variance that are considered contaminants and removed. var.cutoff=NA (default) refers to a cutoff defined by the mean of the shortest interval containing 50% of the data. Alternatively, a quantile can be set as the cutoff, e.g. a cutoff of 0.5 removes the 50% of the data showing the smallest overall variance or IQR. See also varFilter.
The parameter limit ensures that the number of proteins remaining after filtering stays above the number of expected true interaction proteins.
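The quantile-based variance filter can be sketched in a few lines of base R. This is a simplified illustration and not the package's implementation: the function filter_by_iqr and the toy data are hypothetical, the biological filter and the shortest-interval cutoff (var.cutoff=NA) are not covered, and the way limit is enforced here (falling back to the proteins with the largest spread) is an assumption.

```r
# Toy count matrix: six proteins (rows) measured in four samples (columns).
counts <- matrix(c(5, 5,  5,  5,
                   1, 2,  3,  4,
                   0, 0, 10, 20,
                   3, 3,  4,  4,
                   0, 8,  0, 16,
                   7, 7,  7,  8),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(paste0("P", 1:6), paste0("IP", 1:4)))

# Hypothetical helper: drop the fraction `var.cutoff` of proteins with the
# smallest interquartile range across samples.
filter_by_iqr <- function(m, var.cutoff = 0.5, limit = 0) {
  spread <- apply(m, 1, IQR)
  threshold <- quantile(spread, probs = var.cutoff)
  keep <- spread > threshold
  # Assumed safeguard: if fewer than `limit` proteins survive,
  # keep the `limit` proteins with the largest spread instead.
  if (sum(keep) < limit) {
    keep <- rank(-spread, ties.method = "first") <= limit
  }
  m[keep, , drop = FALSE]
}

filtered <- filter_by_iqr(counts, var.cutoff = 0.5)
rownames(filtered)

guarded <- filter_by_iqr(counts, var.cutoff = 0.5, limit = 5)
nrow(guarded)
```

With limit = 5 the safeguard overrides the 50% cutoff, so that at least five proteins remain for the downstream test.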

For postprocessing, two procedures are provided to adjust for multiple testing: the Benjamini-Hochberg procedure ("BH"), which controls the false discovery rate (FDR), and a permutation approach coupled to the Westfall & Young algorithm ("WY"), which controls the family-wise error rate (FWER).
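The two adjustment strategies can be illustrated with base R. The BH adjustment uses the standard p.adjust function. The permutation part is a simplified single-step max-statistic sketch in the spirit of Westfall & Young, not the package's algorithm: the helper max_t_adjust, the toy scores, and the choice of counter divided by the number of permutation runs as the adjusted p-value are assumptions made for illustration.

```r
# Benjamini-Hochberg adjustment of raw p-values (FDR control).
pvalues <- c(0.001, 0.01, 0.02, 0.04, 0.5)
padj_bh <- p.adjust(pvalues, method = "BH")

# Hypothetical helper: single-step max-statistic adjustment (FWER control).
# `observed`    : observed score per protein
# `perm_scores` : scores from permuted data, proteins in rows, runs in columns
max_t_adjust <- function(observed, perm_scores) {
  max_per_run <- apply(perm_scores, 2, max)
  # For each protein, count the runs whose maximum score reaches the observed one.
  counter <- vapply(observed, function(s) sum(max_per_run >= s), numeric(1))
  list(counter = counter, adjusted.p = counter / ncol(perm_scores))
}

observed    <- c(12, 3, 7)
perm_scores <- matrix(c(1, 2, 3,  4, 1, 2,  2, 8, 1,  5, 2, 6), nrow = 3)
res <- max_t_adjust(observed, perm_scores)
res$counter
res$adjusted.p
```

Comparing each observed score against the per-run maximum over all proteins is what makes the adjustment control the family-wise error rate rather than a per-protein error rate.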

Value

A list containing the following components:

id

name of the interaction protein

log.fold.change

a vector containing the estimated log fold changes for each protein

pvalues

a vector containing the raw p-values for each protein, evaluating the interaction

padj

a vector containing the p-values after adjusting for multiple testing using the method of Benjamini-Hochberg

LRT

a vector of Likelihood Ratio statistics, scoring the interaction potential of each protein

dispersion

a vector of yes/no indicating overdispersion for each protein

adjusted.p

a vector containing the adjusted p-values using the permutation-based approach of Westfall&Young

counter

a vector containing the number of exceeding permutation scores using the permutation-based approach of Westfall&Young

matrix1

matrix of spectral counts after normalization and filtering, where these steps were applied

matrix2

permutation matrix of scores, permutation runs in columns and proteins in rows

Author(s)

Martina Fischer

References

Fischer M, Zilkenat S, Gerlach R, Wagner S, Renard BY. Pre- and Postprocessing for Affinity Purification Mass Spectrometry Data: More Reliable Detection of Interaction Candidates. Journal of Proteome Research 2014.

Auer PL, Doerge RW. A two-stage Poisson model for testing RNA-Seq data. Statistical Applications in Genetics and Molecular Biology 2011.

Examples

# input data
intfile <- system.file("extdata", "inttable.txt", package="apmsWAPP")
counts <- int2mat(read.table(intfile))
baitfile <- system.file("extdata", "baittab.txt", package="apmsWAPP")
# TSPM with quantile normalization and filtering
tspm.quaF <- tspm_apms( counts, baitfile, 
                        norm="quantile", Filter=TRUE, 
                        filter.method="overallVar", 
                        var.cutoff=0.1, adj.method="WY")
# Results:
# for adjustment with BH:
cat("Number of proteins with BH-adjusted p-value < 0.05: ",
    length(which(tspm.quaF[[1]]$padj < 0.05)))
# for adjustment with WY:
cat("Number of proteins with WY-adjusted p-value < 0.05: ",
    length(which(tspm.quaF[[2]][,2] < 0.05)))

apmsWAPP documentation built on May 2, 2019, 3:23 a.m.