iterateScampi: SCAMPI protein quantification function with iterative outlier...


View source: R/protiq.R

Description

Estimate an abundance score for each protein in the dataset, based on the input peptide abundance scores and the connectivity information between peptides and proteins. The expected values of the peptide abundances are computed as well. Comparing these expected values with the initial measurements makes it possible to detect outliers in the input data. Several iterations of abundance estimation and outlier removal can then be performed.

Usage

iterateScampi(peptides, proteins, edgespp, rescaling = TRUE, 
              method = "LSE", numIter = 2, numMLEIter = 10, 
              thresh = 2, verbose = FALSE)

Arguments

peptides

Data frame with peptide information. The following columns are required: pepId (unique identification number for each distinct peptide sequence, numbering from 1:n where n=number of distinct peptide sequences), pepSeq (peptide sequence, optionally including modifications and charge states), and pepQty (peptide abundance score). An additional column pepObs (peptide observability or identification score) is used if provided. Each row in the data frame describes one observed distinct peptide sequence.

proteins

Data frame with the protein information. The following columns are required: protId (unique identification number for each distinct protein sequence, numbering from (n+1):(n+m) where m=number of distinct protein sequences) and protName (protein identifier or protein sequence). Each row describes a distinct protein sequence that is matched by at least one of the observed peptides.

edgespp

Data frame with two mandatory columns: pepId and protId. Each row defines an edge of the bipartite graph.

rescaling

If TRUE, the peptide abundance scores are log10-transformed. Unless this transformation has already been applied during preprocessing, it is strongly recommended to keep the default: rescaling=TRUE.

method

Describes which method should be used for the parameter estimation. Available: method="LSE" (default), method="MLE" and method="all".

numIter

Number of estimation/outlier-removal iterations to be performed.

numMLEIter

Only used with method="MLE", see details. Default: numMLEIter=10.

thresh

Constant to tune the outlier selection process. See details.

verbose

If TRUE, detailed output is provided.
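
The input layout described above can be illustrated with a small toy dataset. This is only a sketch: the sequences, protein names and abundance values are made up, and the sanity checks are not part of protiq. They merely restate the documented ID conventions (peptides numbered 1:n, proteins numbered (n+1):(n+m), every edge referring to existing IDs). The last line shows the log10 transformation that rescaling=TRUE is documented to apply.

```r
# Toy input in the documented layout (all values are made up).
n <- 4  # number of distinct peptide sequences
m <- 2  # number of distinct protein sequences

peptides <- data.frame(
  pepId  = 1:n,                               # peptide IDs run from 1 to n
  pepSeq = c("AAK", "CDER", "FGHK", "LMNR"),  # hypothetical sequences
  pepQty = c(1.2e5, 3.4e4, 8.9e4, 2.1e5),     # raw, not yet log-transformed
  stringsAsFactors = FALSE
)

proteins <- data.frame(
  protId   = (n + 1):(n + m),                 # protein IDs continue at n + 1
  protName = c("protA", "protB"),             # hypothetical identifiers
  stringsAsFactors = FALSE
)

# One row per edge of the bipartite peptide-protein graph;
# peptide 2 is shared between both proteins.
edgespp <- data.frame(
  pepId  = c(1, 2, 2, 3, 4),
  protId = c(5, 5, 6, 6, 6)
)

# Sanity checks (not part of protiq) restating the documented conventions.
idsOk     <- identical(peptides$pepId, 1:n) &&
             identical(proteins$protId, (n + 1):(n + m))
edgesOk   <- all(edgespp$pepId %in% peptides$pepId) &&
             all(edgespp$protId %in% proteins$protId)
coveredOk <- all(proteins$protId %in% edgespp$protId)  # each protein has a peptide

# The transformation applied to pepQty when rescaling = TRUE.
logQty <- log10(peptides$pepQty)
```

If the abundance scores were already log-transformed during preprocessing, rescaling=FALSE avoids transforming them a second time.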

Details

Using method="MLE" requires the inverses of the covariance matrices of the connected components. Depending on the chosen parameters, this can lead to numerical stability issues. To keep the function from crashing, the estimation is wrapped in a try(...) block and repeated until it has succeeded numMLEIter times. Among these numMLEIter parameter sets, the one with the lowest negative log-likelihood value is returned.
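
The retry pattern described above can be sketched in a few lines of base R. This is not the protiq implementation: fitOnce() is a hypothetical stand-in that fails at random to mimic an unstable matrix inversion, and the maxAttempts cap is an extra safeguard added here that the documentation does not mention.

```r
set.seed(1)

# Hypothetical stand-in for one MLE run: fails at random, otherwise
# returns an estimate together with its negative log-likelihood.
fitOnce <- function() {
  if (runif(1) < 0.3) stop("covariance matrix could not be inverted")
  est <- rnorm(1, mean = 5, sd = 0.5)
  list(estimate = est, negLogLik = (est - 5)^2)
}

numMLEIter  <- 10    # number of successful fits to collect
maxAttempts <- 100   # safety cap (assumption, not documented behaviour)

fits     <- list()
attempts <- 0
while (length(fits) < numMLEIter && attempts < maxAttempts) {
  attempts <- attempts + 1
  res <- try(fitOnce(), silent = TRUE)   # a failed fit yields a "try-error"
  if (!inherits(res, "try-error")) {
    fits[[length(fits) + 1]] <- res      # keep successful fits only
  }
}

# Return the fit with the lowest negative log-likelihood.
negLogLiks <- vapply(fits, function(f) f$negLogLik, numeric(1))
best <- fits[[which.min(negLogLiks)]]
```

Collecting several successful fits and keeping the best one reduces the chance of reporting a poor local optimum, at the cost of a longer run time for larger numMLEIter.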

Peptide outlier detection is based on an interquartile range criterion applied to the peptide abundance residuals. The larger the chosen thresh, the fewer peptides are discarded.
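
The effect of thresh can be illustrated with a generic interquartile range rule. The exact criterion used by protiq is not spelled out here, so the sketch below assumes the common convention of flagging residuals that lie more than thresh times the interquartile range outside the first or third quartile. The residuals and the helper flagOutliers() are hypothetical.

```r
# Hypothetical residuals (observed minus expected peptide abundance).
residuals <- c(-0.10, 0.05, 0.00, 0.12, -0.08, 0.03, 0.95, -1.40)

# Assumed rule: flag values outside [Q1 - thresh * IQR, Q3 + thresh * IQR].
flagOutliers <- function(r, thresh) {
  q   <- quantile(r, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  r < q[1] - thresh * iqr | r > q[2] + thresh * iqr
}

# Number of flagged residuals for increasing values of thresh.
nFlagged <- sapply(c(2, 6, 10), function(t) sum(flagOutliers(residuals, t)))
nFlagged  # 2 1 0
```

With these toy residuals, raising thresh from 2 to 10 reduces the number of flagged peptides from two to none, in line with the statement that a larger thresh discards fewer peptides.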

Value

Named list. Each element corresponds to one iteration step and is itself a list with the following elements:

scampiRes

object of class scampiVal

peptideOutliers

data frame with the peptides that were selected as outliers and removed from the graph (not used) in this iteration step

Author(s)

Sarah Gerster sarah.gerster@isb-sib.ch

See Also

runScampi to perform a single iteration

Examples

data("leptoSRM")
scampiIterRes <- iterateScampi(peptides=leptoSRMpeptides, 
                               proteins=leptoSRMproteins, 
                               edgespp=leptoSRMedgespp, rescaling=FALSE,
                               method="LSE", numIter=3, thresh=1.37)

protiq documentation built on May 2, 2019, 9:06 a.m.