R/mscore4protfdr.R
In SWATH2stats: Transform and Filter SWATH Data for Statistical Packages

Documented in mscore4protfdr

#' Find m_score cutoff to reach a desired FDR on protein level (over the entire
#' OpenSWATH/pyProphet output table)
#'
#' This function estimates the m_score cutoff required in a dataset to reach a
#' given overall protein level FDR.
#' This filter is to be used with caution as the resulting quantitative matrix
#' is relatively sparse. It can be filled with quantitative values at a lower
#' FDR quality level.
#' It counts target and decoy peptides (unique ProteinName) at high resolution
#' across the m_score cutoffs and reports a useful m_score cutoff - peptide FDR
#' pair close to the supplied fdr_target level over the entire dataset. The
#' m_score cutoff is returned by the function and can be used in the context of
#' the filtering functions, e.g.:
#' data.protFDR5pc<-filter_mscore(data, mscore4protfdr(data, fdr_target=0.02))
#' To arrive from decoy counts at an estimation of the false discovery rate
#' (false positives among the targets remaining at a given mscore cutoff) the
#' ratio of false positives to true negatives (decoys) (FFT) must be
#' supplied. It is estimated for each run individually by pyProphet and
#' contained in the pyProphet statistics [Injection_name]_full_stat.csv. As an
#' approximation, the FFTs of multiple runs are averaged and supplied as
#' argument FFT. For further details see the Vignette Section 1.3 and 4.1.
#' For FDR evaluations on assay and peptide level, please refer to functions
#' mscore4assayfdr and mscore4pepfdr.
#'
#' @param data Annotated OpenSWATH/pyProphet data table. See function
#'   sample_annotation from this package.
#' @param FFT Ratio of false positives to true negatives, q-values from
#'   [Injection_name]_full_stat.csv in pyProphet stats output. As an
#'   approximation, the q-values of multiple runs are averaged and supplied as
#'   argument FFT. Numeric from 0 to 1. Defaults to 1, the most conservative
#'   value (1 Decoy indicates 1 False target).
#' @param fdr_target  FDR target, numeric, defaults to 0.01. An m_score cutoff
#'   achieving an FDR < fdr_target will be selected.
#'   Calculated as FDR = (TN*FFT/T); TN=decoys, T=targets, FFT=see above.
#' @param mscore.col Column containing the mscore data.
#' @return  Returns the m_score cutoff selected to arrive at the desired FDR
#'   quality.
#' @author Moritz Heusel
#' @examples
#'  data("OpenSWATH_data", package="SWATH2stats")
#'  data("Study_design", package="SWATH2stats")
#'  data <- sample_annotation(OpenSWATH_data, Study_design)
#'  chosen <- mscore4protfdr(data, FFT=0.7, fdr_target=0.01)
#' @export
mscore4protfdr <- function(data, FFT = 1, 
                           fdr_target = 0.02, 
                           mscore.col = "m_score") {

    mscore.col <- JPP_update(data, mscore.col)

    # generate high resolution mscore levels to assess mscore cutoff for a given
    # fdr_target
    mscore_levels_highres = 10^-(c(seq(2, 20, 0.05)))
    target.protein.highres <- NULL
    decoy.protein.highres <- NULL
    for (i in seq_len(length(mscore_levels_highres))) {
        target.protein.highres[i] <- length(unique(data[data$decoy == FALSE & data[,
            mscore.col] <= mscore_levels_highres[i], c("ProteinName")]))
        decoy.protein.highres[i] <- length(unique(data[data$decoy == TRUE & data[,
            mscore.col] <= mscore_levels_highres[i], c("ProteinName")]))
    }
    protein.fdr.highres <- (decoy.protein.highres/target.protein.highres) * FFT

    # pick mscore cutoff closest to (<=) fdr_target % protein FDR & filter data
    # accordingly
    mscore_chosen <- mscore_levels_highres[protein.fdr.highres <= fdr_target][1]
    protein_fdr_chosen <- protein.fdr.highres[protein.fdr.highres <= fdr_target][1]
    message("Target protein FDR:", fdr_target, "\n")
    message("Required overall m-score cutoff:", signif(mscore_chosen, digits = 5),
        "\n", "achieving protein FDR =", signif(protein_fdr_chosen, digits = 3),
        "\n")
    return(mscore_chosen)
}