#' @rdname findCutpoint
#' @title Find a cutoff of divergences of methylation level values
#' @description A function to help on the decision of which is the best cutoff
#' value for DIMP/DMP predictions. The genome-wide methylation changes that
#' occurs in any living organism is the result of the superposition of
#' several stochastic processes: the inherent stochasticity of biological
#' processes and, particular, ultimately, it derives from the stochasticity
#' of biochemical processes. On this scenario, there is not way to say with
#' absolute determinism where a given value of an information divergence is
#' a true positive value or a true negative value. All what we can do is
#' the estimation of performance indicators like accuracy, sensitivity,
#' false positive rate, etc., to evaluate the consequences of our decision
#' on what we consider a true positive or a true negative. For example, a
#' difference of methylation levels of 100% observed in a treatment group of
#' samples in given cytosine position does not means that this difference
#' will not be observed in some sample from the control group. Without any
#' doubt about it, such a different can be found in control samples as well.
#' The fluctuation theorem guaranty such an outcome, which in the current
#' context is a consequence of the action of second law of thermodynamics on
#' living organisms.
#' @details Given a numerical vector of cutoff values for the divergences of
#' methylation level values, or p-values cutoffs, this function search for
#' the cutoff value that yield the best classification performance for the
#' specified performance indicator.
#' @param LR A list of GRanges, a GRangesList, a CompressedGRangesList object.
#' Each GRanges object from the list must have at least two columns: a
#' column containing the total variation of methylation level (TV,
#' difference of methylation levels) and a column containing a divergence of
#' methylation levels (it could be TV or Hellinger divergence) or a
#' column with a p-value from where the cutpoint will be found
#' (see example).
#'
#' @param min.tv Minimum value for the total variation distance (TVD; absolute
#' value of methylation levels differences, \eqn{TVD = abs(TV)}). Only
#' sites/ranges k with \eqn{TVD_{k} > min.tv} are analyzed. Defaul
#' min.tv = 0.25.
#' @param tv.cut A cutoff for the total variation distance to be applied to each
#' site/range. Sites/ranges k with \eqn{TVD_{k} < tv.cut} are considered
#' TRUE negatives and sites with \eqn{TVD_{k} > tv.cut} TRUE positives.
#' Its value must be a number \eqn{0 < tv.cut < 1}. A possible value
#' for tv.cut would be, e.g., the minimum value of *TV* found in the
#' treatment group after the potential DMPs are estimated. Default is
#' \eqn{tv.cut = 0.5}.
#' @param predcuts A numerical vector of possible cutoff values (cutpoints) for
#' a divergence of methylation levels value or a p-value, according with the
#' magnitude given in div.col or in pval.col, respectively. For each
#' cutpoint k the values greater than predcuts[k] are predicted TRUE
#' (positives), otherwise are predicted FALSE (negatives).
#' @param tv.col Column number where the total variation is located in the
#' metadata from each GRanges object.
#' @param div.col Column number for divergence of methylation levels used in the
#' estimation of the cutpoints. Default: NULL. One of the parameter values
#' div.col or pval.col must be given.
#' @param pval.col Column number for p-value used in the estimation of the
#' cutpoints. Default: NULL. One of the parameter values div.col or pval.col
#' must be given.
#' @param stat An integer number indicating the statistic to be used in the
#' testing. The mapping for statistic names are:
#' 0 = "All" 1 = "Accuracy", 2 = "Sensitivity", 3 = "Specificity",
#' 4 = "Pos Pred Value", 5 = "Neg Pred Value", 6 = "Precision",
#' 7 = "Recall", 8 = "F1", 9 = "Prevalence", 10 = "Detection Rate",
#' 11 = "Detection Prevalence", 12 = "Balanced Accuracy".
#' @param maximize Whether to maximize the performance indicator given in
#' parameter 'stat'. Default: TRUE.
#' @param num.cores,tasks Paramaters for parallele computation using package
#' \code{\link[BiocParallel]{BiocParallel-package}}: the number of cores to
#' use, i.e. at most how many child processes will be run simultaneously
#' (see \code{\link[BiocParallel]{bplapply}} and the number of tasks per job
#' (only for Linux OS).
#'
#' @importFrom BiocParallel MulticoreParam bplapply SnowParam
#' @importFrom S4Vectors mcols
#' @importFrom caret confusionMatrix
#' @return A list with the classification repformance results for the best
#' cutoff value in the ranges of predcuts supplied.
#' @export
#' @author Robersy Sanchez
#' @examples
#' # load simulated data of potential methylated signal
#' data(sim_ps)
#'
#' # Vector of cutoff values
#' cuts = c(2, 5, 10, 15, 18, 20, 21, 22, 25, 27, 30, 35, 40,
#' 45, 50, 55, 60)
#
#' # === To find the cutpoint that maximize the accuracy ===
#' pre.cut.acc = findCutpoint(LR = PS, min.tv = 0.25, tv.cut = 0.5,
#' predcuts = cuts, tv.col = 7L, div.col = 9,
#' stat = 1, num.cores = 15)
findCutpoint <- function(LR, min.tv=0.25, tv.cut=0.5, predcuts, tv.col,
div.col=NULL, pval.col=NULL, stat = 1, maximize = TRUE,
num.cores = 1L, tasks=tasks) {
# statistic names
# 0 = "All" 1 = "Accuracy", 2 = "Sensitivity", 3 = "Specificity",
# 4 = "Pos Pred Value", 5 = "Neg Pred Value", 6 = "Precision", 7 = "Recall",
# 8 = "F1", 9 = "Prevalence", 10 = "Detection Rate",
# 11 = "Detection Prevalence", 12 = "Balanced Accuracy"
if (Sys.info()['sysname'] == "Linux") {
bpparam <- MulticoreParam(workers=num.cores, tasks=tasks)
} else {
bpparam <- SnowParam(workers = num.cores, type = "SOCK")
}
res <- bplapply(predcuts, function(k, LR)
classPerform(LR=LR, min.tv=min.tv, tv.cut=tv.cut, cutoff=k,
tv.col=tv.col, div.col=div.col, pval.col=pval.col,
stat=stat), LR, BPPARAM = bpparam)
res <- unlist(res)
if (!all(is.na(res))) {
if (maximize) cutp = predcuts[which.max(res)]
else cutp = predcuts[which.min(res)]
perf <- classPerform(LR=LR, min.tv=min.tv, tv.cut=tv.cut, cutoff=cutp,
tv.col=tv.col, div.col=div.col, pval.col=pval.col,
stat=0)
res = list(optimCutpoint=cutp, Statistic=res, Performance=perf[[1]],
FDR=perf[[2]])
} else res = list(optimCutpoint=NA, Statistic=NA, Performance=NA,
FDR=NA)
return(res)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.