Nothing
#' Optimal cut-off points for complex survey data
#'
#' Calculate optimal cut-off points for complex survey data (Iparragirre et al., 2022).
#' Some functions of the package OptimalCutpoints (Lopez-Raton et al, 2014) have been used and modified in order them to consider sampling weights.
#'
#' @param response.var A character string with the name of the column indicating the response variable in the data set
#' or a vector (either numeric or character string) with information of the response variable for all the units.
#' @param phat.var A character string with the name of the column indicating the estimated probabilities in the data set
#' or a numeric vector containing estimated probabilities for all the units.
#' @param weights.var A character string indicating the name of the column with sampling weights or
#' a numeric vector containing information of the sampling weights.
#' It could be \code{NULL} if the sampling design is indicated in the \code{design} argument.
#' For unweighted estimates, set all the sampling weight values to 1.
#' @param tag.event A character string indicating the label used to indicate the event of interest in \code{response.var}.
#' The default option is \code{tag.event = NULL}, which selects the class with the lowest number of units as event.
#' @param tag.nonevent A character string indicating the label used for non-event in \code{response.var}.
#' The default option is \code{tag.nonevent = NULL}, which selects the class with the greatest number of units as non-event.
#' @param method A character string indicating the method to be used to select the optimal cut-off point.
#' Choose one of the following methods (Lopez-Raton et al, 2014):
#' `MaxProdSpSe`, `ROC01`, `Youden`, `MaxEfficiency`.
#' @param data A data frame which, at least, must incorporate information on the columns
#' \code{response.var}, \code{phat.var} and \code{weights.var}.
#' If \code{data=NULL}, then specific numerical vectors must be included in
#' \code{response.var}, \code{phat.var} and \code{weights.var},
#' or the sampling design should be indicated in the argument \code{design}.
#' @param design An object of class \code{survey.design} generated by
#' \code{survey::svydesign} indicating the complex sampling design of the data.
#' If \code{design = NULL}, information on the data set (argument \code{data})
#' and/or sampling weights (argument \code{weights.var}) must be included.
#'
#' @details
#' Let \eqn{S} indicate a sample of \eqn{n} observations of the vector of random variables \eqn{(Y,\pmb X)}, and \eqn{\forall i=1,\ldots,n,} \eqn{y_i} indicate the \eqn{i^{th}} observation of the response variable \eqn{Y},
#' and \eqn{\pmb x_i} the observations of the vector covariates \eqn{\pmb X}. Let \eqn{w_i} indicate the sampling weight corresponding to the unit \eqn{i} and \eqn{\hat p_i} the estimated probability of event.
#' Let \eqn{S_0} and \eqn{S_1} be subsamples of \eqn{S}, formed by the units without the event of interest (\eqn{y_i=0}) and with the event of interest (\eqn{y_i=1}), respectively.
#' Then, the optimal cut-off points are obtained as follows:
#' - `Youden`: \deqn{c_w^{\text{Youden}}=argmax_c\{\widehat{Se}_w(c) + \widehat{Sp}_w(c)-1\},}
#' - `MaxProdSpSe`: \deqn{c_w^{\text{MaxProdSpSe}}=argmax_c\{\widehat{Se}_w(c) * \widehat{Sp}_w(c)\},}
#' - `ROC01`: \deqn{c_w^{\text{ROC01}}=argmax_c\{(\widehat{Se}_w(c)-1)^2 + (\widehat{Sp}_w(c)-1)^2\},}
#' - `MaxEfficiency`: \deqn{c_w^{\text{MaxEfficiency}}=argmax_c\{\hat p_{Y,w}\widehat{Se}_w(c) + (1-\hat p_{Y,w})\widehat{Sp}_w(c)\},}
#'
#' where, the sensitivity and specificity parameters for a given cut-off point \eqn{c} are estimated as follows:
#' \deqn{\widehat{Se}_w(c)=\dfrac{\sum_{i\in S_1}w_i\cdot I (\hat p_i\geq c)}{\sum_{i\in S_1}w_i}\:;\:\widehat{Sp}_w(c)=\dfrac{\sum_{i\in S_0}w_i\cdot I (\hat p_i<c)}{\sum_{i\in S_0}w_i},}
#' and,
#' \deqn{\hat p_{Y,w}=\dfrac{\sum_{i\in S} w_i\cdot I(y_i=1)}{\sum_{i\in S} w_i}.}
#'
#' See Iparragirre et al. (2022) and Lopez-Raton et al. (2014) for more information.
#'
#'
#' @references Iparragirre, A., Barrio, I., Aramendi, J. and Arostegui, I. (2022). Estimation of cut-off points under complex-sampling design data. \emph{SORT-Statistics and Operations Research Transactions} \bold{46}(1), 137--158.
#' @references Lopez-Raton, M., Rodriguez-Alvarez, M.X, Cadarso-Suarez, C. and Gude-Sampedro, F. (2014). OptimalCutpoints: An R Package for Selecting Optimal Cutpoints in Diagnostic Tests. \emph{Journal of Statistical Software} \bold{61}(8), 1--36.
#'
#' @returns The output of this function is an object of class `wocp`. This object is a list that contains information about the following 4 elements:
#' - `tags`: a list containing two elements with the following information:
#' - `tag.event`: a character string indicating the event of interest.
#' - `tag.nonevent`: a character string indicating the non-event.
#' - `basics`: a list containing information of the following 4 elements:
#' - `n.event`: number of units with the event of interest in the data set.
#' - `n.nonevent`: number of units without the event of interest in the data set.
#' - `hatN.event`: number of units with the event of interest represented in the population by all the event units in the data set, i.e., the sum of the sampling weights of the units with the event of interest in the data set.
#' - `hatN.nonevent`: a numeric value indicating the number of non-event units in the population represented by means of the non-event units in the data set, i.e., the sum of the sampling weights of the non-event units in the data set.
#' - `optimal.cutoff`: this object is a list of three elements containing the information described below:
#' - `method`: a character string indicating the method implemented to select the optimal cut-off point.
#' - `optimal`: a list containing information of the following four elements:
#' - `cutoff`: a numeric vector indicating the optimal cut-off point(s) that optimize(s) the selected criterion.
#' - `Sew`: a numeric vector indicating the estimated sensitivity parameter(s) corresponding to the optimal cut-off point(s) that optimize(s) the selected criterion.
#' - `Spw`: a numeric vector indicating the estimated specificity parameter(s) corresponding to the optimal cut-off point(s) that optimize(s) the selected criterion.
#' - `criterion`: a numeric value indicating the criterion value optimized by means of the selected optimal cut-off point(s).
#' - `all`: a list containing information on the following four elements:
#' - `cutoff`: a numeric vector indicating all the cut-off points considered.
#' - `Sew`: a numeric vector indicating the estimated sensitivity parameters corresponding to all the considered cut-off points.
#' - `Spw`: a numeric vector indicating the estimated sensitivity parameters corresponding to all the considered cut-off points.
#' - `criterion`: a numeric vector indicating the values of the selected criterion corresponding to all the considered cut-off points.
#' - `call`: an object saving the information about the way in which the function has been run.
#'
#' @export
#'
#' @examples
#' data(example_data_wroc)
#'
#' myocp <- wocp(response.var = "y", phat.var = "phat", weights.var = "weights",
#' tag.event = 1, tag.nonevent = 0,
#' method = "Youden",
#' data = example_data_wroc)
#'
#'# Or equivalently
#' myocp <- wocp(example_data_wroc$y, example_data_wroc$phat, example_data_wroc$weights,
#' tag.event = 1, tag.nonevent = 0, method = "Youden")
#'
#'
wocp <- function(response.var, phat.var, weights.var = NULL,
tag.event = NULL, tag.nonevent = NULL,
method = c("Youden", "MaxProdSpSe", "ROC01", "MaxEfficiency"),
data = NULL, design = NULL){
if(missing(method) || is.null(method)) {
stop("'method' argument required.")
}
if(any(!(method %in% c("MaxProdSpSe","ROC01","Youden","MaxEfficiency")))) {
stop ("You have entered an invalid method.")
}
if (missing(phat.var)|| is.null(phat.var)) {
stop("'phat.var' argument required.")
}
if (missing(response.var)|| is.null(response.var)) {
stop("'response.var' argument required.")
}
if(inherits(response.var, "character")){
response.var <- data[,response.var]
}
if(length(table(response.var))!=2){stop("Response variable must have two classes.")}
if(inherits(phat.var, "character")){
phat.var <- data[,phat.var]
}
if(inherits(weights.var, "character")){
weights.var <- data[,weights.var]
}
if(!is.null(design)){
data <- get(design$call$data)
weights <- as.character(design$call$weights[2])
weights.var <- data[,weights]
}
if(length(response.var) != length(phat.var) || length(response.var) != length(weights.var)){
stop("Vectors indicating the responses, predicted probabilities and sampling weights must be the same length.")
}
# Missing tags:
if(is.null(tag.nonevent)){
if(is.null(tag.event)){
tag.nonevent <- names(table(response.var)[which.max(table(response.var))])
tag.event <- names(table(response.var)[which.min(table(response.var))])
} else {
tags <- names(table(response.var))
tag.other <- which(tags == tag.event)
tag.nonevent <- tags[-tag.other]
}
} else {
if(is.null(tag.event)){
tags <- names(table(response.var))
tag.other <- which(tags == tag.nonevent)
tag.event <- tags[-tag.other]
}
}
res <- list()
res$optimal.cutoff <- list()
res$optimal.cutoff$method <- method
pop.prev <- sum(weights.var[which(response.var != tag.nonevent)])/sum(weights.var)
cutoffs <- sort(unique(phat.var))
sew.v <- sapply(lapply(cutoffs, wse, response.var = response.var, tag.event = tag.event, phat.var = phat.var, weights.var = weights.var), "[[", 1)
spw.v <- sapply(lapply(cutoffs, wsp, response.var = response.var, tag.nonevent = tag.nonevent, phat.var = phat.var, weights.var = weights.var), "[[", 1)
# Optimal cut-off point calculation:
criterion.v <- criterion.function(method = method, pop.prev = pop.prev, se.v = sew.v, sp.v = spw.v)
opt.ocp <- optimize.criterion(method = method, criterion.v = criterion.v, se.v = sew.v, sp.v = spw.v, cutoffs = cutoffs)
res <- list()
res$tags <- list(tag.event = as.character(tag.event),
tag.nonevent = as.character(tag.nonevent))
res$basics <- list(n.event = length(which(response.var==tag.event)),
n.nonevent = length(which(response.var==tag.nonevent)),
hatN.event = sum(weights.var[which(response.var==tag.event)]),
hatN.nonevent = sum(weights.var[which(response.var==tag.nonevent)]))
res$optimal.cutoff <- list()
res$optimal.cutoff$method <- method
res$optimal.cutoff$optimal <- opt.ocp$optimal
res$optimal.cutoff$all <- opt.ocp$all
res$call <- match.call()
class(res) <- "wocp"
return(res)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.