R/corrected.wauc.R

Defines functions corrected.wauc

Documented in corrected.wauc

#' Corrected estimate of the AUC based on replicate weights.
#'
#' Optimism correction of the AUC of logistic regression models
#' with complex survey data based on replicate weights methods.
#' @param data A data frame which, at least, must incorporate information on the columns
#' \code{response.var}, \code{phat.var} and \code{weights.var}.
#' If \code{data=NULL}, the sampling design must be indicated in the argument \code{design}.
#' @param formula Formula of the model for which the AUC needs to be corrected.
#' The models are fitted by means of \code{survey::svyglm()} function.
#' @param tag.event A character string indicating the label used to indicate the event of interest in \code{response.var}.
#' The default option is \code{tag.event = NULL}, which selects the class with the lowest number of units as event.
#' @param tag.nonevent A character string indicating the label used for non-event in \code{response.var}.
#' The default option is \code{tag.nonevent = NULL}, which selects the class with the greatest number of units as non-event.
#' @param weights.var A character string indicating the name of the column with sampling weights.
#' It could be \code{NULL} if the sampling design is indicated in the \code{design} argument.
#' @param strata.var A character string indicating the name of the column with strata identifiers.
#' It could be \code{NULL} if the sampling design is indicated in the \code{design} argument.
#' @param cluster.var A character string indicating the name of the column with cluster identifiers.
#' It could be \code{NULL} if the sampling design is indicated in the \code{design} argument
#' or the sampling design does not have considered clustering.
#' @param design An object of class \code{survey.design} generated by \code{survey::svydesign()}.
#' It could be \code{NULL} if information about \code{cluster.var}, \code{strata.var}, \code{weights.var} and \code{data} are given.
#' @param method A character string indicating the method to be applied to define replicate weights and correct the AUC.
#' Choose between: \code{JKn} (for the Jackknife Repeated Replication), \code{dCV} (for the design-based cross-validation),
#' \code{RB} (for the Rescaling Bootstrap).
#' @param dCV.method Only applies for the \code{dCV} method.
#' Choose between: \code{average} (for the averaging cross-validation) or \code{pooling} (for the pooling cross-validation).
#' Note: \code{pooling} is recommended over \code{average} (see, Iparragirre and Barrio (2024))
#' @param RB.method Only applies for the \code{RB} method.
#' Choose between: \code{subbootstrap} or \code{bootstrap} (see the documentation of \code{svyVarSel::replicate.weights()} for help).
#' @param k A numeric value indicating the number of folds to be defined.
#' Default is \code{k=10}. Only applies for the \code{dCV} method.
#' @param R A numeric value indicating the number of times the sample is partitioned. Default is \code{R=1}. Only applies for \code{dCV}, \code{split} or \code{extrapolation} methods.
#' @param B A numeric value indicating the number of bootstrap resamples. Default is \code{B=200}. Only applies for \code{bootstrap} and  \code{subbootstrap} methods.
#'
#' @details
#' See Iparragirre and Barrio (2024) for more information on the AUC correction methods and their performance.
#'
#'
#' @references Iparragirre, A., Barrio, I. (2024).
#' Optimism Correction of the AUC with Complex Survey Data.
#' In: Einbeck, J., Maeng, H., Ogundimu, E., Perrakis, K. (eds)
#' Developments in Statistical Modelling. IWSM 2024. Contributions to Statistics.
#' Springer, Cham. https://doi.org/10.1007/978-3-031-65723-8_7
#'
#' @return The output object of this function is a list of 5 elements containing the following information:
#' - `corrected.AUCw`: the corrected estimate of the weighted AUC.
#' - `correction.method`: the selected correction method.
#' - `formula`: formula of the model that has been fitted.
#' - `tags`: a list containing two elements with the following information:
#'    - `tag.event`: a character string indicating the event of interest.
#'    - `tag.nonevent`: a character string indicating the non-event.
#' - `call`: an object saving the information about the way in which the function has been run.
#'
#' @import survey
#' @import svyVarSel
#' @importFrom stats predict
#' @importFrom stats quasibinomial
#'
#' @export
#'
#' @examples
#' data(example_variables_wroc)
#' mydesign <- survey::svydesign(ids = ~cluster, strata = ~strata,
#'                               weights = ~weights, nest = TRUE,
#'                               data = example_variables_wroc)
#' m <- survey::svyglm(y ~ x1 + x2 + x3 + x4 + x5 + x6, design = mydesign,
#'                     family = quasibinomial())
#' phat <- predict(m, newdata = example_variables_wroc, type = "response")
#' myaucw <- wauc(response.var = example_variables_wroc$y, phat.var = phat,
#'                weights.var = example_variables_wroc$weights)
#'
#' # Correction of the AUCw:
#' set.seed(1)
#' res <- corrected.wauc(data = example_variables_wroc,
#'                       formula = y ~ x1 + x2 + x3 + x4 + x5 + x6,
#'                       tag.event = 1, tag.nonevent = 0,
#'                       weights.var = "weights", strata.var = "strata", cluster.var = "cluster",
#'                       method = "dCV", dCV.method = "pooling", k = 10, R = 20)
#' # Or equivalently:
#' \donttest{
#' set.seed(1)
#' res <- corrected.wauc(design = mydesign,
#'                       formula = y ~ x1 + x2 + x3 + x4 + x5 + x6,
#'                       tag.event = 1, tag.nonevent = 0,
#'                       method = "dCV", dCV.method = "pooling", k = 10, R = 20)
#' }
#'
corrected.wauc <- function(data = NULL, formula, tag.event = NULL, tag.nonevent = NULL,
                           weights.var = NULL, strata.var = NULL, cluster.var = NULL, design = NULL,
                           method = c("dCV", "JKn", "RB"),
                           dCV.method = c("average", "pooling"), RB.method = c("subbootstrap", "bootstrap"),
                           k = 10, R = 1, B = 200){

  # Identify the response variable:
  response.var <- as.character(formula[2])

  if(!is.null(design)){
    data <- get(design$call$data)
    weights.var <- as.character(design$call$weights[2])
    strata.var <- as.character(design$call$strata[2])
    cluster.var <- as.character(design$call$ids[2])
  }

  if(length(table(data[,response.var]))!=2){stop("Response variable must have two classes.")}


  # Missing tags:
  if(is.null(tag.nonevent)){
    if(is.null(tag.event)){
      tag.nonevent <- names(table(data[,response.var])[which.max(table(data[,response.var]))])
      tag.event <- names(table(data[,response.var])[which.min(table(data[,response.var]))])
    } else {
      tags <- names(table(data[,response.var]))
      tag.other <- which(tags == tag.event)
      tag.nonevent <- tags[-tag.other]
    }
  } else {
    if(is.null(tag.event)){
      tags <- names(table(data[,response.var]))
      tag.other <- which(tags == tag.nonevent)
      tag.event <- tags[-tag.other]
    }
  }


  # Methods:
  if(length(method)!=1){
    stop("Please, select one (and only one) correction method: 'dCV', 'JKn' or 'RB'. See the documentation of the package for help.")
  }

  if(method == "dCV"){
    if(length(dCV.method) != 1){
      stop("Please, select one (and only one) correction method: 'average' or 'pooling'. See the documentation of the package for help.")
    } else {
      if(dCV.method != "average" & dCV.method != "pooling"){
        stop("Please, select one (and only one) correction method: 'average' or 'pooling'. See the documentation of the package for help.")
      } else {
        auc <- cor.dCV(data = data,
                       response.var = response.var,
                       weights.var = weights.var,
                       strata.var = strata.var,
                       cluster.var = cluster.var,
                       k = k, R = R,
                       method = dCV.method,
                       formula = formula,
                       tag.event = tag.event,
                       tag.nonevent = tag.nonevent)
      }
    }
  }

  if(method == "JKn"){
    auc <- cor.JKn(data = data,
                   response.var = response.var,
                   weights.var = weights.var,
                   strata.var = strata.var,
                   cluster.var = cluster.var,
                   formula = formula,
                   tag.event = tag.event,
                   tag.nonevent = tag.nonevent)
  }

  if(method == "RB"){
    if(length(RB.method) != 1){
      stop("Please, select one (and only one) correction method: 'subbootstrap' or 'bootstrap'. See the documentation of the package for help.")
    } else {
      if(RB.method != "subbootstrap" & RB.method != "bootstrap"){
        stop("Please, select one (and only one) correction method: 'subbootstrap' or 'bootstrap'. See the documentation of the package for help.")
      } else {
        auc <- cor.RB(data = data,
                      response.var = response.var,
                      weights.var = weights.var,
                      strata.var = strata.var,
                      cluster.var = cluster.var,
                      B = B,
                      formula = formula,
                      method = RB.method,
                      tag.event = tag.event,
                      tag.nonevent = tag.nonevent)
     }
    }
  }

  auc.obj <- list()
  auc.obj$corrected.AUCw <- auc
  auc.obj$correction.method <- list()
  auc.obj$correction.method$method <- method
  if(method == "dCV"){
    auc.obj$correction.method$dCV.method <- dCV.method
    auc.obj$correction.method$k <- k
    auc.obj$correction.method$R <- R
  }
  if(method == "RB"){
    auc.obj$correction.method$RB.method <- RB.method
    auc.obj$correction.method$B <- B
  }
  auc.obj$formula <- formula
  auc.obj$tags <- list(tag.event = as.character(tag.event),
                       tag.nonevent = as.character(tag.nonevent))
  auc.obj$call <- match.call()


  return(auc.obj)


}

Try the svyROC package in your browser

Any scripts or data that you put into this service are public.

svyROC documentation built on Oct. 25, 2024, 9:07 a.m.