corrected.wauc: Corrected estimate of the AUC based on replicate weights.

View source: R/corrected.wauc.R

corrected.wauc R Documentation

Corrected estimate of the AUC based on replicate weights.

Description

Optimism correction of the AUC of logistic regression models with complex survey data based on replicate weights methods.

Usage

corrected.wauc(
  data = NULL,
  formula,
  tag.event = NULL,
  tag.nonevent = NULL,
  weights.var = NULL,
  strata.var = NULL,
  cluster.var = NULL,
  design = NULL,
  method = c("dCV", "JKn", "RB"),
  dCV.method = c("average", "pooling"),
  RB.method = c("subbootstrap", "bootstrap"),
  k = 10,
  R = 1,
  B = 200
)

Arguments

data

A data frame which must contain, at least, the variables that appear in formula and the column named in weights.var (plus those named in strata.var and cluster.var, when they are used). If data=NULL, the sampling design must be indicated in the argument design.

formula

Formula of the model for which the AUC needs to be corrected. The models are fitted by means of the survey::svyglm() function.

tag.event

A character string indicating the label used for the event of interest in the response variable. The default option is tag.event = NULL, which selects the class with the lowest number of units as the event.

tag.nonevent

A character string indicating the label used for the non-event in the response variable. The default option is tag.nonevent = NULL, which selects the class with the greatest number of units as the non-event.

weights.var

A character string indicating the name of the column with sampling weights. It could be NULL if the sampling design is indicated in the design argument.

strata.var

A character string indicating the name of the column with strata identifiers. It could be NULL if the sampling design is indicated in the design argument.

cluster.var

A character string indicating the name of the column with cluster identifiers. It could be NULL if the sampling design is indicated in the design argument or if the sampling design does not involve clustering.

design

An object of class survey.design generated by survey::svydesign(). It could be NULL if cluster.var, strata.var, weights.var and data are given.

method

A character string indicating the method to be applied to define replicate weights and correct the AUC. Choose between: JKn (for the Jackknife Repeated Replication), dCV (for the design-based cross-validation), RB (for the Rescaling Bootstrap).
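To make the idea of replicate weights concrete, the following base R sketch builds delete-one-PSU (JKn) replicate weights for a toy stratified cluster design. It assumes the textbook JKn rule: the dropped cluster gets weight zero and the remaining clusters in the same stratum are rescaled by n_h / (n_h - 1). The toy data frame and the names psu and rep_w are illustrative; the package's own implementation may differ in its details.

```r
# Toy design: 2 strata with 3 and 2 clusters (PSUs); weights are made up.
d <- data.frame(
  strata  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  cluster = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  weights = c(10, 10, 12, 12, 8, 8, 20, 20, 15, 15)
)

# One row per PSU: each PSU defines one JKn replicate.
psu <- unique(d[, c("strata", "cluster")])

rep_w <- sapply(seq_len(nrow(psu)), function(r) {
  in_h    <- d$strata == psu$strata[r]             # units in the PSU's stratum
  dropped <- in_h & d$cluster == psu$cluster[r]    # units in the deleted PSU
  n_h     <- sum(psu$strata == psu$strata[r])      # number of PSUs in that stratum
  w <- d$weights
  w[in_h]    <- w[in_h] * n_h / (n_h - 1)          # rescale the rest of the stratum
  w[dropped] <- 0                                  # deleted PSU gets zero weight
  w                                                # other strata are left unchanged
})

dim(rep_w)  # one row per sampled unit, one column per replicate
```

Each column of rep_w can then play the role of a set of training weights, with the deleted PSU acting as the held-out part of the sample.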

dCV.method

Only applies for the dCV method. Choose between: average (for the averaging cross-validation) or pooling (for the pooling cross-validation). Note: pooling is recommended over average (see Iparragirre and Barrio (2024)).
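The difference between the two options can be sketched in base R. The example below assumes the usual meaning of the terms: averaging computes one AUC per test fold and takes the mean, while pooling gathers all out-of-fold predictions and computes a single AUC. It is a simplified illustration, not the package's code: the data are simulated, the folds are drawn at unit level rather than from the sampling design, and weighted_auc is a hypothetical helper rather than svyROC::wauc().

```r
# Weighted AUC: weighted share of (event, non-event) pairs in which the event
# has the higher predicted probability; ties count one half.
weighted_auc <- function(y, p, w) {
  ev <- y == 1
  pair_w <- outer(w[ev], w[!ev])
  concordant <- outer(p[ev], p[!ev], ">") + 0.5 * outer(p[ev], p[!ev], "==")
  sum(pair_w * concordant) / sum(pair_w)
}

set.seed(1)
n <- 400
d <- data.frame(x = rnorm(n), w = runif(n, 1, 5))  # made-up covariate and weights
d$y <- rbinom(n, 1, plogis(-0.5 + d$x))

k <- 5
fold <- sample(rep(seq_len(k), length.out = n))    # unit-level folds (simplification)

# Out-of-fold predictions: each unit is predicted by a model that never saw it.
oof <- numeric(n)
for (f in seq_len(k)) {
  test <- fold == f
  fit <- glm(y ~ x, family = quasibinomial(), data = d[!test, ], weights = w)
  oof[test] <- predict(fit, newdata = d[test, ], type = "response")
}

# "average": one AUC per fold, then the mean of the k values.
auc_average <- mean(sapply(seq_len(k), function(f) {
  test <- fold == f
  weighted_auc(d$y[test], oof[test], d$w[test])
}))

# "pooling": pool all out-of-fold predictions and compute a single AUC.
auc_pooling <- weighted_auc(d$y, oof, d$w)

c(average = auc_average, pooling = auc_pooling)
```

With small folds, the per-fold AUCs used by averaging rest on few event and non-event pairs, which is one intuitive reason the two estimates can differ.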

RB.method

Only applies for the RB method. Choose between: subbootstrap or bootstrap (see the documentation of svyVarSel::replicate.weights() for help).

k

A numeric value indicating the number of folds to be defined. Default is k=10. Only applies for the dCV method.

R

A numeric value indicating the number of times the sample is partitioned. Default is R=1. Only applies for the dCV method.

B

A numeric value indicating the number of bootstrap resamples. Default is B=200. Only applies for the RB method (both the bootstrap and subbootstrap options).

Details

See Iparragirre and Barrio (2024) for more information on the AUC correction methods and their performance.

Value

The output object of this function is a list of 5 elements containing the following information:

  • corrected.AUCw: the corrected estimate of the weighted AUC.

  • correction.method: the selected correction method.

  • formula: formula of the model that has been fitted.

  • tags: a list containing two elements with the following information:

    • tag.event: a character string indicating the event of interest.

    • tag.nonevent: a character string indicating the non-event.

  • call: an object storing the call, i.e., how the function was run.

References

Iparragirre, A., Barrio, I. (2024). Optimism Correction of the AUC with Complex Survey Data. In: Einbeck, J., Maeng, H., Ogundimu, E., Perrakis, K. (eds) Developments in Statistical Modelling. IWSM 2024. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-031-65723-8_7

Examples

library(svyROC)  # provides wauc(), corrected.wauc() and the example data
data(example_variables_wroc)
mydesign <- survey::svydesign(ids = ~cluster, strata = ~strata,
                              weights = ~weights, nest = TRUE,
                              data = example_variables_wroc)
m <- survey::svyglm(y ~ x1 + x2 + x3 + x4 + x5 + x6, design = mydesign,
                    family = quasibinomial())
phat <- predict(m, newdata = example_variables_wroc, type = "response")
myaucw <- wauc(response.var = example_variables_wroc$y, phat.var = phat,
               weights.var = example_variables_wroc$weights)

# Correction of the AUCw:
set.seed(1)
res <- corrected.wauc(data = example_variables_wroc,
                      formula = y ~ x1 + x2 + x3 + x4 + x5 + x6,
                      tag.event = 1, tag.nonevent = 0,
                      weights.var = "weights", strata.var = "strata", cluster.var = "cluster",
                      method = "dCV", dCV.method = "pooling", k = 10, R = 20)
# Or equivalently, passing the design object instead of the data and design variables:
set.seed(1)
res <- corrected.wauc(design = mydesign,
                      formula = y ~ x1 + x2 + x3 + x4 + x5 + x6,
                      tag.event = 1, tag.nonevent = 0,
                      method = "dCV", dCV.method = "pooling", k = 10, R = 20)



svyROC documentation built on Oct. 25, 2024, 9:07 a.m.