gee_screenr: Fitting Screening Tools Using GEE Estimation of Logistic Models

View source: R/gee_screenr.R

gee_screenr {screenr}    R Documentation

Fitting Screening Tools Using GEE Estimation of Logistic Models

Description

gee_screenr is a convenience function that integrates GEE estimation of logistic models, k-fold cross-validation, and estimation of the receiver-operating characteristic (ROC). GEE estimation accommodates cluster sampling.

Usage

gee_screenr(
  formula,
  id = NULL,
  data = NULL,
  link = c("logit", "cloglog", "probit"),
  corstr = c("independence", "exchangeable", "unstructured"),
  Nfolds = 10,
  partial_auc = c(0.8, 1),
  partial_auc_focus = "sensitivity",
  partial_auc_correct = TRUE,
  boot_n = 4000,
  conf_level = 0.95,
  seed = Sys.time(),
  ...
)

Arguments

formula

an object of class stats::formula defining the testing outcome and predictor covariates, which is passed to geepack::geeglm().

id

a vector identifying the sampling clusters.

data

a data frame containing the variables defined in formula. The testing outcome must be binary (0/1, indicating negative and positive test results, respectively) or logical (TRUE/FALSE). The covariates are typically binary (0 = no, 1 = yes) responses to questions that may be predictive of the test result, but any numeric or factor covariates can be used. A minimal sketch of data in this form appears after the argument list below.

link

the character-valued name of the link function for logistic regression. Choices are "logit", "cloglog" or "probit". Default: "logit".

corstr

a character string specifying the working correlation structure. The following are permitted: "independence", "exchangeable" and "unstructured". Default: "independence".

Nfolds

number of folds used for k-fold cross-validation (minimum = 2, maximum = 100). Default: 10.

partial_auc

either a logical FALSE or a numeric vector of the form c(left, right) where left and right are numbers in the interval [0, 1] specifying the endpoints for computation of the partial area under the ROC curve (pAUC). The total AUC is computed if partial_auc = FALSE. Default: c(0.8, 1.0).

partial_auc_focus

one of "sensitivity" or specificity, specifying for which the pAUC should be computed. partial_auc_focus is ignored if partial_auc = FALSE. Default: "sensitivity".

partial_auc_correct

logical value indicating whether the pAUC should be transformed to the interval from 0.5 to 1.0. partial_auc_correct is ignored if partial_auc = FALSE. Default: TRUE.

boot_n

number of bootstrap replications for computation of confidence intervals for the (partial) AUC. Default: 4000.

conf_level

a number between 0 and 1 specifying the confidence level for confidence intervals for the (partial) AUC. Default: 0.95.

seed

random-number generator seed for cross-validation data splitting. Default: Sys.time().

...

additional arguments passed to geepack::geeglm or pROC::roc.
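
The sketch below is illustrative only (the variable names follow the Examples); it shows a data frame in the form expected by the formula, id and data arguments.

## Hypothetical data: a binary testing outcome, binary question
## covariates and a sampling-cluster identifier.
set.seed(123)
toy <- data.frame(
    testresult = rbinom(100, 1, 0.2),             # 0 = negative, 1 = positive
    Q1 = rbinom(100, 1, 0.5),                     # binary question responses
    Q2 = rbinom(100, 1, 0.5),
    cluster = sample(1:10, 100, replace = TRUE)   # sampling-cluster identifier
)
## gee_screenr(testresult ~ Q1 + Q2, id = cluster, data = toy)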

Details

The results provide information from which to choose a probability threshold above which individual out-of-sample probabilities indicate the need to perform a diagnostic test. Out-of-sample performance is estimated using k-fold cross-validation.
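
As an illustration only (not part of gee_screenr itself), candidate thresholds can be examined with pROC::coords, assuming fit is an object returned by gee_screenr (see Value below):

## `fit` is assumed to be a gee_screenr object; CVroc is its
## cross-validated ROC (see Value).
pROC::coords(fit$CVroc, x = "all",
             ret = c("threshold", "sensitivity", "specificity"))
pROC::coords(fit$CVroc, x = "best", best.method = "youden")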

The receiver operating characteristics are computed using the pROC package. See References and package documentation for additional details.

By default, the partial area under the ROC curve is computed from that portion of the curve for which sensitivity is in the closed interval [0.8, 1.0]. However, the total AUC can be obtained using the argument partial_auc = FALSE. Partial areas can be computed for ranges of either sensitivity or specificity using the arguments partial_auc_focus and partial_auc. By default, partial areas are standardized.
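
For illustration, the (partial) AUC and a bootstrap confidence interval can also be recomputed directly with pROC; the sketch below assumes fit is a gee_screenr object:

## Total AUC, the default standardized pAUC over sensitivity in [0.8, 1.0],
## and a bootstrap confidence interval (cf. boot_n and conf_level):
pROC::auc(fit$CVroc)
pROC::auc(fit$CVroc, partial.auc = c(0.8, 1),
          partial.auc.focus = "sensitivity", partial.auc.correct = TRUE)
pROC::ci.auc(fit$CVroc, conf.level = 0.95, method = "bootstrap", boot.n = 4000)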

For a gentle but Python-centric introduction to k-fold cross-validation, see https://machinelearningmastery.com/k-fold-cross-validation/.
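
Conceptually, the fold assignment can be sketched as follows (hypothetical code; gee_screenr performs the splitting internally, and dat stands for the analysis data frame):

## Randomly assign each row of `dat` to one of Nfolds folds:
Nfolds <- 10
folds <- sample(rep(seq_len(Nfolds), length.out = nrow(dat)))
## For each fold i the model is refit on dat[folds != i, ] and
## predictions are made for the held-out rows dat[folds == i, ].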

Value

gee_screenr returns an object of class gee_screenr, which inherits from class logreg_screenr, containing the elements:

Call

The function call.

formula

The formula object.

Prevalence

Prevalence (proportion) of the test condition in the training sample.

ModelFit

An object of class geeglm (see geepack::geeglm) containing the results of the model fit.

ISroc

An object of class roc containing the "in-sample" (overly optimistic) receiver operating characteristics. Additional functions for use with this object are available in the pROC package.

CVpreds

An object of class cv.predictions containing the data and the cross-validated predictions of the test condition.

CVroc

An object of class roc containing the k-fold cross-validated "out-of-sample" receiver operating characteristics. Additional functions for use with this object are available in the pROC package.

CVcoef

The estimated coefficients from each cross-validation fold.

X_ho

The matrix of held-out predictors for each cross-validation fold.
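
A brief sketch of working with these elements (assuming fit is a gee_screenr object):

summary(fit$ModelFit)   # the fitted GEE model
plot(fit$ISroc)         # in-sample ROC curve (pROC supplies the plot method)
plot(fit$CVroc)         # cross-validated ROC curve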

References

Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73(1):13-22. http://doi.org/10.2307/2336267

Halekoh U, Hojsgaard S, Yan J. The R package geepack for generalized estimating equations. Journal of Statistical Software 2006;15(2):1-11. http://doi.org/10.18637/jss.v015.i02

Kim J-H. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis 2009;53(11):3735-3745. http://doi.org/10.1016/j.csda.2009.04.009

Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Muller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12(77):1-8. http://doi.org/10.1186/1471-2105-12-77

See Also

geeglm, roc and auc

Examples

## Not run: 
library(dplyr)
data(unicorns)
## Add a contrived cluster identifier (25 clusters) for demonstration only:
uniclus <- unicorns %>%
   mutate(cluster = sample(1:25, size = nrow(unicorns), replace = TRUE))
## Use gee_screenr:
uniobj3 <- gee_screenr(testresult ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7, id = cluster,
                       data = uniclus, link = "logit", Nfolds = 10)
class(uniobj3)
methods(class = class(uniobj3)[1])
methods(class = class(uniobj3)[2])
summary(uniobj3)

## End(Not run)
