gee_screenr: Fitting Screening Tools Using GEE Estimation of Logistic Models

View source: R/gee_screenr.R

gee_screenr {screenr}    R Documentation

Fitting Screening Tools Using GEE Estimation of Logistic Models

Description

gee_screenr is a convenience function that integrates GEE estimation of logistic models, k-fold cross-validation, and estimation of the receiver-operating characteristic (ROC). GEE estimation accommodates cluster sampling.

Usage

gee_screenr(
  formula,
  id = NULL,
  data = NULL,
  link = c("logit", "cloglog", "probit"),
  corstr = c("independence", "exchangeable", "unstructured"),
  Nfolds = 10,
  partial_auc = c(0.8, 1),
  partial_auc_focus = "sensitivity",
  partial_auc_correct = TRUE,
  boot_n = 4000,
  conf_level = 0.95,
  seed = Sys.time(),
  ...
)

Arguments

formula

an object of class stats::formula defining the testing outcome and predictor covariates, which is passed to geepack::geeglm().

id

a vector identifying the sampling clusters.

data

a data frame containing the variables defined in formula. The testing outcome must be binary (0/1, indicating negative and positive test results, respectively) or logical (TRUE/FALSE). The covariates are typically binary (0 = no, 1 = yes) responses to questions that may be predictive of the test result, but any numeric or factor covariates can be used. A minimal sketch of data in this form appears after the argument list below.

link

the character-valued name of the link function for logistic regression. Choices are "logit", "cloglog" or "probit". Default: "logit".

corstr

a character string specifying the working correlation structure. The following are permitted: "independence", "exchangeable" and "unstructured". Default: "independence".

Nfolds

number of folds used for k-fold cross-validation (minimum = 2, maximum = 100). Default: 10.

partial_auc

either a logical FALSE or a numeric vector of the form c(left, right) where left and right are numbers in the interval [0, 1] specifying the endpoints for computation of the partial area under the ROC curve (pAUC). The total AUC is computed if partial_auc = FALSE. Default: c(0.8, 1.0).

partial_auc_focus

one of "sensitivity" or specificity, specifying for which the pAUC should be computed. partial_auc_focus is ignored if partial_auc = FALSE. Default: "sensitivity".

partial_auc_correct

logical value indicating whether the pAUC should be transformed to the interval from 0.5 to 1.0. partial_auc_correct is ignored if partial_auc = FALSE. Default: TRUE.

boot_n

number of bootstrap replications for computation of confidence intervals for the (partial) AUC. Default: 4000.

conf_level

a number between 0 and 1 specifying the confidence level for confidence intervals for the (partial) AUC. Default: 0.95.

seed

random-number generator seed for cross-validation data splitting. Default: Sys.time().

...

additional arguments passed to geepack::geeglm or pROC::roc.
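
The sketch below is illustrative only (the variable names follow the Examples); it shows a data frame in the form expected by the formula, id and data arguments.

## Hypothetical data: a binary testing outcome, binary question
## covariates and a sampling-cluster identifier.
set.seed(123)
toy <- data.frame(
    testresult = rbinom(100, 1, 0.2),             # 0 = negative, 1 = positive
    Q1 = rbinom(100, 1, 0.5),                     # binary question responses
    Q2 = rbinom(100, 1, 0.5),
    cluster = sample(1:10, 100, replace = TRUE)   # sampling-cluster identifier
)
## gee_screenr(testresult ~ Q1 + Q2, id = cluster, data = toy)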

Details

The results provide information from which to choose a probability threshold above which individual out-of-sample probabilities indicate the need to perform a diagnostic test. Out-of-sample performance is estimated using k-fold cross-validation.
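
As an illustration only (not part of gee_screenr itself), candidate thresholds can be examined with pROC::coords, assuming fit is an object returned by gee_screenr (see Value below):

## `fit` is assumed to be a gee_screenr object; CVroc is its
## cross-validated ROC (see Value).
pROC::coords(fit$CVroc, x = "all",
             ret = c("threshold", "sensitivity", "specificity"))
pROC::coords(fit$CVroc, x = "best", best.method = "youden")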

The receiver operating characteristics are computed using the pROC package. See References and package documentation for additional details.

By default, the partial area under the ROC curve is computed from that portion of the curve for which sensitivity is in the closed interval [0.8, 1.0]. However, the total AUC can be obtained using the argument partial_auc = FALSE. Partial areas can be computed for ranges of either sensitivity or specificity using the arguments partial_auc_focus and partial_auc. By default, partial areas are standardized.
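
For illustration, the (partial) AUC and a bootstrap confidence interval can also be recomputed directly with pROC; the sketch below assumes fit is a gee_screenr object:

## Total AUC, the default standardized pAUC over sensitivity in [0.8, 1.0],
## and a bootstrap confidence interval (cf. boot_n and conf_level):
pROC::auc(fit$CVroc)
pROC::auc(fit$CVroc, partial.auc = c(0.8, 1),
          partial.auc.focus = "sensitivity", partial.auc.correct = TRUE)
pROC::ci.auc(fit$CVroc, conf.level = 0.95, method = "bootstrap", boot.n = 4000)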

For a gentle but Python-centric introduction to k-fold cross-validation, see https://machinelearningmastery.com/k-fold-cross-validation/.
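
Conceptually, the fold assignment can be sketched as follows (hypothetical code; gee_screenr performs the splitting internally, and dat stands for the analysis data frame):

## Randomly assign each row of `dat` to one of Nfolds folds:
Nfolds <- 10
folds <- sample(rep(seq_len(Nfolds), length.out = nrow(dat)))
## For each fold i the model is refit on dat[folds != i, ] and
## predictions are made for the held-out rows dat[folds == i, ].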

Value

gee_screenr returns an object of class gee_screenr, which inherits from class logreg_screenr, containing the elements:

Call

The function call.

formula

The formula object.

Prevalence

Prevalence (proportion) of the test condition in the training sample.

ModelFit

An object of class geeglm (see geepack::geeglm) containing the results of the model fit.

ISroc

An object of class roc containing the "in-sample" (overly optimistic) receiver operating characteristics. Additional functions for use with this object are available in the pROC package.

CVpreds

An object of class cv.predictions containing the data and the cross-validated predictions of the test condition.

CVroc

An object of class roc containing the k-fold cross-validated "out-of-sample" receiver operating characteristics. Additional functions for use with this object are available in the pROC package.

CVcoef

The estimated coefficients from each cross-validation fold.

X_ho

The matrix of held-out predictors for each cross-validation fold.
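
A brief sketch of working with these elements (assuming fit is a gee_screenr object):

summary(fit$ModelFit)   # the fitted GEE model
plot(fit$ISroc)         # in-sample ROC curve (pROC supplies the plot method)
plot(fit$CVroc)         # cross-validated ROC curve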

References

Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73(1):13-22. http://doi.org/10.2307/2336267

Halekoh U, Hojsgaard S, Yan J. The R package geepack for generalized estimating equations. Journal of Statistical Software 2006;15(2):1-11. http://doi.org/10.18637/jss.v015.i02

Kim J-H. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis 2009;53(11):3735-3745. http://doi.org/10.1016/j.csda.2009.04.009

Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Muller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12(77):1-8. http://doi.org/10.1186/1471-2105-12-77

See Also

geeglm, roc and auc

Examples

## Not run: 
library(dplyr)
data(unicorns)
## Add a contrived cluster identifier (25 clusters) for demonstration only:
uniclus <- unicorns %>%
   mutate(cluster = sample(1:25, size = nrow(unicorns), replace = TRUE))
## Use gee_screenr:
uniobj3 <- gee_screenr(testresult ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7, id = cluster,
                       data = uniclus, link = "logit", Nfolds = 10)
class(uniobj3)
methods(class = class(uniobj3)[1])
methods(class = class(uniobj3)[2])
summary(uniobj3)

## End(Not run)
