logreg_screenr: Fitting Screening Tools Using Ordinary Logistic Models
In sgutreuter/screenr: Construction of Binary Test-Screening Rules

Description

logreg_screenr is a convenience function that integrates ordinary logistic modeling, k-fold cross-validation, and estimation of the receiver operating characteristic (ROC) curve.
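As a rough illustration of the modeling step, the sketch below fits the same binary outcome under each of the three `link` choices with `stats::glm()`. The data and the question names `Q1` and `Q2` are hypothetical. The assumption that the `link` value is handed to `binomial(link = ...)` is ours and is not taken from the screenr source.

```r
# Hypothetical data: two yes/no screening questions and a 0/1 test result.
set.seed(1)
n   <- 200
Q1  <- rbinom(n, 1, 0.4)
Q2  <- rbinom(n, 1, 0.3)
eta <- -2 + 1.2 * Q1 + 0.8 * Q2            # true linear predictor (logit scale)
dat <- data.frame(testresult = rbinom(n, 1, plogis(eta)), Q1, Q2)

# One binomial GLM per link; ASSUMES `link` maps to binomial(link = ...)
links <- c("logit", "cloglog", "probit")
fits  <- lapply(links, function(lnk)
  glm(testresult ~ Q1 + Q2, data = dat, family = binomial(link = lnk)))
names(fits) <- links

# Fitted probabilities stay on the (0, 1) scale whatever the link
sapply(fits, function(f) range(fitted(f)))
```

The links differ only in how the linear predictor is mapped to a probability; with binary questions like these the fitted probabilities are usually very similar.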

Usage

logreg_screenr(
  formula,
  data = NULL,
  link = c("logit", "cloglog", "probit"),
  Nfolds = 10,
  partial_auc = c(0.8, 1),
  partial_auc_focus = "sensitivity",
  partial_auc_correct = TRUE,
  boot_n = 4000,
  conf_level = 0.95,
  seed = Sys.time(),
  ...
)

Arguments

`formula`	an object of class `formula` (see `stats::formula`) defining the testing outcome and predictor covariates; it is passed to `stats::glm()`.
`data`	a data frame containing the variables named in `formula`. The testing outcome must be either binary (0/1), indicating negative and positive test results respectively, or logical (`TRUE`/`FALSE`). The covariates are typically binary (0 = no, 1 = yes) responses to questions that may be predictive of the test result, but any numeric or factor covariates can be used.
`link`	the name (a character string) of the link function for the regression model. Choices are `"logit"`, `"cloglog"` or `"probit"`. Default: `"logit"`.
`Nfolds`	number of folds used for k-fold cross-validation (minimum = 2, maximum = 100). Default: 10.
`partial_auc`	either the logical `FALSE` or a numeric vector of the form `c(left, right)`, where left and right are numbers in the interval [0, 1] specifying the endpoints for computation of the partial area under the ROC curve (pAUC). The total AUC is computed if `partial_auc` = `FALSE`. Default: `c(0.8, 1.0)`.
`partial_auc_focus`	one of `"sensitivity"` or `"specificity"`, specifying whether the pAUC is computed over a range of sensitivity or a range of specificity. `partial_auc_focus` is ignored if `partial_auc` = `FALSE`. Default: `"sensitivity"`.
`partial_auc_correct`	logical value indicating whether the pAUC should be transformed to the interval from 0.5 to 1.0. `partial_auc_correct` is ignored if `partial_auc` = `FALSE`. Default: `TRUE`.
`boot_n`	number of bootstrap replications for computation of confidence intervals for the (partial) AUC. Default: 4000.
`conf_level`	a number between 0 and 1 specifying the confidence level for confidence intervals for the (partial) AUC. Default: 0.95.
`seed`	random-number generator seed for cross-validation data splitting. Default: `Sys.time()`; supply a fixed value to make the fold assignments reproducible.
`...`	additional arguments passed to `stats::glm` or `pROC::roc`.
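Confidence intervals for the (partial) AUC are obtained by bootstrapping, with `boot_n` replicates at level `conf_level`. The sketch below shows a stratified percentile bootstrap for the total AUC in base R. Stratified resampling mirrors pROC's default for bootstrap intervals, but whether screenr uses exactly this scheme is an assumption; `boot_auc_ci()`, `auc_rank()` and the simulated scores are hypothetical.

```r
# Rank-based (Mann-Whitney) AUC; tied scores receive midranks
auc_rank <- function(prob, y) {
  r  <- rank(prob)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Stratified percentile bootstrap: positives and negatives are resampled
# separately so every replicate keeps the original class counts.
# (ASSUMPTION: screenr/pROC may use a different CI method.)
boot_auc_ci <- function(prob, y, boot_n = 4000, conf_level = 0.95) {
  pos <- which(y == 1); neg <- which(y == 0)
  aucs <- replicate(boot_n, {
    idx <- c(pos[sample.int(length(pos), replace = TRUE)],
             neg[sample.int(length(neg), replace = TRUE)])
    auc_rank(prob[idx], y[idx])
  })
  alpha <- 1 - conf_level
  quantile(aucs, c(alpha / 2, 1 - alpha / 2))
}

# Hypothetical scores that are higher, on average, for positives
set.seed(7)
y    <- rbinom(200, 1, 0.3)
prob <- plogis(-1 + 1.5 * y + rnorm(200))
auc_rank(prob, y)
boot_auc_ci(prob, y, boot_n = 1000)
```

Indexing with `sample.int()` rather than `sample(pos, ...)` avoids R's surprising behavior when a stratum contains a single observation.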

Details

The results provide the information needed to choose a probability threshold above which an individual's out-of-sample predicted probability indicates the need to perform a diagnostic test.
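To make the threshold idea concrete, the sketch below computes the sensitivity, specificity and fraction of clients referred for testing at a few candidate thresholds. The helper `sens_spec()` and the data are hypothetical; with logreg_screenr the probabilities would be the out-of-sample (cross-validated) predictions rather than values typed in by hand.

```r
# Hypothetical helper: operating characteristics of "test if prob > threshold"
sens_spec <- function(threshold, prob, y) {
  refer <- prob > threshold                      # TRUE = refer for diagnostic test
  c(threshold   = threshold,
    sensitivity = sum(refer & y == 1) / sum(y == 1),
    specificity = sum(!refer & y == 0) / sum(y == 0),
    prop_tested = mean(refer))                   # fraction of clients tested
}

# Hypothetical predicted probabilities and observed 0/1 test results
prob <- c(0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.90)
y    <- c(0,    0,    1,    0,    0,    1,    0,    1)

t(sapply(c(0.1, 0.25, 0.5), sens_spec, prob = prob, y = y))
```

At a threshold of 0.1 all three positives are referred (sensitivity 1) at the cost of testing 75% of clients; raising it to 0.5 cuts testing to 37.5% but misses one positive.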

The receiver operating characteristics are computed using the pROC package. See References and the pROC documentation for additional details.

By default, the partial area under the ROC curve is computed from the portion of the curve for which sensitivity lies in the closed interval [0.8, 1.0]. The total AUC can instead be obtained with partial_auc = FALSE. Partial areas can be computed over a range of either sensitivity or specificity using the arguments partial_auc_focus and partial_auc. By default (partial_auc_correct = TRUE), partial areas are standardized so that a non-discriminating test scores 0.5 and a perfect test scores 1.0.
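The standardization can be sketched as follows, assuming screenr relies on pROC's McClish (1989) correction, (1 + (pAUC - min)/(max - min))/2, where min is the pAUC of the chance diagonal and max is the pAUC of a perfect test over the same range. The helper `mcclish_correct()` is hypothetical.

```r
# Hypothetical helper: McClish-standardized pAUC over the range [lower, upper]
# of sensitivity (or, symmetrically, specificity).
mcclish_correct <- function(pauc, lower = 0.8, upper = 1.0) {
  max_area <- upper - lower                              # perfect test
  min_area <- (upper - lower) - (upper^2 - lower^2) / 2  # chance diagonal
  (1 + (pauc - min_area) / (max_area - min_area)) / 2
}

mcclish_correct(0.02)  # chance-level pAUC over [0.8, 1]
mcclish_correct(0.20)  # perfect pAUC over [0.8, 1]
mcclish_correct(0.11)  # halfway between the two
```

For the default range [0.8, 1.0], min = 0.02 and max = 0.2, so raw pAUCs of 0.02, 0.20 and 0.11 map to 0.5, 1.0 and 0.75 respectively.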

Out-of-sample performance is estimated using k-fold cross-validation. For a gentle but Python-centric introduction to k-fold cross-validation, see https://machinelearningmastery.com/k-fold-cross-validation/.
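The cross-validation step can be sketched in base R. This is an illustration under stated assumptions, not the screenr implementation: folds are assigned at random with `sample()`, the AUC is computed with the rank (Mann-Whitney) formula instead of pROC, and the data, `auc_rank()` and question names are hypothetical. Comparing the two AUCs illustrates why the in-sample ROC (ISroc) is described as overly optimistic: it is typically the higher of the two.

```r
# Hypothetical screening data: three yes/no questions, 0/1 test result
set.seed(2024)
n   <- 300
dat <- data.frame(Q1 = rbinom(n, 1, 0.4), Q2 = rbinom(n, 1, 0.3),
                  Q3 = rbinom(n, 1, 0.5))
dat$testresult <- rbinom(n, 1, plogis(-2 + 1.2 * dat$Q1 + 0.8 * dat$Q2))

# Rank-based (Mann-Whitney) AUC
auc_rank <- function(prob, y) {
  r  <- rank(prob)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

Nfolds <- 10
fold   <- sample(rep(seq_len(Nfolds), length.out = n))   # random fold labels
cvpred <- numeric(n)
for (k in seq_len(Nfolds)) {
  held_out <- fold == k
  # Fit on the other Nfolds - 1 folds ...
  fit <- glm(testresult ~ Q1 + Q2 + Q3, data = dat[!held_out, ],
             family = binomial(link = "logit"))
  # ... and predict only for the held-out fold
  cvpred[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
}

# In-sample predictions reuse the data the model was fitted to
insample <- fitted(glm(testresult ~ Q1 + Q2 + Q3, data = dat, family = binomial))
c(in_sample_auc = auc_rank(insample, dat$testresult),
  cv_auc        = auc_rank(cvpred,   dat$testresult))
```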

Value

logreg_screenr returns an object of class logreg_screenr containing the elements:

Call: The function call.
formula: The formula object.
Prevalence: Prevalence (proportion) of the test condition in the training sample.
ModelFit: An object of class glm (see glm) containing the results of the model fit.
ISroc: An object of class roc containing the "in-sample" (overly optimistic) receiver operating characteristics. Additional functions for working with this object are available in the pROC package.
CVpreds: An object of class cv.predictions containing the data and the cross-validated predicted condition y.
CVroc: An object of class roc containing the k-fold cross-validated "out-of-sample" receiver operating characteristics. Additional functions for working with this object are available in the pROC package.
CVcoef: The estimated coefficients from cross-validation.
X_ho: The matrix of held-out predictors for each cross-validation fold.

References

Kim J-H. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis. 2009;53(11):3735-3745. http://doi.org/10.1016/j.csda.2009.04.009

Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Muller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(77):1-8. http://doi.org/10.1186/1471-2105-12-77

Teferi W, Gutreuter S, Bekele A et al. Adapting strategies for effective and efficient pediatric HIV case finding: Risk screening tool for testing children presenting at high-risk entry points. BMC Infectious Diseases. 2022;22:480. http://doi.org/10.1186/s12879-022-07460-w

Examples

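## Not run: 
library(screenr)
data(unicorns)
## Fit a logit-link screening model with 10-fold cross-validation
uniobj2 <- logreg_screenr(testresult ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7,
                           data = unicorns, link = "logit", Nfolds = 10)
## List the methods available for the returned object
methods(class = class(uniobj2))
summary(uniobj2)

## End(Not run)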

sgutreuter/screenr documentation built on Oct. 19, 2024, 12:49 p.m.