gee_screenr | R Documentation |
gee_screenr
is a convenience function which integrates GEE estimation
of logistic models, k-fold cross-validation, and estimation of the
receiver-operating characteristic. GEE estimation accommodates cluster sampling.
gee_screenr(
  formula,
  id = NULL,
  data = NULL,
  link = c("logit", "cloglog", "probit"),
  corstr = c("independence", "exchangeable", "unstructured"),
  Nfolds = 10,
  partial_auc = c(0.8, 1),
  partial_auc_focus = "sensitivity",
  partial_auc_correct = TRUE,
  boot_n = 4000,
  conf_level = 0.95,
  seed = Sys.time(),
  ...
)
formula |
an object of class |
id |
a vector identifying the sampling clusters. |
data |
a dataframe containing the variables defined in |
link |
the character-valued name of the link function for logistic
regression. Choices are |
corstr |
a character string specifying the correlation structure. The
following are permitted: |
Nfolds |
number of folds used for k-fold cross validation (minimum = 2, maximum = 100). Default: 10. |
partial_auc |
either a logical |
partial_auc_focus |
one of |
partial_auc_correct |
logical value indicating whether the pAUC should be
transformed to the interval from 0.5 to 1.0. |
boot_n |
Number of bootstrap replications for computation of confidence intervals for the (partial)AUC. Default: 4000. |
conf_level |
a number between 0 and 1 specifying the confidence level for confidence intervals for the (partial)AUC. Default: 0.95. |
seed |
random-number generator seed for cross-validation data splitting. |
... |
additional arguments passed to or from other
The results provide information from which to choose a probability threshold above which individual out-of-sample probabilities indicate the need to perform a diagnostic test.
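As a rough illustration of choosing such a threshold, the sketch below tabulates sensitivity and specificity over a grid of cut-offs. It uses simulated probabilities rather than output from gee_screenr; the names truth, phat and perf_at, and the rule "largest threshold with sensitivity of at least 0.90", are assumptions made only for this example.

```r
## Simulated stand-ins for out-of-sample predicted probabilities (assumption).
set.seed(101)
n <- 500
truth <- rbinom(n, size = 1, prob = 0.2)       # 1 = condition present
phat  <- plogis(-1.5 + 2 * truth + rnorm(n))   # hypothetical predicted probabilities

## Sensitivity and specificity when everyone with phat >= cut is sent for testing.
perf_at <- function(cut, truth, phat) {
  flagged <- phat >= cut
  c(threshold   = cut,
    sensitivity = mean(flagged[truth == 1]),
    specificity = mean(!flagged[truth == 0]))
}

cuts <- seq(0.05, 0.95, by = 0.05)
perf <- as.data.frame(t(sapply(cuts, perf_at, truth = truth, phat = phat)))

## Example rule (assumption): the largest threshold that keeps sensitivity >= 0.90.
ok <- perf[perf$sensitivity >= 0.90, ]
chosen <- ok[which.max(ok$threshold), ]
print(chosen)
```

Raising the threshold sends fewer individuals for testing at the cost of sensitivity, so the acceptable minimum sensitivity is a judgment that depends on the application.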
The receiver operating characteristics are computed using the pROC
package. See References and package documentation for additional details.
By default, the partial area under the ROC curve is computed from
that portion of the curve for which sensitivity is in the closed interval
[0.8, 1.0]. However, the total AUC can be obtained using the argument
partial_auc = FALSE. Partial areas can be computed for ranges of either
sensitivity or specificity using the arguments partial_auc_focus and
partial_auc. By default, partial areas are standardized.
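The following sketch shows how total and partial areas can be computed directly with the pROC package. The data are simulated, and the correspondence between the gee_screenr arguments (partial_auc, partial_auc_focus, partial_auc_correct) and the pROC arguments used here is assumed from their names rather than confirmed by this page.

```r
library(pROC)

## Simulated outcome and score (assumption; not the unicorns data).
set.seed(202)
n <- 400
y <- rbinom(n, size = 1, prob = 0.3)
score <- rnorm(n, mean = 1.5 * y)

## Controls are coded 0, cases 1; higher scores indicate cases.
roc_obj <- roc(response = y, predictor = score,
               levels = c(0, 1), direction = "<")

## Total AUC, presumably analogous to partial_auc = FALSE.
total_auc <- auc(roc_obj)

## Partial AUC over sensitivity in [0.8, 1], rescaled ("corrected").
p_auc <- auc(roc_obj,
             partial.auc = c(0.8, 1),
             partial.auc.focus = "sensitivity",
             partial.auc.correct = TRUE)

print(total_auc)
print(p_auc)
```

With partial.auc.correct = TRUE, pROC rescales the partial area so that 0.5 corresponds to a non-discriminating test and 1.0 to a perfect one, which makes partial areas comparable across different ranges.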
Out-of-sample performance is estimated using k-fold cross-validation. For a gentle but Python-centric introduction to k-fold cross-validation, see https://machinelearningmastery.com/k-fold-cross-validation/.
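For readers working in R, a minimal sketch of k-fold cross-validation follows. It is not the gee_screenr implementation: it fits an ordinary logistic model with glm instead of a GEE, uses simulated data, and assigns folds at random by observation. With clustered data one might prefer to assign whole clusters to folds; how gee_screenr splits the data is not stated here.

```r
## Simulated data (assumption): two predictors and a binary outcome.
set.seed(303)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1 + 1.2 * dat$x2))

Nfolds <- 10
## Randomly assign each row to one of Nfolds roughly equal folds.
fold <- sample(rep(seq_len(Nfolds), length.out = n))

cv_pred <- numeric(n)
for (k in seq_len(Nfolds)) {
  held_out <- fold == k
  ## Fit on the other folds only ...
  fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"),
             data = dat[!held_out, ])
  ## ... then predict the held-out fold, which the model has not seen.
  cv_pred[held_out] <- predict(fit, newdata = dat[held_out, ],
                               type = "response")
}

head(data.frame(fold = fold, y = dat$y, cv_pred = round(cv_pred, 3)))
```

Each observation receives exactly one prediction, made by a model that was fitted without it, so a ROC curve built from cv_pred reflects out-of-sample rather than in-sample performance.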
gee_screenr returns an object of class gee_screenr, which inherits
from class logreg_screenr, containing the elements:
Call
The function call.
formula
The formula object.
Prevalence
Prevalence (proportion) of the test condition in the training sample.
ModelFit
An object of class glm (see glm) containing the results of the model fit.
ISroc
An object of class roc containing the "in-sample" (overly-optimistic)
receiver operating characteristics. Additional functions for use with
this object are available in the pROC package.
CVpreds
An object of class cv.predictions containing the data and
cross-validated predicted condition y.
CVroc
An object of class roc containing the k-fold cross-validated
"out-of-sample" receiver operating characteristics. Additional functions
for use with this object are available in the pROC package.
CVcoef
the estimated coefficients from cross-validation
X_ho
the matrix of held-out predictors for each cross-validation fold
Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73(1):13-22. http://doi.org/10.2307/2336267
Halekoh U, Hojsgaard S, Yan J. The R package geepack for generalized estimating equations. Journal of Statistical Software 2006;15(2):1-11. http://doi.org/10.18637/jss.v015.i02
Kim J-H. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis 2009;53(11):3735-3745. http://doi.org/10.1016/j.csda.2009.04.009
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Muller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12(77):1-8. http://doi.org/10.1186/1471-2105-12-77
geeglm, roc and auc
## Not run:
library(dplyr)
data(unicorns)
## Add a contrived cluster identifier (25 clusters) for demonstration only:
uniclus <- unicorns %>%
  mutate(cluster = sample(1:25, size = dim(unicorns)[1], replace = TRUE))
## Use gee_screenr:
uniobj3 <- gee_screenr(testresult ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7,
                       id = cluster, data = uniclus, link = "logit",
                       Nfolds = 10)
class(uniobj3)
methods(class = class(uniobj3)[1])
methods(class = class(uniobj3)[2])
summary(uniobj3)
## End(Not run)