View source: R/lasso_screenr.R
lasso_screenr | R Documentation |
lasso_screenr
is a convenience function which combines
logistic regression using L1 regularization, k-fold
cross-validation, and estimation of the receiver-operating characteristic (ROC).
The in-sample and out-of-sample performance is estimated from the models
which produced the minimum AIC and minimum BIC. Execute
methods(class = "lasso_screenr")
to identify available methods.
lasso_screenr(
  formula,
  data = NULL,
  Nfolds = 10,
  L2 = TRUE,
  partial_auc = c(0.8, 1),
  partial_auc_focus = "sensitivity",
  partial_auc_correct = TRUE,
  boot_n = 4000,
  conf_level = 0.95,
  standardize = FALSE,
  seed = Sys.time(),
  ...
)
formula |
an object of class |
data |
a dataframe containing the variables defined in |
Nfolds |
the number of folds used for k-fold cross validation. Default = 10; minimum = 2, maximum = 100. |
L2 |
(logical) switch controlling penalization using the L2 norm of
the parameters. Default: |
partial_auc |
either a logical |
partial_auc_focus |
one of |
partial_auc_correct |
logical value indicating whether the pAUC should be
transformed to the interval from 0.5 to 1.0. |
boot_n |
number of bootstrap replications for computation of confidence intervals for the (partial)AUC. Default: 4000. |
conf_level |
a number between 0 and 1 specifying the confidence level for confidence intervals for the (partial)AUC. Default: 0.95. |
standardize |
logical; if TRUE predictors are standardized to unit variance. Default: FALSE (sensible for binary and logical predictors). |
seed |
random number generator seed for cross-validation data splitting. |
... |
additional arguments passed to |
The results provide information from which to choose a probability threshold above which individual out-of-sample probabilities indicate the need to perform a diagnostic test. Out-of-sample performance is estimated using k-fold cross-validation.
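As a rough illustration of how such a threshold might be explored, the base-R sketch below tabulates sensitivity and specificity over a few candidate thresholds. The data are simulated stand-ins for held-out responses and out-of-sample probabilities, and the helper sens_spec is hypothetical; it is not part of screenr.

```r
# Simulated stand-ins for held-out responses and out-of-sample probabilities
# (assumption: in practice these would come from cross-validation).
set.seed(101)
y <- rbinom(500, size = 1, prob = 0.1)
p <- plogis(-2.5 + 1.5 * y + rnorm(500))

# Hypothetical helper: sensitivity and specificity when subjects whose
# probability is >= threshold are referred for diagnostic testing.
sens_spec <- function(y, p, threshold) {
  test_pos <- p >= threshold
  c(threshold   = threshold,
    sensitivity = sum(test_pos & y == 1) / sum(y == 1),
    specificity = sum(!test_pos & y == 0) / sum(y == 0))
}

# One row per candidate threshold.
thresholds <- c(0.05, 0.10, 0.20, 0.40)
perf <- t(sapply(thresholds, function(th) sens_spec(y, p, th)))
print(perf)
```

Raising the threshold can only lower sensitivity and raise specificity, so the table makes the trade-off between missed cases and unnecessary diagnostic tests explicit.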
lasso_screenr
uses the L1 path regularizer of
Park and Hastie (2007), as implemented in the glmpath
package.
Park-Hastie regularization is similar to the conventional lasso and the
elastic net. It differs from the lasso by including a very small,
fixed (1e-5) penalty on the L2 norm of the parameter
vector, and differs from the elastic net in that the L2 penalty is
fixed. Like the elastic net, the Park-Hastie regularization is robust to
highly correlated predictors. The L2 penalization can be turned off
(L2 = FALSE), in which case the regularization is similar to the
conventional lasso. Like all L1 regularizers, the Park-Hastie
algorithm automatically "deletes" covariates by shrinking their parameter
estimates to 0.
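Schematically, the three penalized objectives being compared can be written as below. This is a sketch under stated assumptions: the symbols are introduced here for illustration, with \ell(\beta) the binomial log-likelihood, \lambda the L1 tuning parameter that varies along the path, and \lambda_2 the L2 penalty weight. The exact scaling of the L2 term inside glmpath (for example a factor of 1/2) is not restated here and may differ.

```latex
\begin{aligned}
\text{lasso:} \quad
  & \min_{\beta}\; -\ell(\beta) + \lambda \lVert \beta \rVert_1 \\
\text{Park-Hastie:} \quad
  & \min_{\beta}\; -\ell(\beta) + \lambda \lVert \beta \rVert_1
    + \lambda_2 \lVert \beta \rVert_2^2,
    \qquad \lambda_2 = 10^{-5} \text{ (fixed)} \\
\text{elastic net:} \quad
  & \min_{\beta}\; -\ell(\beta) + \lambda \lVert \beta \rVert_1
    + \lambda_2 \lVert \beta \rVert_2^2,
    \qquad \lambda_2 \text{ tuned}
\end{aligned}
```

Setting L2 = FALSE corresponds to \lambda_2 = 0 in this notation, which recovers the first line.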
The coefficients produced by L1 regularization are biased toward
zero. Therefore one might consider refitting the model selected by
regularization using maximum-likelihood estimation as implemented in
logreg_screenr
.
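A minimal sketch of that refitting idea follows. It uses simulated data and base R's glm rather than logreg_screenr, so that it runs without screenr installed. The retained covariates (Q1 and Q3) are assumed for illustration, as if the regularized fit had shrunk the coefficient of Q2 to zero.

```r
# Simulated screening data (assumption: three binary questions, Q2 uninformative).
set.seed(42)
n <- 400
dat <- data.frame(Q1 = rbinom(n, 1, 0.3),
                  Q2 = rbinom(n, 1, 0.5),
                  Q3 = rbinom(n, 1, 0.2))
dat$testresult <- rbinom(n, 1, plogis(-2 + 1.2 * dat$Q1 + 1.0 * dat$Q3))

# Assumption: the L1-regularized fit retained Q1 and Q3 and dropped Q2.
selected <- c("Q1", "Q3")

# Refit by unpenalized maximum likelihood on the retained covariates only.
refit <- glm(reformulate(selected, response = "testresult"),
             data = dat, family = binomial)
print(coef(refit))
```

The refitted coefficients are not shrunk toward zero, but their standard errors do not account for the selection step that chose the covariates.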
The receiver-operating characteristics are computed using the pROC
package.
By default, the partial area under the ROC curve is computed from
that portion of the curve for which sensitivity is in the closed interval
[0.8, 1.0]. However, the total AUC can be obtained using the argument
partial_auc = FALSE
. Partial areas can be computed for either
ranges of sensitivity or specificity using the arguments
partial_auc_focus
and partial_auc
. By default, partial areas
are standardized.
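The following sketch shows the corresponding computation done directly with pROC on simulated data. It assumes the pROC package is installed, and it assumes that the partial_auc, partial_auc_focus and partial_auc_correct arguments above correspond to pROC's partial.auc, partial.auc.focus and partial.auc.correct; that mapping is inferred from the names rather than confirmed here.

```r
library(pROC)

# Simulated binary outcome and predicted probabilities (illustrative only).
set.seed(7)
y <- rbinom(300, 1, 0.2)
p <- plogis(-1.5 + 2 * y + rnorm(300))

# Controls are coded 0 and cases 1; higher probabilities indicate cases.
roc_obj <- roc(response = y, predictor = p, levels = c(0, 1), direction = "<")

# Partial AUC over sensitivities in [0.8, 1], standardized by pROC's correction.
pauc <- auc(roc_obj, partial.auc = c(0.8, 1),
            partial.auc.focus = "sensitivity", partial.auc.correct = TRUE)

# Total AUC for comparison.
total_auc <- auc(roc_obj)

print(pauc)
print(total_auc)
```

Restricting attention to high sensitivities reflects the screening setting, where missing a true case is usually the more costly error.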
Out-of-sample performance is estimated using k-fold cross-validation. For a gentle but Python-centric introduction to k-fold cross-validation, see https://machinelearningmastery.com/k-fold-cross-validation/.
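For readers who prefer R, the idea can be sketched in a few lines of base R. This is an illustration only: it uses simulated data and an ordinary glm fit with a single hypothetical predictor x, not the regularized glmpath fit and model selection that lasso_screenr performs within each fold.

```r
# Simulated data with one predictor (illustrative only).
set.seed(2024)
n <- 300
k <- 10
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 1.5 * dat$x))

# Randomly assign each observation to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold, fit on the other k - 1 folds and predict the held-out fold.
cv_pred <- numeric(n)
for (j in seq_len(k)) {
  held_out <- fold == j
  fit <- glm(y ~ x, data = dat[!held_out, ], family = binomial)
  cv_pred[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
}

# Every observation now has exactly one out-of-sample predicted probability.
print(summary(cv_pred))
```

Because each prediction comes from a model that never saw that observation, performance measures computed from cv_pred are less optimistic than their in-sample counterparts.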
lasso_screenr
returns (invisibly) an object of class lasso_screenr
containing the components:
Call
The function call.
Prevalence
Prevalence of the binary response variable.
glmpathObj
An object of class glmpath
returned by
glmpath::glmpath
. See help(glmpath)
and
methods(class = "glmpath")
.
Xmat
The matrix of predictors.
isResults
A list structure containing the results from the two
model fits which produced the minimum AIC and BIC values, respectively. The
results consist of Coefficients
(the logit-scale parameter estimates,
including the intercept), isPreds
(the in-sample predicted
probabilities) and isROC
(the in-sample receiver-operating
characteristic (ROC) of class roc
).
RNG
Specification of the random-number generator used for k-fold data splitting.
RNGseed
RNG seed.
cvResults
A list structure containing the results of k-fold cross-validation estimation of out-of-sample performance.
The list elements of cvResults
are:
Nfolds
the number of folds k
X_ho
the matrix of held-out predictors for each cross-validation fold
minAICcvPreds
the held-out responses and out-of-sample predicted probabilities from AIC-best model selection
minAICcvROC
the out-of-sample ROC object
of class roc
from AIC-best model selection
minBICcvPreds
the held-out responses and out-of-sample predicted probabilities from BIC-best model selection
minBICcvROC
the out-of-sample ROC object of class roc from BIC-best model selection
Park MY, Hastie T. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society Series B. 2007;69(4):659-677. https://doi.org/10.1111/j.1467-9868.2007.00607.x
Kim J-H. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis. 2009;53(11):3735-3745. http://doi.org/10.1016/j.csda.2009.04.009
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Muller M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(77):1-8. http://doi.org/10.1186/1471-2105-12-77
Teferi W, Gutreuter S, Bekele A et al. Adapting strategies for effective and efficient pediatric HIV case finding: Risk screening tool for testing children presenting at high-risk entry points. BMC Infectious Diseases. 2022;22:480. http://doi.org/10.1186/s12879-022-07460-w
glmpath, roc and auc.
## Not run:
data(unicorns)
uniobj1 <- lasso_screenr(testresult ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q6 + Q7,
                         data = unicorns, Nfolds = 10)
methods(class = class(uniobj1))
summary(uniobj1)
## End(Not run)