beset_elnet | R Documentation |
beset_elnet is a wrapper to glmnet for fitting generalized linear models via penalized maximum likelihood, providing automated data preprocessing and selection of both the elastic-net penalty and regularization parameter through repeated k-fold cross-validation.
beset_elnet(
  form,
  data,
  family = "gaussian",
  alpha = c(0.01, 0.5, 0.99),
  n_lambda = 100,
  nest_cv = FALSE,
  n_folds = 10,
  n_reps = 10,
  seed = 42,
  remove_collinear_columns = FALSE,
  skinny = FALSE,
  standardize = TRUE,
  epsilon = 1e-07,
  maxit = 1e+05,
  lambda_min_ratio = NULL,
  force_in = NULL,
  contrasts = NULL,
  offset = NULL,
  weights = NULL,
  parallel_type = NULL,
  n_cores = NULL,
  cl = NULL
)
form |
A model |
data |
Either a |
family |
|
alpha |
|
n_lambda |
Number of lambdas to be used in a search. Defaults to 100.
|
nest_cv |
|
n_folds |
|
n_reps |
|
seed |
|
remove_collinear_columns |
|
skinny |
|
standardize |
Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is TRUE. |
epsilon |
Convergence threshold for coordinate descent. |
maxit |
Maximum number of passes over the data for all lambda values |
lambda_min_ratio |
(Optional) minimum |
force_in |
(Optional) character vector containing the names of any predictor variables that should be included in every model. (Note that if there is an intercept, it is forced into every model by default.) |
contrasts |
Optional |
offset |
(Optional) vector of length equal to the number of observations that is included in the linear predictor. Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. |
weights |
(Optional) |
parallel_type |
(Optional) character string indicating the type of
parallel operation to be used, either |
n_cores |
Integer value indicating the number of workers to run in
parallel during subset search and cross-validation. By default, this will
be set to one fewer than the maximum number of physical cores you have
available, as indicated by |
cl |
(Optional) |
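To make the offset argument more concrete, here is a minimal sketch of what an exposure offset does in a Poisson model. It uses base R's glm() on simulated data rather than beset_elnet() itself; the variable names (exposure, x, y) and the simulated rates are assumptions made for illustration only.

```r
# Illustration only: base-R glm() on simulated data, not beset_elnet().
set.seed(42)
n <- 200
exposure <- runif(n, min = 0.5, max = 5)  # hypothetical exposure times
x <- rnorm(n)
rate <- exp(0.3 + 0.5 * x)                # true event rate per unit of exposure
y <- rpois(n, lambda = rate * exposure)   # observed counts scale with exposure

# The offset enters the linear predictor with its coefficient fixed at 1,
# so the estimated coefficients describe the log rate per unit of exposure.
fit <- glm(y ~ x, family = poisson, offset = log(exposure))
coef(fit)
```

Because log(exposure) is added to the linear predictor without an estimated coefficient, the intercept and slope should land close to the simulated values of 0.3 and 0.5.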
A "beset_elnet" or "nested" object inheriting class "beset_elnet" with the following components:
a list with three data frames:
value of L1-L2 mixing parameter
value of shrinkage parameter
area under curve (binomial models only)
mean absolute error (not given for binomial models)
mean cross entropy, estimated as -log-likelihood/N, where N is the number of observations
mean squared error
R-squared, calculated as 1 - deviance/null deviance
a data frame containing cross-validation statistics for each alpha and lambda listed in fit. If run with nest_cv = TRUE, this will correspond to the inner cross-validation used to select alpha and lambda. Each metric consists of the following list:
mean of the metric calculated on the aggregate holdout folds for each repetition and averaged across repetitions
the variability between all holdout folds, given as a standard error
the variability between repetitions, computed after aggregating over all holdout folds within each repetition and given as a min-max range
if a data_partition is provided, or if run with nest_cv = TRUE, a data frame containing prediction metrics for each alpha and lambda listed in fit as applied to the independent test data or outer cross-validation holdout data
a list of all parameters that were passed to glmnet
a list of "beset_elnet" objects, one for each train-test partition of the outer cross-validation procedure, each consisting of all of the elements listed above
list giving the row indices for the holdout observations for each fold and/or repetition of cross-validation
number of folds used in cross-validation
number of repetitions used in cross-validation
name of the error distribution used in the model
the terms object used
the data argument
the offset vector used
(where relevant) the contrasts used
(where relevant) a record of the levels of the factors used in fitting
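The two formula-based metrics above (mean cross entropy and R-squared) can be reproduced by hand for any fitted GLM. The sketch below does so in-sample with base R's glm() on the built-in mtcars data; it is not taken from the beset source, and the object names mce and rsq are chosen here for illustration.

```r
# Illustration only: base-R glm() on mtcars, not a beset_elnet() fit.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
n <- nobs(fit)

# Mean cross entropy: -log-likelihood / N
mce <- -as.numeric(logLik(fit)) / n

# R-squared as the fraction of deviance explained:
# 1 - deviance / null deviance
rsq <- 1 - deviance(fit) / fit$null.deviance

c(mce = mce, rsq = rsq)
```

For a 0/1 response the saturated log-likelihood is zero, so the mean cross entropy here equals the residual deviance divided by 2N.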
glmnet
data("prostate", package = "beset")
# Regularized logistic regression, with 10 X 10 unnested cross-validation
elnet1 <- beset_elnet(tumor ~ ., data = prostate, family = "binomial")
summary(elnet1)
plot(elnet1)
# Include independent test set in addition to cross-validation
data <- partition(prostate, y = "tumor")
elnet2 <- beset_elnet(tumor ~ ., data = data, family = "binomial")
summary(elnet2)
# Plot deviance explained
plot(elnet2, "rsq")
# Use nested cross-validation
elnet3 <- beset_elnet(tumor ~ ., data = prostate, family = "binomial",
                      nest_cv = TRUE)
# Turn off 1SE rule and use minima of CV tuning curve to select penalty
summary(elnet3, oneSE = FALSE)
# Plot AUC stat
plot(elnet3, "auc")
# Force a variable into the model (do not penalize coefficient)
elnet4 <- beset_elnet(tumor ~ ., data = data, family = "binomial",
                      force_in = "race")
summary(elnet4)
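The cross-validation statistics described under the return value (a mean across repetitions, a standard error between holdout folds, and a min-max range between repetitions) can be illustrated without the package. The following is a rough sketch of one plausible way to compute them with base R's lm() on mtcars; the fold assignment, the smaller fold and repetition counts, and the list names fold_se and rep_range are assumptions for illustration and may not match how beset computes or names them.

```r
# Illustration only: repeated k-fold CV by hand with base-R lm() on mtcars.
set.seed(42)
n_folds <- 4
n_reps  <- 5
n <- nrow(mtcars)

fold_mse <- matrix(NA_real_, n_reps, n_folds)  # MSE within each holdout fold
rep_mse  <- numeric(n_reps)                    # MSE over all holdouts, per repetition

for (r in seq_len(n_reps)) {
  # Randomly assign each row to one of n_folds folds for this repetition
  fold_id <- sample(rep(seq_len(n_folds), length.out = n))
  pred <- numeric(n)
  for (k in seq_len(n_folds)) {
    test <- fold_id == k
    fit <- lm(mpg ~ wt + hp, data = mtcars[!test, ])
    pred[test] <- predict(fit, newdata = mtcars[test, ])
    fold_mse[r, k] <- mean((mtcars$mpg[test] - pred[test])^2)
  }
  # Aggregate the holdout predictions from all folds within the repetition
  rep_mse[r] <- mean((mtcars$mpg - pred)^2)
}

cv_summary <- list(
  mean      = mean(rep_mse),  # averaged across repetitions
  fold_se   = sd(as.vector(fold_mse)) / sqrt(length(fold_mse)),
  rep_range = range(rep_mse)  # min and max across repetitions
)
cv_summary
```

Aggregating holdout predictions within a repetition before averaging keeps each repetition's estimate based on every observation exactly once, which is why the range between repetitions is typically much narrower than the spread between individual folds.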