multialpha.repeated.cv.glmnet: multialpha.repeated.cv.glmnet

View source: R/dCVnet_innerloop.R

multialpha.repeated.cv.glmnet    R Documentation

Description

Runs repeated.cv.glmnet for each value in a list of alpha values, returns the averaged results, and selects the 'best' alpha. One key difference between (repeated.)cv.glmnet and this function is that a single 'best' lambda/alpha combination is identified, based on opt.lambda.type. This is intended to be an internal dCVnet function.
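The search over alpha can be illustrated with glmnet alone. The following is a minimal sketch, not the dCVnet implementation: it assumes the glmnet package is installed, uses simulated data, runs a single round of cross-validation per alpha (no repetitions), and always takes the "min" rule. The names alphas, cv_by_alpha and per_alpha are chosen for illustration only.

```r
# Sketch: pick one lambda/alpha combination by cross-validation.
# Guarded so the script still runs when glmnet is not installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(42)
  n <- 100
  p <- 5
  x <- matrix(rnorm(n * p), n, p)            # simulated predictors
  y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)      # simulated gaussian outcome

  alphas <- c(0.2, 0.6, 1)
  # Use the same folds for every alpha so the CV errors are comparable.
  foldid <- sample(rep_len(1:5, n))

  cv_by_alpha <- lapply(alphas, function(a) {
    glmnet::cv.glmnet(x, y, alpha = a, foldid = foldid, family = "gaussian")
  })

  # Best lambda and its CV error for each alpha.
  per_alpha <- data.frame(
    alpha  = alphas,
    lambda = vapply(cv_by_alpha, function(m) m$lambda.min, numeric(1)),
    cvm    = vapply(cv_by_alpha, function(m) min(m$cvm), numeric(1))
  )

  # The single 'best' combination is the row with the lowest CV error.
  best <- per_alpha[which.min(per_alpha$cvm), ]
  print(best)
}
```

Sharing foldid across alphas is a design choice in this sketch: it removes fold-to-fold noise from the comparison between alpha values.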

Usage

multialpha.repeated.cv.glmnet(
  x,
  y,
  alphalist = round(seq(0.2, 1, len = 6)^exp(1), 2),
  lambdas = NULL,
  k = 10,
  nrep = 5,
  opt.lambda.type = c("min", "1se"),
  opt.ystratify = TRUE,
  opt.uniquefolds = FALSE,
  opt.random_seed = NULL,
  family,
  opt.keep_models = c("best", "none", "all"),
  ...
)

Arguments

x

input matrix, of dimension nobs x nvars; each row is an observation vector. Can be in sparse matrix format (inherit from class "sparseMatrix" as in package Matrix). Requirement: nvars >1; in other words, x should have 2 or more columns.

y

response variable. Quantitative for family="gaussian", or family="poisson" (non-negative counts). For family="binomial" should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For family="multinomial", can be a nc>=2 level factor, or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is presented as a vector, it will be coerced into a factor. For family="cox", preferably a Surv object from the survival package: see the Details section of the glmnet documentation for more information. For family="mgaussian", y is a matrix of quantitative responses.

alphalist

a vector of alpha values to search.
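As a quick check, the default expression shown under Usage can be evaluated directly in base R. The comments on ridge and lasso refer to the standard glmnet meaning of alpha (0 is ridge, 1 is lasso).

```r
# Evaluate the default alphalist expression from the Usage section.
alphalist <- round(seq(0.2, 1, len = 6)^exp(1), 2)
print(alphalist)
# [1] 0.01 0.06 0.17 0.35 0.62 1.00

# Raising the evenly spaced grid to the power exp(1) puts more
# candidate values near 0 (ridge-like) than near 1 (lasso).
print(diff(alphalist))
```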

lambdas

a list of lambda sequences, one for each alpha given in alphalist.

k

the number of folds for k-fold cross-validation.

nrep

the number of repetitions of the k-fold cross-validation.

opt.lambda.type

Method for selecting optimum lambda. One of

  • "min" - returns the lambda with best CV score.

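  • "1se" - returns the +1 se lambda (as with lambda.1se in cv.glmnet: the largest lambda whose CV score is within one standard error of the best).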
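The two rules can be sketched on a small cross-validation curve. This is an illustration in base R that follows the lambda.min / lambda.1se convention of cv.glmnet; the numbers in lambda, cvm and cvsd are made up for the example, and the code is not taken from dCVnet.

```r
# Illustrative CV curve: mean error (cvm) and its standard error (cvsd)
# for a decreasing lambda sequence. Values are invented for the example.
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(1.30, 1.10, 1.00, 0.98, 1.02)
cvsd   <- c(0.05, 0.05, 0.05, 0.05, 0.05)

# "min": the lambda with the lowest mean CV error.
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# "1se": the largest (most regularised) lambda whose mean CV error
# is no more than one standard error above the minimum.
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

print(c(min = lambda_min, `1se` = lambda_1se))
#   min   1se
# 0.125 0.250
```

The "1se" rule trades a small amount of CV performance for a sparser, more heavily penalised model.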

opt.ystratify

Boolean. Whether sampling in the outer and inner loops is stratified by outcome. This is implemented with createFolds.
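To show what stratification by outcome means, here is a minimal base R sketch. It does not call createFolds and is not the dCVnet code; stratified_folds is a hypothetical helper written for this example, and it handles only a categorical outcome.

```r
# Hypothetical helper: assign each observation to one of k folds so that
# every fold has (as nearly as possible) the same class proportions.
stratified_folds <- function(y, k) {
  y <- as.factor(y)
  foldid <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # Shuffle the positions within this class, then deal out fold labels.
    idx <- idx[sample.int(length(idx))]
    foldid[idx] <- rep_len(seq_len(k), length(idx))
  }
  foldid
}

set.seed(1)
y <- factor(rep(c("a", "b"), times = c(30, 10)))  # imbalanced outcome
foldid <- stratified_folds(y, k = 5)

# Each of the 5 folds contains 6 "a" and 2 "b" observations.
print(table(fold = foldid, outcome = y))
```

Without stratification, a small class can by chance be nearly absent from some folds, which makes the fold-wise CV scores noisier.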

opt.uniquefolds

Boolean. In most circumstances folds will be unique. This requests that random folds are checked for uniqueness in inner and outer loops. Currently this only issues a warning if non-unique folds are found.

opt.random_seed

Interpreted as integer. This is used to control the generation of random folds.

family

Either a character string representing one of the built-in families, or else a glm() family object. For more information, see the Details section of the glmnet documentation or the description of the response y (above).

opt.keep_models

Which fitted models to return (the models take up memory). One of

  • "best" - model with the alpha value selected as optimal.

  • "none" - no models, just cv results.

  • "all" - list of models at all alphas.
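Arguments such as opt.lambda.type and opt.keep_models list all their options as a vector default. The usual R idiom for such arguments is match.arg(), under which the first element is used when the argument is omitted. The sketch below shows that idiom in base R; whether this function resolves its options in exactly this way is an assumption, and pick_option is a hypothetical helper.

```r
# Hypothetical helper illustrating the match.arg() idiom for an
# argument whose default is the full vector of allowed choices.
pick_option <- function(opt = c("best", "none", "all")) {
  match.arg(opt)
}

print(pick_option())        # argument omitted: first choice, "best"
print(pick_option("all"))   # explicit choice: "all"

# An unrecognised value is an error rather than a silent fallback.
bad <- try(pick_option("some"), silent = TRUE)
print(inherits(bad, "try-error"))
```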

...

arguments passed to cv.glmnet

Value

An object of class multialpha.repeated.cv.glmnet, containing:

  • results - merged repeated.cv.glmnet results, with additional columns giving the alpha value and a logical flag for the best overall combination.

  • best - best selected row from results

  • folds - record of folds used

  • models - models requested by opt.keep_models.

  • bestmodel - index of the best model such that models[[bestmodel]] returns the model selected as optimal.

See Also

repeated.cv.glmnet


AndrewLawrence/dCVnet documentation built on Sept. 24, 2024, 5:24 a.m.