cvl: Cross-validated penalized regression
Package penalized: L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model

Description

Cross-validating generalized linear models with L1 (lasso or fused lasso) and/or L2 (ridge) penalties, using likelihood cross-validation.
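To make "likelihood cross-validation" concrete, here is a minimal base-R sketch of the simplest case, a ridge-penalized linear model. Each fold is left out in turn, the model is fitted on the remaining subjects, and the log-likelihood of the left-out subjects under that fit is summed over folds. The helpers ridge_fit and cv_loglik are hypothetical names, not part of penalized. The closed-form ridge estimate, the unpenalized intercept obtained by centering, and the Gaussian likelihood with a training-set variance estimate are all assumptions of this sketch; penalized's own fitting algorithm and penalty scaling may differ.

```r
# Hypothetical sketch, NOT the penalized implementation: likelihood
# cross-validation for a ridge-penalized linear model.

# Closed-form ridge fit; the intercept is left unpenalized via centering.
# lambda2 may be a scalar or a vector with one weight per covariate.
ridge_fit <- function(X, y, lambda2) {
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)
  beta <- solve(crossprod(Xc) + diag(lambda2, ncol(X)), crossprod(Xc, y - ym))
  alpha <- ym - sum(xm * beta)
  resid <- y - alpha - drop(X %*% beta)
  # sigma2 is the maximum-likelihood variance estimate on the training data
  list(alpha = alpha, beta = drop(beta), sigma2 = mean(resid^2))
}

# Cross-validated log-likelihood: fit without fold k, then add the Gaussian
# log-density of the left-out observations under that fit.
cv_loglik <- function(X, y, lambda2, fold) {
  total <- 0
  for (k in unique(fold)) {
    test <- fold == k
    fit <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda2)
    mu <- fit$alpha + drop(X[test, , drop = FALSE] %*% fit$beta)
    total <- total + sum(dnorm(y[test], mu, sqrt(fit$sigma2), log = TRUE))
  }
  total
}
```

For example, cv_loglik(X, y, lambda2 = 1, fold = rep(1:5, length.out = nrow(X))) returns a single number; larger (less negative) values indicate better out-of-sample fit, and this is the kind of quantity that the functions documented here profile or maximize over lambda.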

Usage

cvl(response, penalized, unpenalized, lambda1 = 0, lambda2 = 0, positive = FALSE,
    fusedl = FALSE, data, model = c("cox", "logistic", "linear", "poisson"),
    startbeta, startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    trace = TRUE, approximate = FALSE)

optL1(response, penalized, unpenalized, minlambda1, maxlambda1, base1, lambda2 = 0,
    fusedl = FALSE, positive = FALSE, data, model = c("cox", "logistic", "linear", "poisson"),
    startbeta, startgamma, fold, epsilon = 1e-10, maxiter = Inf, standardize = FALSE,
    tol = .Machine$double.eps^0.25, trace = TRUE)

optL2(response, penalized, unpenalized, lambda1 = 0, minlambda2, maxlambda2, base2,
    fusedl = FALSE, positive = FALSE, data, model = c("cox", "logistic", "linear", "poisson"),
    startbeta, startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    tol = .Machine$double.eps^0.25, trace = TRUE, approximate = FALSE)

profL1(response, penalized, unpenalized, minlambda1, maxlambda1, base1, lambda2 = 0,
    fusedl = FALSE, positive = FALSE, data, model = c("cox", "logistic", "linear", "poisson"),
    startbeta, startgamma, fold, epsilon = 1e-10, maxiter = Inf, standardize = FALSE,
    steps = 100, minsteps = steps/3, log = FALSE, save.predictions = FALSE,
    trace = TRUE, plot = FALSE)

profL2(response, penalized, unpenalized, lambda1 = 0, minlambda2, maxlambda2, base2,
    fusedl = FALSE, positive = FALSE, data, model = c("cox", "logistic", "linear", "poisson"),
    startbeta, startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    steps = 100, minsteps = steps/2, log = TRUE, save.predictions = FALSE,
    trace = TRUE, plot = FALSE, approximate = FALSE)

Arguments

response: The response variable (vector). This should be a numeric vector for linear regression, a Surv object for Cox regression, and a factor or a vector of 0/1 values for logistic regression.

penalized: The penalized covariates. These may be specified either as a matrix or as a (one-sided) formula object. See also under data.

unpenalized: Additional unpenalized covariates, specified as under penalized. Note that an unpenalized intercept is included in the model by default (except in the Cox model). This can be suppressed by specifying unpenalized = ~0.

lambda1, lambda2: The fixed values of the tuning parameters for L1 and L2 penalization. Each must be either a single positive number or a vector with length equal to the number of covariates in the penalized argument. In the latter case, each covariate is given its own penalty weight.

minlambda1, minlambda2, maxlambda1, maxlambda2: The values of the tuning parameters for L1 or L2 penalization between which the cross-validated likelihood is to be profiled or optimized. For the fused lasso penalty, minlambda2 and maxlambda2 instead bound the tuning parameter of the L1 penalty on the differences of the coefficients.

base1, base2: An optional vector of length equal to the number of covariates in penalized. If supplied, profiling or optimization is performed between base1*minlambda1 and base1*maxlambda1; analogously for base2.

fusedl: If TRUE or a vector, the penalization method used is fused lasso. The value for lambda1 is used as the tuning parameter for L1 penalization on the coefficients, and the value for lambda2 as the tuning parameter for L1 penalization on the differences of the coefficients. Default value is FALSE.

positive: If TRUE, constrains the estimated regression coefficients of all penalized covariates to be non-negative. If a logical vector with the length of the number of covariates in penalized, constrains the estimated regression coefficients of a subset of the penalized covariates to be non-negative.

data: A data.frame used to evaluate response, and the terms of penalized or unpenalized when these have been specified as a formula object.

model: The model to be used. If missing, the model will be guessed from the response input.

startbeta: Starting values for the regression coefficients of the penalized covariates. These starting values are used only for the first values of lambda1 and lambda2.

startgamma: Starting values for the regression coefficients of the unpenalized covariates. These starting values are used only for the first values of lambda1 and lambda2.

fold: The fold for cross-validation. May be supplied as a single number K (between 2 and n) giving the number of folds, or alternatively as a length-n vector with values in 1:K, specifying exactly which subjects are assigned to which fold. The default is fold = 1:n, resulting in leave-one-out (n-fold) cross-validation.

epsilon: The convergence criterion, as in glm. Convergence is judged separately on the likelihood and on the penalty.

maxiter: The maximum number of iterations allowed in each fitting of the model. Set by default at 25 when only an L2 penalty is present, infinite otherwise.

standardize: If TRUE, standardizes all penalized covariates to unit central L2-norm before applying penalization.

steps: The maximum number of steps between minlambda1 and maxlambda1 or minlambda2 and maxlambda2 at which the cross-validated likelihood is to be calculated.

minsteps: The minimum number of steps between minlambda1 and maxlambda1 or minlambda2 and maxlambda2 at which the cross-validated likelihood is to be calculated. If minsteps is smaller than steps, the algorithm automatically stops when the cross-validated likelihood drops below the cross-validated likelihood of the null model, provided it has done at least minsteps steps.

log: If FALSE, the steps between minlambda1 and maxlambda1 or minlambda2 and maxlambda2 are equidistant on a linear scale; if TRUE, on a logarithmic scale. Note the different defaults of profL1 (FALSE) and profL2 (TRUE).

tol: The tolerance of the Brent algorithm used for minimization. See also optimize.

save.predictions: Controls whether or not to save cross-validated predictions for all values of lambda.

trace: If TRUE, prints progress information. Note that setting trace = TRUE may slow down the algorithm (but it often feels quicker).

approximate: If TRUE, the cross-validated likelihood values are approximated rather than fully calculated. Note that this option is only available for ridge models.

plot: If TRUE, makes a plot of cross-validated likelihood versus lambda.
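The fold, steps, log, base1/base2 and minsteps arguments together describe how the data splits and the candidate penalties are set up. The base-R sketch below mirrors those descriptions under stated assumptions: make_folds, lambda_grid and profile_grid are hypothetical helpers, not the package's internals, and the balanced random allocation in make_folds is only one plausible way to assign subjects to folds. The grid argument is called log_scale here rather than log purely for readability.

```r
# Hypothetical helpers (NOT part of penalized) mirroring the argument
# descriptions for fold, steps, log, base1/base2 and minsteps.

# fold: balanced random allocation of n subjects to K folds (values in 1:K)
make_folds <- function(n, K) {
  stopifnot(K >= 2, K <= n)
  sample(rep(seq_len(K), length.out = n))
}

# steps / log / base: candidate penalty vectors between min and max
lambda_grid <- function(minlambda, maxlambda, steps = 100,
                        log_scale = FALSE, base = 1) {
  grid <- if (log_scale) {
    # equidistant on the log scale; requires minlambda > 0
    exp(seq(log(minlambda), log(maxlambda), length.out = steps))
  } else {
    seq(minlambda, maxlambda, length.out = steps)
  }
  # each grid value multiplies the whole base vector (lambda * base)
  lapply(grid, function(l) l * base)
}

# minsteps: walk the grid and stop early once the cross-validated
# likelihood falls below that of the null model, after >= minsteps steps
profile_grid <- function(grid, cvl_fun, null_cvl, minsteps) {
  cvl <- numeric(0)
  for (i in seq_along(grid)) {
    cvl[i] <- cvl_fun(grid[[i]])
    if (i >= minsteps && cvl[i] < null_cvl) break
  }
  cvl
}
```

A logarithmic grid needs a strictly positive lower bound, since log(0) is -Inf; this is one practical reason a linear grid can be the more convenient default when the range should start at zero.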

Details

All five functions return a list with the following named elements:

lambda:

For optL1 and optL2 lambda gives the optimal value of the tuning parameters found. For profL1 and profL2 lambda is the vector of values of the tuning parameter for which the cross-validated likelihood has been calculated. Absent in the output of cvl.

cvl:

The value(s) of the cross-validated likelihood. For optL1 and optL2 this is the cross-validated likelihood at the optimal value of the tuning parameter.

fold:

Returns the precise allocation of the subjects into the cross-validation folds. Note that the same allocation is used for all cross-validated likelihood calculations in each call to optL1, optL2, profL1, profL2.

predictions:

The cross-validated predictions for the left-out samples. The precise format of the cross-validated predictions depends on the type of generalized linear model (see breslow for survival models). The functions profL1 and profL2 return a list here (only if save.predictions = TRUE), whereas optL1 and optL2 return the predictions for the optimal value of the tuning parameter only.

fullfit:

The fitted model on the full data. The functions profL1 and profL2 return a list of penfit objects here, whereas optL1 and optL2 return the full-data fit (a single penfit object) for the optimal value of the tuning parameter only.

Value

A named list. See details.

Note

The optL1 and optL2 functions use Brent's algorithm for minimization without derivatives (see also optimize). There is a risk that these functions converge to a local instead of to a global optimum. This is especially the case for optL1, as the cross-validated likelihood as a function of lambda1 quite often has local optima. It is recommended to use optL1 in combination with profL1 to check whether optL1 has converged to the right optimum.
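The recommended combination of a profile with an optimizer can be illustrated on a toy function: evaluate a coarse grid first, then run Brent's method (R's optimize) only within the bracket around the best grid point. profile_then_optimize and the bimodal test function are hypothetical stand-ins for a cross-validated likelihood curve; this is a sketch of the strategy, not of how optL1 works internally.

```r
# Hypothetical helper (NOT part of penalized): coarse profile first, then
# Brent's method via stats::optimize inside the best grid bracket.
profile_then_optimize <- function(f, lower, upper, steps = 10,
                                  tol = .Machine$double.eps^0.25) {
  grid <- seq(lower, upper, length.out = steps)
  values <- vapply(grid, f, numeric(1))
  i <- which.max(values)
  # bracket the best grid point by its neighbours, clamped to the ends
  lo <- grid[max(1, i - 1)]
  hi <- grid[min(steps, i + 1)]
  opt <- optimize(f, c(lo, hi), maximum = TRUE, tol = tol)
  list(grid = grid, values = values,
       maximum = opt$maximum, objective = opt$objective)
}

# Toy bimodal "cross-validated likelihood": local peak near 1, global peak at 4
bimodal <- function(x) dnorm(x, 1, 0.3) + 2 * dnorm(x, 4, 0.3)
res <- profile_then_optimize(bimodal, 0, 5, steps = 11)
```

If the grid is too coarse to separate neighbouring peaks, the bracket can still miss the global optimum, so the number of profile steps should reflect how closely spaced the local optima are expected to be.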

Author(s)

Jelle Goeman: j.j.goeman@lumc.nl

References

Goeman J.J. (2010). L-1 Penalized Estimation in the Cox Proportional Hazards Model. Biometrical Journal 52 (1) 70-84.

Examples

# More examples in the package vignette:
#  type vignette("penalized")

library(penalized)
library(survival)  # provides Surv()

data(nki70)
attach(nki70)

# Finding an optimal cross-validated likelihood
opt <- optL1(Surv(time, event), penalized = nki70[,8:77], fold = 5)
coefficients(opt$fullfit)
plot(opt$predictions)

# Plotting the profile of the cross-validated likelihood
prof <- profL1(Surv(time, event), penalized = nki70[,8:77], fold = opt$fold,
    steps = 10)
plot(prof$lambda, prof$cvl, type = "l")
plotpath(prof$fullfit)

Example output

Welcome to penalized. For extended examples, see vignette("penalized").
lambda= 4.133719 	12345cvl= -254.3604
lambda= 6.688499 	12345cvl= -257.6856
lambda= 2.554779 	12345cvl= -252.6971
lambda= 1.57894 	12345cvl= -258.902
lambda= 3.157881 	12345cvl= -252.3473
lambda= 3.029548 	12345cvl= -252.2689
lambda= 2.971872 	12345cvl= -252.2501
lambda= 2.812557 	12345cvl= -252.3046
lambda= 2.947776 	12345cvl= -252.2516
lambda= 2.966433 	12345cvl= -252.2502
lambda= 2.97557 	12345cvl= -252.2501
lambda= 2.975587 	12345cvl= -252.2501
lambda= 2.996199 	12345cvl= -252.2535
lambda= 2.98346 	12345cvl= -252.2502
lambda= 2.978594 	12345cvl= -252.2501
lambda= 2.976806 	12345cvl= -252.2501
lambda= 2.976188 	12345cvl= -252.2501
QSCN6L1      SCUBE2      ZNF533      IGFBP5        PRC1        ESM1
0.06546981 -0.07450513 -0.54863441  0.58674788  1.31951640  0.05861560
lambda= 10.82222 	12345cvl= -258.4229
lambda= 9.619749 	12345cvl= -258.4968
lambda= 8.417281 	12345cvl= -258.4897
lambda= 7.214812 	12345cvl= -257.9279
lambda= 6.012343 	12345cvl= -257.4077
lambda= 4.809875 	12345cvl= -255.957
lambda= 3.607406 	12345cvl= -252.9509
lambda= 2.404937 	12345cvl= -253.1057
lambda= 1.202469 	12345cvl= -258.234
lambda= 0 	12345cvl= -Inf

penalized documentation built on May 2, 2019, 7:28 a.m.