Tuning parameter selection by k-fold cross-validation for the concave penalized logistic model

Description

Uses the k-fold cross-validated area under the ROC curve (CV-AUC) to select the tuning parameter for a high-dimensional logistic model with a concave penalty.

Usage

cv.cvplogistic(y, x, penalty = "mcp", approach = "mmcd", nfold = 5,
kappa = 1/2.7, nlambda = 100, lambda.min = 0.01,
epsilon = 1e-3, maxit = 1e+3, seed = 1000)

Arguments

y

response vector with elements 0 or 1.

x

the design matrix of penalized variables. By default, an intercept vector will be added when fitting the model.

penalty

a character string specifying the penalty, either "mcp" or "scad". The default is "mcp".

approach

a character string specifying the numerical algorithm, one of "mmcd", "adaptive" or "llacda". The default is "mmcd". See Details for more information.

nfold

an integer specifying the number of folds k used in k-fold cross-validation.

kappa

a value specifying the regularization parameter kappa. The proper range for kappa is [0, 1).

nlambda

an integer specifying the number of grid points along the penalty parameter lambda.

lambda.min

a value specifying the ratio of the smallest to the largest value of the penalty parameter lambda, so that lambda_min = lambda_max * lambda.min. We suggest lambda.min = 0.0001 if n > p, and 0.01 otherwise.

epsilon

a value specifying the convergence criterion of the algorithm.

maxit

an integer value specifying the maximum number of iterations for each coordinate.

seed

randomization seed for cross validation.
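
As an illustration of how lambda.min and nlambda define the lambda grid, the sketch below builds a decreasing grid from lambda_max down to lambda_max * lambda.min. The helper name lambda_grid, the log-scale spacing and the value of lambda_max are assumptions made for this example; the package derives lambda_max from the data, and its actual grid spacing is not stated here.

```r
# Hypothetical helper: builds a decreasing grid of nlambda penalty values.
# Log-scale spacing is an assumption, not documented package behaviour.
lambda_grid <- function(lambda_max, lambda.min = 0.01, nlambda = 100) {
  lambda_min <- lambda_max * lambda.min   # relation stated for lambda.min
  exp(seq(log(lambda_max), log(lambda_min), length.out = nlambda))
}

# Illustrative lambda_max; in practice it is computed from the data.
grid <- lambda_grid(lambda_max = 0.5, lambda.min = 0.01, nlambda = 100)
range(grid)
```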

Details

Computing the solution for a logistic model with a concave penalty is difficult because the objective function is not convex. The package implements the majorization minimization by coordinate descent (MMCD) algorithm for computing the solution path of a logistic model with the SCAD or MCP penalty. The algorithm is efficient and stable for high-dimensional data with p >> n. For the MCP penalty, the package also implements the adaptive rescaling algorithm and the local linear approximation by coordinate descent (LLA-CDA) algorithm. For SCAD, only the MMCD algorithm is implemented.

The regularization parameter kappa controls the concavity of the penalty, with a larger value of kappa giving a more concave penalty. When kappa = 0, both the MCP and SCAD penalties reduce to the Lasso penalty. Hence if kappa is set to zero, the algorithm returns Lasso solutions.
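
The effect of kappa can be seen in a minimal sketch of the MCP penalty. It assumes the standard MCP form written with kappa = 1/gamma, that is, lambda*|t| - kappa*t^2/2 for |t| <= lambda/kappa and lambda^2/(2*kappa) beyond that point. The function name mcp_penalty is hypothetical, and the package's internal scaling may differ.

```r
# Hypothetical helper: MCP penalty under the assumed parametrization kappa = 1/gamma.
mcp_penalty <- function(t, lambda, kappa) {
  a <- abs(t)
  if (kappa == 0) return(lambda * a)          # kappa = 0 reduces to the Lasso penalty
  ifelse(a <= lambda / kappa,
         lambda * a - kappa * a^2 / 2,        # concave part near zero
         lambda^2 / (2 * kappa))              # flat part: no further penalization
}

t <- c(0.5, 1, 2, 4)
mcp_penalty(t, lambda = 1, kappa = 0)     # Lasso: grows linearly in |t|
mcp_penalty(t, lambda = 1, kappa = 0.5)   # MCP: levels off once |t| >= lambda / kappa
```

A larger kappa makes the penalty level off sooner, so large coefficients are penalized less than under the Lasso.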

To select an appropriate tuning parameter for prediction, we use the k-fold cross-validated area under the ROC curve (CV-AUC). For each fold, the CV-AUC approach calculates the predictive AUC on the validation set using the coefficients estimated from the corresponding training set. The predictive AUCs are then averaged over the k folds for each value of lambda, and the lambda with the maximum average predictive AUC is chosen as the tuning parameter.
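
The CV-AUC procedure described above can be sketched in base R as follows. This is an illustration rather than the package's implementation: auc_rank and cv_auc_select are hypothetical names, fit_path stands for any user-supplied routine returning a (p + 1) x length(lambdas) coefficient matrix with the intercept in the first row, and the plain random fold assignment is an assumption (a fold containing only one class would give an undefined AUC).

```r
# Rank-based (Mann-Whitney) estimate of the area under the ROC curve.
auc_rank <- function(y, score) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  r <- rank(score)                           # average ranks handle ties
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical CV-AUC selection over a fixed grid of lambda values.
cv_auc_select <- function(y, x, lambdas, fit_path, nfold = 5, seed = 1000) {
  set.seed(seed)
  fold <- sample(rep(seq_len(nfold), length.out = length(y)))
  auc <- matrix(NA_real_, nfold, length(lambdas))
  for (k in seq_len(nfold)) {
    test <- fold == k
    # Coefficients estimated on the training part only.
    beta <- fit_path(y[!test], x[!test, , drop = FALSE], lambdas)
    # Linear predictors on the validation part, one column per lambda.
    eta <- cbind(1, x[test, , drop = FALSE]) %*% beta
    auc[k, ] <- apply(eta, 2, function(s) auc_rank(y[test], s))
  }
  mean_auc <- colMeans(auc)                  # average predictive AUC per lambda
  best <- which.max(mean_auc)
  list(scvauc = mean_auc[best], slambda = lambdas[best])
}

auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))
```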

Value

A list of three elements is returned.

scvauc

the CV-AUC corresponding to the selected lambda.

slambda

the selected lambda.

scoef

the regression coefficients corresponding to the selected lambda, with the first element being the intercept.
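
Because scoef stores the intercept first, predicted probabilities for new observations can be obtained by prepending a column of ones to the design matrix. The sketch below uses a made-up coefficient vector and simulated data purely for illustration; the names x_new, eta and prob are not part of the package.

```r
# Hypothetical selected coefficients: intercept first, then one slope per column of x.
scoef <- c(-0.4, 0.8, 0, -1.2)

set.seed(1)
x_new <- matrix(rnorm(5 * 3), 5, 3)       # 5 new observations, 3 covariates

eta <- drop(cbind(1, x_new) %*% scoef)    # linear predictor including the intercept
prob <- plogis(eta)                       # logistic link: P(y = 1 | x)
prob
```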

Author(s)

Dingfeng Jiang

References

Jiang, D., Huang, J. Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models.

Zou, H., Li, R. (2008). One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. Ann Stat, 36(4), 1509-1533.

Breheny, P., Huang, J. (2011). Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Application to Biological Feature Selection. Ann Appl Stat, 5(1), 232-253.

Jiang, D., Huang, J., Zhang, Y. (2011). The Cross-validated AUC for MCP-Logistic Regression with High-dimensional Data. Stat Methods Med Res, online first, Nov 28, 2011.

See Also

cvplogistic, hybrid.logistic, cv.hybrid, path.plot

Examples

set.seed(10000)
n=100
y=rbinom(n,1,0.4)
p=10
x=matrix(rnorm(n*p),n,p)

## MCP penalty by MMCD algorithm
out=cv.cvplogistic(y, x, "mcp", "mmcd")
## MCP by adaptive rescaling algorithm
## out=cv.cvplogistic(y, x, "mcp", "adaptive")
## MCP by LLA-CDA algorithm
## out=cv.cvplogistic(y, x, "mcp", "llacda")
## SCAD penalty
## out=cv.cvplogistic(y, x, "scad")

## Lasso penalty
## out=cv.cvplogistic(y, x, kappa = 0)