TSLasso: Two-stage hybrid LASSO model.

Description

This function fits a LASSO logistic regression model using a two-stage hybrid procedure (the TSLasso logistic regression model). It selects an optimal set of predictors and returns robust coefficient estimates for the selected predictors.

Usage

TSLasso(x, y, lambda.candidates = list(seq(0.001, 5, by = 0.01)), kfold = 10, seed = 0123)

Arguments

x

predictor matrix.

y

response variable, a factor object with values of 0 and 1.

lambda.candidates

the candidate lambda values passed to the cv.lqa function, supplied as a list; the default is a sequence from 0.001 to 5 in steps of 0.01.

kfold

the number of cross-validation folds; the default is 10. kfold can be as large as the sample size (leave-one-out CV), but this is not recommended for large datasets. The smallest allowable value is kfold=3.

seed

the seed for random sampling, with the default value 0123 (which R reads as the number 123).
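To make the lambda.candidates, kfold and seed arguments more concrete, here is a minimal base R sketch. It only inspects the default lambda grid and shows one common way to draw reproducible, balanced CV folds from a seed. The helper `make_folds` and the variable `grid` are hypothetical names used for illustration; how TSLasso assigns folds internally is not documented here, so this should not be read as the package's implementation.

```r
# Default candidate grid, wrapped in a list as in the Usage signature.
lambda.candidates <- list(seq(0.001, 5, by = 0.01))
grid <- lambda.candidates[[1]]
length(grid)  # number of candidate lambda values
range(grid)   # smallest and largest candidate

# Hypothetical helper (not part of SparseLearner): assign each of n
# observations to one of kfold folds, reproducibly for a given seed.
make_folds <- function(n, kfold = 10, seed = 123) {
  stopifnot(kfold >= 3, kfold <= n)  # mirrors the documented lower bound of 3
  set.seed(seed)
  # Repeat fold labels 1..kfold up to length n, then shuffle them.
  sample(rep(seq_len(kfold), length.out = n))
}

folds <- make_folds(n = 100, kfold = 3)
table(folds)  # fold sizes differ by at most one

# The same seed gives the same assignment.
identical(folds, make_folds(n = 100, kfold = 3))
```

Fixing the seed matters because the cross-validated choice of lambda depends on how observations are split; with a different seed, the selected variables can change.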

Details

This function fits the LASSO logistic regression model using a two-stage hybrid procedure. In the first stage, the LASSO is used to obtain initial coefficient estimates and to reduce the dimension of the model, so that a portion of the irrelevant variables is eliminated and a relatively sparse set of variables remains. In the second stage, the coefficient estimates of the variables retained by the first stage are used to construct the weights of the adaptive LASSO, which aims at consistent variable selection. The glmnet algorithm is used for the LASSO estimation in the first stage, and the optimal tuning parameter is selected by K-fold cross-validation. The adaptive LASSO coefficients are estimated with the local quadratic approximation (LQA) algorithm, which was proposed to approximate nonconvex penalty functions in penalized-likelihood inference for generalized linear models. Users can reduce the running time by using 3-fold CV, but the recommended 10-fold CV is the default.
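The two stages described above can be written out as follows. This is a sketch in standard notation rather than the package's exact objective: \ell denotes the logistic log-likelihood (any scaling by the sample size is omitted), \lambda_1 and \lambda_2 are the stage-specific tuning parameters, and the weight exponent \gamma > 0 follows Zou (2006); the value of \gamma used by TSLasso is not stated in this page. The last line is the usual local quadratic approximation of the absolute value around a current iterate \beta_j^{(k)}.

```latex
\begin{align*}
% Stage 1: LASSO-penalized logistic regression and the screened set S
\hat{\beta}^{(1)} &= \arg\min_{\beta}
  \Bigl\{ -\ell(\beta) + \lambda_1 \sum_{j=1}^{p} |\beta_j| \Bigr\},
  & S &= \bigl\{\, j : \hat{\beta}^{(1)}_j \neq 0 \,\bigr\} \\
% Stage 2: adaptive LASSO restricted to S, weighted by stage-1 estimates
\hat{\beta}^{(2)} &= \arg\min_{\beta_S}
  \Bigl\{ -\ell(\beta_S) + \lambda_2 \sum_{j \in S} w_j\, |\beta_j| \Bigr\},
  & w_j &= \frac{1}{\bigl|\hat{\beta}^{(1)}_j\bigr|^{\gamma}} \\
% Local quadratic approximation of the penalty term
|\beta_j| &\approx \frac{\beta_j^{2}}{2\,\bigl|\beta_j^{(k)}\bigr|}
  + \frac{\bigl|\beta_j^{(k)}\bigr|}{2}
\end{align*}
```

Variables with small first-stage estimates receive large weights w_j and are therefore penalized more heavily in the second stage, which is what drives the additional sparsity.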

Value

var.selected

significant variables that are selected by the TSLasso model.

var.coef

coefficients of the selected significant variables.

References

[1] Guo, P., Zeng, F., Hu, X., Zhang, D., Zhu, S., Deng, Y., Hao, Y. (2015). Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents. PLoS ONE, 10(7), e0134151.

[2] Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418-1429.

Examples

library(SparseLearner)
library(datasets)
head(iris)
# Keep two species so that the response is binary, and drop the Species
# column (column 5) from the predictor matrix.
iris2 <- subset(iris, Species != "virginica")
X <- as.matrix(iris2[, -5])
Y <- as.numeric(ifelse(iris2$Species == "versicolor", 0, 1))
# Fit a two-stage hybrid LASSO (TSLasso) logistic regression model.
# A small lambda.candidates grid and kfold=3 are used here only to reduce
# the running time; the default values are recommended in practice.
TSLasso.fit <- TSLasso(x=X, y=Y, lambda.candidates=list(seq(0.1, 1, by=0.05)),
                       kfold=3, seed=0123)
# Variables selected by the TSLasso model.
TSLasso.fit$var.selected
# Coefficients of the selected variables.
TSLasso.fit$var.coef
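For readers who want to see the two-stage idea from the Details section spelled out, the following is a rough sketch using glmnet alone. It is not the TSLasso implementation: the second stage here uses glmnet's penalty.factor argument as a stand-in for the LQA-based adaptive LASSO (cv.lqa), so results will generally differ from TSLasso. The simulated data, the weight exponent of 1, and all variable names other than the glmnet calls are assumptions made for illustration.

```r
library(glmnet)

# Simulated data (an assumption for illustration): 10 predictors,
# of which only V1, V2 and V3 affect the binary response.
set.seed(123)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.0 * x[, 2] + 0.8 * x[, 3]))

# Stage 1: LASSO logistic regression, lambda chosen by 10-fold CV.
cv1 <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)
b1 <- as.matrix(coef(cv1, s = "lambda.min"))[-1, 1]  # drop the intercept
keep <- which(b1 != 0)                               # screened variables
stopifnot(length(keep) >= 2)  # glmnet needs at least two predictor columns

# Stage 2: adaptive LASSO on the screened variables. Each penalty is
# weighted by the inverse absolute stage-1 estimate (exponent 1 assumed).
w <- 1 / abs(b1[keep])
cv2 <- cv.glmnet(x[, keep, drop = FALSE], y, family = "binomial",
                 alpha = 1, nfolds = 10, penalty.factor = w)
b2 <- as.matrix(coef(cv2, s = "lambda.min"))[-1, 1]

# Final selection, named by analogy with the values returned by TSLasso.
var.selected <- names(b2)[b2 != 0]
var.coef <- b2[b2 != 0]
var.selected
var.coef
```

Restricting the second stage to the screened columns is what makes the procedure a hybrid: the first stage does the coarse dimension reduction, and the weighted penalty refines the selection.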

SparseLearner documentation built on May 29, 2017, 9:18 p.m.