Adlasso: Find the optimum pair (gamma, lambda) for an Adaptive Lasso...
In MokyoZhou/lassoenet: An Interactive Implementation of Penalised Regressions


This function takes user-defined inputs and computes the optimum γ and λ for an Adaptive Lasso model. It is flexible and supports several settings, such as repeated error curves, different weighting methods and a user-defined γ grid. It also fully supports multi-core parallelisation. The main fitting process is cv.glmnet() from the glmnet package.

Adlasso(data = data, x.indices = x.indices, response = response,
  err.curves = 0, weight.method = "OLS", gamma.seq = c(0.5, 1, 2),
  type.lambda = "lambda.min", B.rep = 500, significance = 0.05,
  interactive = FALSE, parallel = FALSE)

`data`	A well-cleaned `data.frame` which will be used for modelling. The `data.frame` is also required to have more rows than columns.
`x.indices`	The column positions of the predictors that you would like to model with. Please provide a vector of positions, e.g. seq(2,6).
`response`	The location of the response within the `data.frame`.
`err.curves`	Because the cross-validation process is random, the result can vary quite a bit if no seed is set. To stabilise the CV process, the function fits a collection of Adaptive Lasso models, each with a different γ value, multiple times (note that the fits differ only in γ; the coefficients used for building the weights are the same). Therefore, for EACH γ, multiple error curves are created over a range of λs, and the local optimum pair (γ, λ) is the pair with the lowest averaged error-curve value. There is one local optimum pair for each γ, and the global optimum pair is the one with the overall lowest averaged error-curve value. Note that with this setting the process tends to be slow, so multi-core parallelisation is strongly recommended. Set this argument to 0 if you do not wish to stabilise the process, in which case the seed (1234567) will be used for the CV process. A positive integer indicates that the stabilisation process is desired and gives the number of error curves per γ. For more information about how this works, see the Details section below.
`weight.method`	The method used to generate the initial set of coefficients, which are then used to construct the initial weights for the Adaptive Lasso. In the current version, two methods are supplied: "OLS" or "ridge". The default is "OLS".
`gamma.seq`	A user-defined γ grid; values may range from 0 to ∞. Default is `c(0.5,1,2)`. Because run time grows with the size of the grid, choose it carefully if `err.curves > 0` is also used.
`type.lambda`	Either "lambda.min" or "lambda.1se". Default is "lambda.min". Note that when `err.curves > 0`, this argument is not used.
`B.rep`	The number of residual bootstrap replicates used for the confidence intervals of the parameters. Default is 500.
`significance`	The significance level α of the confidence intervals, which have level 100(1-α)%. Default is 0.05.
`interactive`	When calling this function directly, ALWAYS keep this argument at FALSE, which is the default.
`parallel`	Whether to use parallelisation. Default is FALSE.

A list with elements:

`seed`	If `err.curves = 0`, the seed (1234567) used to compute γ and λ.
`number of err.curves`	If `err.curves > 0`, this shows how many error curves there are (for each γ).
`best gamma`	If `err.curves = 0` (no stabilisation), this is the γ of the optimum pair (γ, λ) that has the lowest cross-validation error from a single 2D grid search with seed (1234567). This is the usual way of finding (γ, λ). If `err.curves > 0` (with stabilisation), this is the γ of the global optimum pair (γ, λ), i.e. the pair with the overall lowest averaged error-curve value among all the local optimum pairs.
`best lambda`	The λ of the global optimum pair (γ, λ). For more information on how γ and λ are selected, see the Details section.
`prediction error`	If `err.curves = 0` (no error curves), this is the cross-validation score associated with the best (γ, λ) pair from a 2D grid search on the whole data with seed (1234567). If `err.curves > 0` (error curves), this is the overall lowest averaged cross-validation score, associated with the global optimum (γ, λ) pair, from using the whole dataset. In both cases the initial weights come from the selected `weight.method`: OLS initial weights for "OLS" and ridge initial weights for "ridge".
`prediction_lower`	The lower bound for the prediction error. For more information see details below.
`prediction_upper`	The upper bound for the prediction error. For more information see details below.
`ridge lambda`	If `weight.method = "ridge"`, a 3D cross-validation grid search is carried out and this is the optimal lambda for the ridge initial weights. See the Details section for more information.
`%null deviance explained`	This can be seen as an indicator of goodness of fit.
`CIs`	The 100(1-α)% confidence intervals for the parameters. The confidence intervals are constructed by using residual bootstrapping. The α level can be defined by the user. Please note that a CI of (. , .) means the algorithm failed to estimate a valid CI for the corresponding coefficient. However, the proportion of non-zero estimates out of B.rep bootstraps will also be given, and thus, the user can still gain some insight.
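
To make the `CIs` element more concrete, the following is a minimal sketch of residual-bootstrap percentile intervals, together with the proportion of non-zero estimates across the bootstrap replicates. It is an illustration under stated assumptions, not the package's internal code: the helper names `residual_bootstrap_ci` and `ols_fit` are hypothetical, the data are simulated, and plain OLS stands in for the Adaptive Lasso fit at the selected (γ, λ) so that the sketch runs with base R only.

```r
# Hypothetical helper: residual-bootstrap percentile intervals for the slopes.
# fit_fun(x, y) must return a list with `coef` (slopes) and `fitted` values.
residual_bootstrap_ci <- function(x, y, fit_fun, B = 500, alpha = 0.05) {
  base_fit <- fit_fun(x, y)
  res <- y - base_fit$fitted
  res <- res - mean(res)                      # centre the residuals
  boot <- vapply(seq_len(B), function(b) {
    # New response: fitted values plus resampled residuals, then refit
    y_star <- base_fit$fitted + sample(res, replace = TRUE)
    fit_fun(x, y_star)$coef
  }, numeric(ncol(x)))                        # p x B matrix of estimates
  data.frame(
    lower = apply(boot, 1, quantile, probs = alpha / 2),
    upper = apply(boot, 1, quantile, probs = 1 - alpha / 2),
    prop_nonzero = rowMeans(boot != 0)        # share of non-zero estimates
  )
}

# Stand-in estimator (plain OLS); an assumption made for illustration only
ols_fit <- function(x, y) {
  fit <- lm(y ~ x)
  list(coef = unname(coef(fit)[-1]), fitted = unname(fitted(fit)))
}

set.seed(1)
x <- matrix(rnorm(100 * 3), 100, 3)           # simulated predictors
y <- drop(x %*% c(2, 0, -1)) + rnorm(100)     # simulated response
residual_bootstrap_ci(x, y, ols_fit, B = 200)
```

With a sparse estimator in place of `ols_fit`, `prop_nonzero` would show how often each coefficient is selected, which is the kind of insight the description above refers to when a valid interval cannot be estimated.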

This function builds on the cv.glmnet() function from the glmnet package to allow for more flexibility. The glmnet package itself does not directly support the fitting of Adaptive Lasso models. This function wraps around the main fitting function cv.glmnet() and thus provides a direct fitting process for the Adaptive Lasso model. In this version of the package, the function offers two methods for constructing the initial weights, "OLS" and "ridge". If the "OLS" method is selected, an lm() object is fitted and the coefficients from the fit (except for the intercept) are used to create the initial weights. If the "ridge" method is selected, a cv.glmnet(..., alpha = 0) object is fitted and the corresponding coefficients (except for the intercept) are used for building the weights.
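
As a rough illustration of the weighting step described above, the sketch below builds adaptive weights from OLS or ridge coefficients and passes them to cv.glmnet() through its `penalty.factor` argument. The helper name `adaptive_weights`, the simulated data and the weight form 1/|β|^γ (the usual Adaptive Lasso convention) are assumptions made for illustration; the package's internal code may differ.

```r
# Hypothetical helper: adaptive weights w_j = 1 / |beta_j|^gamma
adaptive_weights <- function(beta, gamma) 1 / abs(beta)^gamma

set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)                    # simulated predictors
y <- drop(x %*% c(2, -1, 0, 0, 0.5)) + rnorm(n)    # simulated response

# "OLS" method: coefficients from lm(), intercept dropped
beta_ols <- coef(lm(y ~ x))[-1]
w_ols <- adaptive_weights(beta_ols, gamma = 1)

if (requireNamespace("glmnet", quietly = TRUE)) {
  # "ridge" method: coefficients from a cross-validated ridge fit (alpha = 0)
  ridge_cv <- glmnet::cv.glmnet(x, y, alpha = 0)
  beta_ridge <- as.matrix(coef(ridge_cv, s = "lambda.min"))[-1, 1]
  w_ridge <- adaptive_weights(beta_ridge, gamma = 1)

  # Adaptive Lasso: a lasso fit whose penalty is rescaled per coefficient
  ad_fit <- glmnet::cv.glmnet(x, y, alpha = 1, penalty.factor = w_ols)
  print(coef(ad_fit, s = "lambda.min"))
}
```

Small initial coefficients give large weights and are therefore penalised more heavily; larger γ values sharpen that contrast.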

The function also offers an alternative way to compute the global optimum (γ, λ) pair: averaging across error curves instead of using a fixed seed. More specifically, for the "OLS" method, after the coefficients have been obtained from an lm() fit and converted into the initial weights, the stabilisation process takes a double-loop structure in which the outer loop runs over the γ grid and the inner loop builds multiple Adaptive Lasso models for each γ. In this way, for each γ, we create, say, B Adaptive Lasso models, which results in B error curves over a range of λs. Note that the weights are the same throughout this stabilisation process. For each γ, the function then finds the local optimum pair (γ, λ) by averaging across these error curves and taking the pair with the lowest averaged cross-validation error. Once all the local optimum pairs have been found, the global optimum pair is the one with the overall lowest averaged cross-validation error. From experience, for medium-sized datasets, with err.curves larger than 1500, the global optimum (γ, λ) will usually converge to stable values that consistently achieve the overall lowest averaged error-curve value. This is a 2D stabilisation process.
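
The double loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `average_error_curves` is hypothetical, the data are simulated, the adaptive weights are taken to be 1/|β|^γ, and B is kept small so the example runs quickly. The λ grid is fixed once per γ so that the B error curves are evaluated at the same λ values and can be averaged pointwise.

```r
# Hypothetical helper: average B cross-validation error curves for one gamma
# on a fixed lambda grid and return the local optimum.
average_error_curves <- function(x, y, weights, B) {
  # Fix the lambda grid once so every repeat is evaluated at the same values
  lambda_seq <- glmnet::glmnet(x, y, penalty.factor = weights)$lambda
  curves <- vapply(seq_len(B), function(b) {
    # No seed is set here, so each repeat uses a different random fold split
    cv <- glmnet::cv.glmnet(x, y, penalty.factor = weights, lambda = lambda_seq)
    cv$cvm[match(lambda_seq, cv$lambda)]     # align to the common grid
  }, numeric(length(lambda_seq)))
  avg <- rowMeans(curves, na.rm = TRUE)      # averaged error curve
  best <- which.min(avg)
  list(lambda = lambda_seq[best], error = avg[best])
}

set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)                    # simulated predictors
y <- drop(x %*% c(2, -1, 0, 0, 0.5)) + rnorm(n)    # simulated response
beta_ols <- coef(lm(y ~ x))[-1]                    # same weights basis for all gamma
gamma_seq <- c(0.5, 1, 2)

if (requireNamespace("glmnet", quietly = TRUE)) {
  # Outer loop over the gamma grid: one local optimum (gamma, lambda) per gamma
  local_opt <- do.call(rbind, lapply(gamma_seq, function(g) {
    res <- average_error_curves(x, y, weights = 1 / abs(beta_ols)^g, B = 20)
    data.frame(gamma = g, lambda = res$lambda, error = res$error)
  }))
  # Global optimum: the local optimum with the lowest averaged CV error
  global_opt <- local_opt[which.min(local_opt$error), ]
  print(global_opt)
}
```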

When the "ridge" method is selected, the initial ridge coefficients are obtained by a stabilisation process that averages across the error curves (the first stabilisation). After the coefficients have been obtained and converted into the initial weights, a 2-dimensional stabilisation process similar to the one for the "OLS" method above takes place. Thus, when the "ridge" method is selected, we are stabilising a 3-dimensional process, with the first dimension being the recovery of the ridge coefficients.

When err.curves > 0, the 95 percent confidence interval for the prediction error (the overall lowest averaged error-curve value) is generated as follows: from the error curves for the global optimum γ, the cross-validation scores corresponding to the global optimum (γ, λ) are extracted, and quantile() is then used to compute the 95 percent confidence interval. When the process is not stabilised, i.e. err.curves = 0, the (γ, λ) pair is computed with the seed (1234567) and the associated CI is computed from the standard error provided by the glmnet package, assuming normality.
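
The two interval constructions described above can be sketched in base R as follows. The helper names `percentile_interval` and `normal_interval` are hypothetical, the scores are simulated rather than real output, and the normal-approximation form (mean plus or minus a normal quantile times the standard error) is an assumption about how "assuming normality" is applied.

```r
# Hypothetical helper for err.curves > 0: percentile interval of the CV scores
# observed at the global optimum (gamma, lambda) across the error curves.
percentile_interval <- function(cv_scores, level = 0.95) {
  a <- (1 - level) / 2
  unname(quantile(cv_scores, probs = c(a, 1 - a)))
}

# Hypothetical helper for err.curves = 0: normal-approximation interval from
# the CV mean (cvm) and its standard error (cvsd) as reported by cv.glmnet().
normal_interval <- function(cvm, cvsd, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(cvm - z * cvsd, cvm + z * cvsd)
}

set.seed(1)
scores <- rnorm(200, mean = 1.2, sd = 0.05)   # simulated CV scores, not real output
percentile_interval(scores)
normal_interval(cvm = 1.2, cvsd = 0.1)
```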

Mokyo Zhou

library(glmnet)
data(QuickStartExample)
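# NOTE: in older glmnet releases this loads the objects x and y directly,
# so the data can be accessed with data.frame(y, x).
# In more recent glmnet releases it loads a list instead; extract them first
# with x <- QuickStartExample$x and y <- QuickStartExample$y.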


#no error curves, weight.method="OLS", gamma.seq = c(0.5,1,2),type.lambda = "lambda.min"
result <- Adlasso(data = data.frame(y,x), x.indices = seq(2,21), response = 1, err.curves = 0 ,
                 weight.method = "OLS", gamma.seq = c(0.5,1,2), type.lambda="lambda.min")

# 100 error curves, weight.method="OLS", gamma.seq = seq(0,5), 2-cores parallel processing
#cl <- parallel::makeCluster(2)
#doParallel::registerDoParallel(cl)
result <- Adlasso(data = data.frame(y,x), x.indices = seq(2,21), response = 1, err.curves = 100,
                 weight.method = "OLS", gamma.seq = seq(0,5), parallel = TRUE)

# no error curves, weight.method = "ridge", gamma.seq = seq(1,10), type.lambda = "lambda.1se"
result <- Adlasso(data = data.frame(y,x), x.indices = seq(2,21), response = 1, err.curves = 0,
                 weight.method = "ridge", gamma.seq = seq(1,10), type.lambda = "lambda.1se")

#80 error curves, weight.method = "ridge", gamma.seq=c(0,0.5,1,2,2.5),with parallel (2 cores)
#cl <- parallel::makeCluster(2)
#doParallel::registerDoParallel(cl)
result <- Adlasso(data = data.frame(y,x), x.indices = seq(2,21), response = 1, err.curves = 80,
                 weight.method = "ridge", gamma.seq = c(0,0.5,1,2,2.5), parallel = TRUE)