View source: R/prediction_Lasso.R
This function takes user-defined inputs and computes the optimum λ for a Lasso model. It is flexible and supports settings such as data splitting and repeated error curves. It also fully supports multi-core parallelisation. The main fitting routine is cv.glmnet() from the glmnet package.
prediction_Lasso(data = data, x.indices = x.indices, response = response,
                 err.curves = 0, splits = 0, type.lambda = "lambda.min",
                 interactive = FALSE, parallel = FALSE)
data |
A well-cleaned |
x.indices |
The column positions (coordinates) of the predictors that you would like to model with. Please provide a vector of locations, e.g. seq(2,6). |
response |
The location of the response within the |
err.curves |
Because the cross-validation process is random, the result can vary considerably between runs (unless a seed is set). This function can fit the model multiple times (thus creating multiple error curves over a range of λs) and average across these error curves. The optimum λ within the range is the one with the lowest averaged error. Set this argument to 0 if you do not wish to stabilise the process, in which case the seed (1234567) is used for the CV process. A positive integer gives the number of error curves to fit. Default is 0. |
splits |
An element specifying the training proportion; the test proportion will be set to 1 minus the training proportion. Note if you set |
type.lambda |
Either "lambda.min" or "lambda.1se". Default is "lambda.min". Note when |
interactive |
If you are running this function, please ALWAYS keep this argument set to FALSE, which is the default. |
parallel |
Parallelisation is supported; the default is FALSE. |
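As an illustration of how the positional arguments might be read, the sketch below picks the predictor columns and the response column out of a data frame by position. The toy data frame and the variable names x and y are hypothetical, and the extraction step is an assumption about how the function treats its inputs, not its confirmed internals.

```r
# Toy data frame (hypothetical): response in column 1, predictors in columns 2 to 4
data <- data.frame(y = c(1.2, 0.7, 3.1),
                   a = c(1, 2, 3),
                   b = c(0, 1, 0),
                   c = c(5, 4, 6))

response  <- 1          # column position of the response
x.indices <- seq(2, 4)  # column positions of the predictors

# glmnet expects a numeric predictor matrix and a response vector
x <- as.matrix(data[, x.indices, drop = FALSE])
y <- data[[response]]

dim(x)       # 3 rows, 3 predictor columns
colnames(x)  # "a" "b" "c"
```

The same layout appears in the examples, where data.frame(y, x) puts the response first and the call uses response = 1 with x.indices = seq(2, 21).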
A list with elements:
seed |
if |
number of err.curves |
if |
best lambda |
if |
prediction error |
if |
prediction_lower |
The function will only compute this when |
prediction_upper |
The function will only compute this when |
rooted prediction error |
The function will only compute this when |
absolute prediction error |
The function will only compute this when |
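The Details section names the out-of-sample measures as MSE, RMSE and MAE, so the three prediction error elements above presumably correspond to those. The sketch below shows one conventional way to split rows and compute the three measures in base R. The helpers split_rows() and holdout_errors() are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helper: split row indices into a training set and a test set
split_rows <- function(n, train_prop) {
  train <- sample(seq_len(n), size = floor(train_prop * n))
  list(train = train, test = setdiff(seq_len(n), train))
}

# Hypothetical helper: out-of-sample error summaries for a set of predictions
holdout_errors <- function(actual, predicted) {
  err <- actual - predicted
  mse <- mean(err^2)
  list(mse = mse, rmse = sqrt(mse), mae = mean(abs(err)))
}

set.seed(1)
idx <- split_rows(n = 100, train_prop = 0.8)
lengths(idx)  # 80 training rows, 20 test rows

res <- holdout_errors(actual = c(1, 2, 3, 4), predicted = c(1.5, 2, 2, 5))
res  # mse = 0.5625, rmse = 0.75, mae = 0.625
```

In practice the predictions would come from a model fitted on the training rows only and evaluated on the test rows.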
This function builds on the cv.glmnet() function from the glmnet package to allow for more flexibility.
More specifically, it gives users the option to split the dataset into a training set and a test set, which usually gives a more realistic assessment of predictive performance than using the whole dataset.

The function also offers an alternative way to compute the optimum λ: averaging across the error curves instead of using a fixed seed. From experience, for medium-sized datasets with err.curves larger than 1000, the optimum λ usually converges to a stable value that consistently achieves the lowest error averaged across the error curves.

The 95 percent confidence interval for this lowest averaged error, when using the whole dataset (i.e. splits = 0), is computed by applying the quantile() command to the cross-validation scores associated with this optimum λ. The output also includes a plot of these error curves, so you can see how the optimum λ was selected, together with the 95 percent CI around it.

When the process is not stabilised (i.e. err.curves = 0) but the whole dataset is still used (splits = 0), λ is computed with the seed (1234567), and the associated CI is computed from the standard error provided by the glmnet package, assuming normality. When the dataset is split, the function reports the out-of-sample mean squared error (MSE), RMSE and MAE instead of a 95 percent CI.
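The two whole-dataset cases described above can be sketched directly with cv.glmnet(). This is a minimal illustration under stated assumptions, not the package's actual code: the data are simulated, the shared λ grid, the 1.96 normal multiplier and the variable names are all assumptions, and only 20 error curves are fitted to keep the run short.

```r
library(glmnet)

# Simulated stand-in data (not from the package): 100 rows, 20 predictors
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- as.numeric(x[, 1:3] %*% c(2, -1, 0.5) + rnorm(n))

# Case 1 (err.curves = 0, splits = 0): one seeded CV run, normal-theory CI
set.seed(1234567)
fit <- cv.glmnet(x, y, alpha = 1)
i_min <- which(fit$lambda == fit$lambda.min)
single_ci <- fit$cvm[i_min] + c(-1, 1) * qnorm(0.975) * fit$cvsd[i_min]

# Case 2 (err.curves > 0, splits = 0): repeat CV on one shared lambda grid
# so that every error curve is evaluated at the same lambda values
lambda_seq <- fit$lambda
n_curves <- 20  # the text suggests more than 1000 for a stable answer
curves <- vapply(
  seq_len(n_curves),
  function(i) cv.glmnet(x, y, alpha = 1, lambda = lambda_seq)$cvm,
  numeric(length(lambda_seq))
)

# Rows are lambda values, columns are CV runs
mean_curve  <- rowMeans(curves)
i_best      <- which.min(mean_curve)
best_lambda <- lambda_seq[i_best]

# Empirical 95 percent interval of the CV scores at the chosen lambda
averaged_ci <- quantile(curves[i_best, ], probs = c(0.025, 0.975))
```

Fixing the λ grid across runs is what makes the curves directly comparable; without it each run could choose its own sequence and the row-wise average would not be meaningful.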
Mokyo Zhou
library(glmnet)
data(QuickStartExample)
# Please NOTE: you can access "QuickStartExample" by using: data.frame(y, x)
# (in recent glmnet versions the data may load as a list instead; if so, use
#  x <- QuickStartExample$x and y <- QuickStartExample$y first)

# non-split, no error curves, using lambda.min
result <- prediction_Lasso(data = data.frame(y, x), x.indices = seq(2, 21),
                           response = 1, err.curves = 0, splits = 0)

# 0.8 / 0.2 split, no error curves, using lambda.1se
result <- prediction_Lasso(data = data.frame(y, x), x.indices = seq(2, 21),
                           response = 1, err.curves = 0, splits = 0.8,
                           type.lambda = "lambda.1se")

# non-split, but with 100 error curves with parallel (2 cores)
# cl <- parallel::makeCluster(2)
# doParallel::registerDoParallel(cl)
result <- prediction_Lasso(data = data.frame(y, x), x.indices = seq(2, 21),
                           response = 1, err.curves = 100, splits = 0,
                           parallel = TRUE)

# 0.8 / 0.2 split, with 100 error curves with parallel (2 cores)
# cl <- parallel::makeCluster(2)
# doParallel::registerDoParallel(cl)
result <- prediction_Lasso(data = data.frame(y, x), x.indices = seq(2, 21),
                           response = 1, err.curves = 100, splits = 0.8,
                           parallel = TRUE)
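The commented-out cluster lines in the examples hint that a parallel backend should be registered before setting parallel = TRUE. The sketch below shows that setup with cv.glmnet() directly, on simulated data, and assumes the doParallel package is installed. Whether prediction_Lasso() relies on the same foreach backend is an assumption based on those comments.

```r
library(glmnet)

# Simulated stand-in data (not from the package)
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
y <- as.numeric(x[, 1:3] %*% c(2, -1, 0.5) + rnorm(100))

# Start 2 worker processes and register them as the foreach backend
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

# Run the cross-validation folds in parallel; always release the workers
fit <- tryCatch(
  cv.glmnet(x, y, parallel = TRUE),
  finally = {
    parallel::stopCluster(cl)
    foreach::registerDoSEQ()  # fall back to sequential execution afterwards
  }
)

fit$lambda.min
```

Stopping the cluster in a finally clause avoids leaving idle worker processes behind if the fit fails.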