interactiver: An interactive function for building penalised models

View source: R/interactiver.R

Description

This is an interactive function that guides users towards better penalised models for their own datasets through a series of questions and suggestions. I try to equip each question with useful, practical hints that will hopefully steer the user in a more appropriate modelling direction and spark research interest. The function has two main emphases: prediction accuracy and inference. For prediction accuracy, the function mainly supports the Lasso and the Elastic Net. For inference, the function mainly uses the Adaptive Lasso model (supporting both "OLS" and "ridge" weightings). This function is extremely flexible and has many unique features that go beyond the base fitting function cv.glmnet from the glmnet package. For more information, please see the "Details" section.

Usage

interactiver(data = data, parallel = FALSE, ncores = 2)

Arguments

data

A well-cleaned data.frame which will be used for modelling. The data.frame is also required to have more rows than columns.

parallel

Whether to use multi-core parallelisation, which is fully supported. The default is FALSE.

ncores

The number of cores that you would like to use in the parallel processing. This is only needed if parallel = TRUE. Also note that the function will automatically switch off the extra connections at the end of the computation.

Value

A list with elements:

model.formula

(FOR PREDICTION FOCUS ONLY). This is the formula for the best model. Note that the value of lambda, and of alpha if the Elastic Net is the best model, can be found in the summary table (see below), so the user can reproduce the model if needed. For the inference focus, I omit the Adaptive Lasso model formula here. For users who are interested in how to reproduce the Adaptive Lasso results, I suggest this link http://ricardoscr.github.io/how-to-adaptive-lasso.html together with the results from the summary table (see below).
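As a rough illustration of reproducing a prediction model from the summary table, the sketch below refits an Elastic Net with glmnet at a given alpha and lambda. The simulated data and the values of best.alpha and best.lambda are hypothetical stand-ins for the user's own dataset and summary table; this is not necessarily how interactiver fits the model internally.

```r
library(glmnet)

set.seed(1)
# Simulated data standing in for the user's own dataset (an assumption).
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

# Hypothetical values, as if read off the summary table.
best.alpha  <- 0.5
best.lambda <- 0.1

# Fit the whole regularisation path at the reported alpha, then
# extract the coefficients at the reported lambda.
fit <- glmnet(x, y, alpha = best.alpha)
beta <- coef(fit, s = best.lambda)
print(beta)
```

Fitting the full path and extracting coefficients at s = best.lambda follows the glmnet recommendation not to supply a single lambda value to the fitting function.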

coefficients

(FOR PREDICTION FOCUS ONLY). This contains the point estimates from the best model. Note that coefficients from the Lasso or the Elastic Net are biased! Users are advised to interpret them with caution. Partly for this reason, confidence intervals for the estimates are not provided within the glmnet package, and thus I will also omit them here. For users who are more interested in inference, I recommend the inference path within this function, which uses the Adaptive Lasso.

summaries

A summary table containing all the important results of the best model. Users can use the results from this table to reproduce models if desired. For more information about the results in the table, please visit prediction_Lasso, prediction_ElasticNet and Adlasso. Furthermore, for the PREDICTION results, alongside the numeric results, the function will also offer some visualisations that present the results graphically:
1. (non-split, non-repeat): MSE vs ln(lambda) & the solution regularisation paths of the best model
2. (non-split, repeat): error curves for both the Lasso and the Elastic Net & the solution regularisation paths of the best model
3. (split, non-repeat): MSE vs ln(lambda) on the training set & the solution regularisation paths of the best model (on train)
4. (split, repeat): error curves for both the Lasso and the Elastic Net (on train) & the solution regularisation paths of the best model (on train)
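For readers who want to see what the first kind of plot looks like outside the interactive function, the sketch below draws an MSE vs ln(lambda) curve and the regularisation paths using the standard plot methods from glmnet. The simulated data is an assumption, and the plots produced by interactiver itself may be styled differently.

```r
library(glmnet)

set.seed(1)
# Simulated data standing in for the user's own dataset (an assumption).
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

# Cross-validated Lasso (alpha = 1) with MSE as the error measure.
cvfit <- cv.glmnet(x, y, alpha = 1, type.measure = "mse")

# Cross-validated MSE against log(lambda), with error bars.
plot(cvfit)

# Coefficient paths of the underlying fit against log(lambda).
plot(cvfit$glmnet.fit, xvar = "lambda")
```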

Confidence.intervals

(FOR INFERENCE ONLY). A table containing the Adaptive Lasso point estimates, the corresponding 100(1 - α)% confidence intervals and the proportion of nonzero estimates within each of the bootstrapped parameter vectors.

Details

Here we will briefly discuss the unique features of this interactive function. For users who are more interested in prediction accuracy, the function offers to fit both the Lasso and the Elastic Net models. The function will ask a series of performance-related questions and collect the answers. These answers will then be converted into a collection of inputs that will be used within the Lasso and the Elastic Net. Example questions include: Should repeated error curves (please see functions prediction_Lasso and prediction_ElasticNet for more details) be used to stabilise the cross-validation process? If not, should "lambda.min" or "lambda.1se" be used as the optimum λ? What should the step size of the alpha grid be? Should the dataset be split into a training and a testing set? We have also aimed to put some explanation or practical suggestions into each of the questions. Furthermore, in this interactive mode, the function also offers an automatic prediction performance comparison between the Lasso and the Elastic Net based on one of two metrics selected by the user: MSE or MAE. The function will then return a summary table of the best model as well as some graphical visualisation of the best model.
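The comparison between the Lasso and the Elastic Net over an alpha grid can be sketched as follows. This is a minimal illustration using cv.glmnet directly, not the package's own implementation; the simulated data, the grid step of 0.1 and the choice of "lambda.min" are all assumptions made for the example.

```r
library(glmnet)

set.seed(1)
# Simulated data standing in for the user's own dataset (an assumption).
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

# Fix the fold assignment so every alpha is evaluated on the same folds.
foldid <- sample(rep(1:10, length.out = n))

# Hypothetical alpha grid; alpha = 1 corresponds to the Lasso.
alphas <- seq(0.1, 1, by = 0.1)

# Smallest cross-validated MSE for each alpha (i.e. at lambda.min).
# Use type.measure = "mae" to compare on MAE instead.
cv.err <- sapply(alphas, function(a) {
  cvfit <- cv.glmnet(x, y, alpha = a, foldid = foldid, type.measure = "mse")
  min(cvfit$cvm)
})

best.alpha <- alphas[which.min(cv.err)]
print(data.frame(alpha = alphas, cv.mse = cv.err))
```

Sharing foldid across the grid keeps the comparison between alpha values from being driven by different random fold splits.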

For users who put more emphasis on making inferences, the function will fit the Adaptive Lasso model with inputs converted from the user's answers to questions such as: whether repeated error curves are desired, which weighting method to use ("OLS" or "ridge"), etc. As opposed to the prediction methods, in order to achieve better inference, the function will always use the full dataset. The function will also provide 95% confidence intervals for the parameters from the best Adaptive Lasso model. The confidence intervals are obtained by residual bootstrapping, and the α level of the intervals is fully adjustable by the user. Please run and have fun with this interactive function, and for more technical information please visit the help pages for functions prediction_Lasso, prediction_ElasticNet and Adlasso.
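To make the inference path more concrete, here is a minimal sketch of an Adaptive Lasso with "ridge" weights followed by a residual bootstrap. The helper fit_adlasso, the simulated data, the small number of bootstrap replicates and the use of percentile intervals are all assumptions for illustration; the Adlasso function in this package may construct its weights and intervals differently.

```r
library(glmnet)

set.seed(1)
# Simulated data standing in for the user's own dataset (an assumption).
n <- 100
p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:2] %*% c(2, -1)) + rnorm(n)

# Hypothetical helper: Adaptive Lasso with ridge-based weights.
# Returns the intercept followed by the p slope estimates.
fit_adlasso <- function(x, y) {
  ridge <- cv.glmnet(x, y, alpha = 0)
  w <- 1 / abs(as.numeric(coef(ridge, s = "lambda.min"))[-1])
  cvfit <- cv.glmnet(x, y, alpha = 1, penalty.factor = w)
  as.numeric(coef(cvfit, s = "lambda.min"))
}

# Fit on the full dataset and compute centred residuals.
beta.hat <- fit_adlasso(x, y)
fitted <- drop(cbind(1, x) %*% beta.hat)
res <- y - fitted
res <- res - mean(res)

# Residual bootstrap: resample residuals, rebuild y, refit.
B <- 50  # kept small for speed; use more replicates in practice
boot <- replicate(B, {
  y.star <- fitted + sample(res, n, replace = TRUE)
  fit_adlasso(x, y.star)
})

# Percentile 100(1 - level)% intervals and proportion of nonzero estimates.
level <- 0.05
ci <- t(apply(boot, 1, quantile, probs = c(level / 2, 1 - level / 2)))
prop.nonzero <- rowMeans(boot != 0)
print(cbind(estimate = beta.hat, ci, prop.nonzero))
```

Changing level adjusts the α of the intervals, mirroring the adjustable α described above.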

Author(s)

Mokyo Zhou

Examples

## Not run: 
library(glmnet)
data(QuickStartExample)
# Please NOTE: you can access "QuickStartExample" by using data.frame(y, x).
# (Recent glmnet versions may instead load a list named QuickStartExample
# with elements x and y.)

# Resample 250 rows with replacement so that there are more rows than columns.
row.samples <- sample(1:100, 250, TRUE)
data <- data.frame(y, x)[row.samples, ]
result <- interactiver(data = data)

# 2-core parallel
result <- interactiver(data = data, parallel = TRUE, ncores = 2)
# NOTE: the function will automatically switch off the extra connections after the computation.

## End(Not run)

MokyoZhou/lassoenet documentation built on May 20, 2019, 11:38 a.m.