The ezmodelR
package provides a collection of functions that make it easier to extract information from various models for the purpose of comparing them. The package is meant to be used in conjunction with the caret
package and offers new scoring methods, the ability to view the results of regularization, and the ability to quickly and easily compare training and test error of a model while varying a parameter.
The regularization_plot
function allows the user to view a plot comparing coefficient magnitude and regularization strength for:
model='ridge'
)model='lasso'
)model='logistic'
)The function takes arguments specifying the type of model, the regularization strength lambda
(which can be a vector of different values), a dataframe x
for the features, and a dataframe y
for the response. An optional tolerance argument tol
can be specified.
There are two main types of plots that can be created. If lambda
is a vector of length 1, then the plot will display the magnitude of the coefficients of the model.
regularization_plot('lasso', lambda=2, x=X, y=Y)
If lambda
has a length larger than 1, then the plot will display the number of nonzero coefficients of the models produces vs. the regularization strength.
regularization_plot('ridge', 'lambda=2^c(-1,0,1), x=X, y=y)
The optional tolerance argument can be useful if you want to treat all coefficients with a magnitude smaller than some number as 0. By default, tol=1e-7
, so any coefficient with a magnitude less than that will be treated as 0 for plotting purposed.
The function will return a ggplot2
plot of the object, which is convenient as the data plotted (e.g. coefficient values) can then be extracted from it.
The train_test_plot
function allows the user to view a plot comparing training and test error while iterating across a specified hyperparameter for the following models:
model='decision_tree'
)model='ridge'
)model='lasso'
)model='logistic'
) Note: 'ridge', 'lasso', and 'logistic' regression are not yet implemented.
The function takes arguments specifying the type of model, the type of score plotted score_type
, dataframe containing features x
, a dataframe containing the response variable y
, The hyperparameter
to iterate over, the parameter range that defines the iteration param_range
, as well as a random seed that standardizes the split of training and test data random_seed
.
The function will return a plot showing training and test error across the defined range of the hyperparameter. Depending on the output the user can get an idea which value of the hyperparameter lead to under or overfitting and adjust the model accoringly.
Here's an example of the syntax to call the function for a decision tree model:
train_test_plot(model = "decision_tree", score_type = "accuracy", x = Y, y = Y, hyperparameter = "cp", param_range = range(...), random_seed= ...)
The function will return a ggplot2
plot of the object, which is convenient as the data plotted (e.g. coefficient values) can then be extracted from it.
The score()
function allows the user to compute various scoring metrics for a specified model trained in caret
. Currently, the following metrics are supported:
score_type=mse
score_type=accuracy
score_type=specificity
score_type=sensitivity
score_type=r2
score_type=adj_r2
This function takes in arguments specifying the type of caret model to train, the score type to return a function for, and global settings for the caret model to be trained.
It then returns a function that takes two arguments: features: x
and response y
. Calling this function will return the score, as a double, for the previously provided model, and the data provided to this function. x
and y
must both be dataframes
Here is an example of how to call score()
in order to get a function for sensitivity for a random forest model:
rf_score <- score('rf', 'sensitivity')
The above code block will return a function that will compute the sensitivity for a random forest trained on the data you provide it.
If you don't want a function back, or only have one set of data you are testing score on; you could simply provide the x and y data in the original function call as follows:
rf_score <- score('rf', 'sensitivity')(x, y)
This second code block will return a double.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.