Overview

The ezmodelR package provides a collection of functions that make it easier to extract information from various models for the purpose of comparing them. The package is meant to be used in conjunction with the caret package and offers new scoring methods, the ability to view the results of regularization, and the ability to quickly and easily compare training and test error of a model while varying a parameter.

Plotting Regularization Results

The regularization_plot function allows the user to view a plot comparing coefficient magnitude and regularization strength for:

The function takes arguments specifying the type of model, the regularization strength lambda (which can be a vector of different values), a dataframe x for the features, and a dataframe y for the response. An optional tolerance argument tol can be specified.

There are two main types of plots that can be created. If lambda is a vector of length 1, then the plot will display the magnitude of the coefficients of the model.

regularization_plot('lasso', lambda=2, x=X, y=Y)

If lambda has a length larger than 1, then the plot will display the number of nonzero coefficients of the models produces vs. the regularization strength.

regularization_plot('ridge', 'lambda=2^c(-1,0,1), x=X, y=y)

The optional tolerance argument can be useful if you want to treat all coefficients with a magnitude smaller than some number as 0. By default, tol=1e-7, so any coefficient with a magnitude less than that will be treated as 0 for plotting purposed.

The function will return a ggplot2 plot of the object, which is convenient as the data plotted (e.g. coefficient values) can then be extracted from it.

Plotting Training and Test Errors

The train_test_plot function allows the user to view a plot comparing training and test error while iterating across a specified hyperparameter for the following models:

Note: 'ridge', 'lasso', and 'logistic' regression are not yet implemented.

The function takes arguments specifying the type of model, the type of score plotted score_type, dataframe containing features x, a dataframe containing the response variable y , The hyperparameter to iterate over, the parameter range that defines the iteration param_range, as well as a random seed that standardizes the split of training and test data random_seed.

The function will return a plot showing training and test error across the defined range of the hyperparameter. Depending on the output the user can get an idea which value of the hyperparameter lead to under or overfitting and adjust the model accoringly.

Here's an example of the syntax to call the function for a decision tree model:

train_test_plot(model = "decision_tree", score_type = "accuracy", x = Y,
                                              y = Y, hyperparameter = "cp", param_range = range(...), random_seed= ...)

The function will return a ggplot2 plot of the object, which is convenient as the data plotted (e.g. coefficient values) can then be extracted from it.

Calculating Scores.

The score() function allows the user to compute various scoring metrics for a specified model trained in caret. Currently, the following metrics are supported:

This function takes in arguments specifying the type of caret model to train, the score type to return a function for, and global settings for the caret model to be trained.

It then returns a function that takes two arguments: features: x and response y. Calling this function will return the score, as a double, for the previously provided model, and the data provided to this function. x and y must both be dataframes

Here is an example of how to call score() in order to get a function for sensitivity for a random forest model:

rf_score <- score('rf', 'sensitivity')

The above code block will return a function that will compute the sensitivity for a random forest trained on the data you provide it.

If you don't want a function back, or only have one set of data you are testing score on; you could simply provide the x and y data in the original function call as follows:

rf_score <- score('rf', 'sensitivity')(x, y)

This second code block will return a double.



UBC-MDS/ezmodelR documentation built on May 25, 2019, 1:35 p.m.