knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(OptimClassifier)
library(ggplot2)
OptimClassifier provides a set of tools for creating models, selecting the best parameter combination for a model, and selecting the best threshold for your binary classification. The package contains tools for:
The main functions can be summarized in this table:
Optim. | LM | GLM | LMM | DA* | CART* | NN | SVM |
---|---|---|---|---|---|---|---|
Threshold optimization | ✅ | ✅ | ✅ | ✖️ | ✖️ | ✅ | ✅ |
Parameter optimization | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
What parameter or option? | Transformations | Family & Links | Random variable | Linear or Quadratic | CP | Hidden layers | Kernels |
*These models are native classifiers, so no threshold optimization is applied.
install.packages("OptimClassifier")
To install from GitHub instead, you can use either of these packages:
library(devtools)
install_github("economistgame/OptimClassifier")

library(remotes)
install_github("economistgame/OptimClassifier")
The example shows you how to solve a common credit scoring problem with this package and GLM methodology.
Firstly, we must load the dataset. In this example, we use Australian Credit.
## Load a Dataset
data(AustralianCredit)
Then we create a model with the Optim.GLM function (or the one you want).
## Create the model
creditscoring <- Optim.GLM(Y ~ ., AustralianCredit, p = 0.7, seed = 2018)
Now you can print the results of the models:
### See a ranking of the models tested
print(creditscoring)
Would you rather see the results as a graphic? Try typing plot(creditscoring):
### Are you bored of R outputs? Try a plot
plot(creditscoring)
But what about the details (coefficients and other things) of the best model? And of the second model in the ranking? We can see them simply with:
### Access the summary of the best model
summary(creditscoring)
### Access the summary of the second model
summary(creditscoring, 2)
Optimization is the process of adjusting the parameters of your trained model to improve the quality of its classifications. Depending on your goals, optimization can involve implementation improvements or changes to the classification model itself. This package focuses on two questions: the threshold and several model options.
Optimizing your classification model is important when you want it to reach its full potential. Through optimization, you can reduce the root mean square error (RMSE), increase the success rate, or pursue other goals, such as minimizing type I error or minimizing type II error.
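To make the idea of threshold optimization concrete, here is a minimal sketch in base R. It does not use OptimClassifier and is not a description of the package's internals: the simulated data, the `score_threshold` helper, and the grid of candidate thresholds are all assumptions made for illustration.

```r
# Simulated binary-classification data (illustrative only, not from the package)
set.seed(2018)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
dat <- data.frame(x = x, y = y)

# Keep 70% of the rows for training and 30% for evaluation
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit a logistic regression and predict probabilities on the held-out rows
fit  <- glm(y ~ x, data = train, family = binomial())
prob <- predict(fit, newdata = test, type = "response")

# Hypothetical helper: score one threshold.
# type_I and type_II are the shares of all observations that are
# false positives and false negatives, respectively.
score_threshold <- function(threshold, prob, truth) {
  pred <- as.integer(prob > threshold)
  c(threshold    = threshold,
    success_rate = mean(pred == truth),
    type_I       = mean(pred == 1 & truth == 0),
    type_II      = mean(pred == 0 & truth == 1))
}

# Evaluate a grid of candidate thresholds
grid    <- seq(0.05, 0.95, by = 0.05)
results <- as.data.frame(
  t(sapply(grid, score_threshold, prob = prob, truth = test$y))
)

# Pick the threshold that maximizes the success rate
best <- results[which.max(results$success_rate), ]
print(best)
```

Replacing `which.max(results$success_rate)` with `which.min(results$type_I)` or `which.min(results$type_II)` would target a different goal. In practice the threshold is better chosen on a validation split that is separate from the final test data.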
Optim.LM applies transformations to the response variable to improve the precision of the linear model. The function then searches for the threshold that gives the best possible result for your goal.
Transformations included:
Optim.GLM tries different types of error distributions (called family in R) and several link functions (called link in R). The function then searches for the threshold that gives the best possible result for your goal.
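As a rough illustration of what trying several links means, the sketch below fits the same binomial GLM with four of the link functions that base R accepts and ranks the fits by AIC. The simulated data and the choice of AIC as the ranking criterion are assumptions for this example; the criterion that Optim.GLM actually uses is not stated here.

```r
# Simulated data (illustrative only)
set.seed(2018)
n <- 400
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = pnorm(0.3 + x))
dat <- data.frame(x = x, y = y)

# Links supported by stats::binomial()
links <- c("logit", "probit", "cloglog", "cauchit")

# Fit one GLM per link
fits <- lapply(links, function(l) {
  glm(y ~ x, data = dat, family = binomial(link = l))
})
names(fits) <- links

# Rank the candidate models by AIC (lower is better)
ranking <- sort(sapply(fits, AIC))
print(ranking)
```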
Models trained with this function:
Optim.LMM searches for the variable that, when used as a random effect, most improves the model precision. The function then searches for the threshold that gives the best possible result for your goal.
Optim.DA tries to train both a Linear and a Quadratic Discriminant Analysis, because the characteristics of the data sometimes make it impossible to train a QDA.
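The situation described above can be reproduced with the MASS package, which ships with R. The sketch below is only an assumption about how such a fallback might look, not the code of Optim.DA: the hypothetical `fit_da` helper always fits an LDA and keeps the QDA only when `qda()` succeeds.

```r
library(MASS)  # recommended package distributed with R

# Hypothetical helper: always fit LDA, add QDA only if it can be estimated
fit_da <- function(formula, data) {
  models <- list(LDA = lda(formula, data = data))
  # Assigning NULL leaves the list without a QDA element when qda() fails
  models$QDA <- tryCatch(qda(formula, data = data), error = function(e) NULL)
  models
}

# Enough rows per class: both models are fitted
full <- fit_da(Species ~ ., iris)
print(names(full))

# Only four rows per class with four predictors: qda() stops because
# a group is too small, so only the LDA remains
small_iris <- iris[c(1:4, 51:54, 101:104), ]
small_fit  <- fit_da(Species ~ ., small_iris)
print(names(small_fit))

# Compare the in-sample success rate of the models that were fitted
accuracy <- sapply(full, function(m) {
  mean(predict(m, iris)$class == iris$Species)
})
print(accuracy)
```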
Optim.CART focuses on the pruning process and compares several levels of pruning. To do this, it uses the complexity parameter (CP), which is the amount by which splitting a node improved the relative error.
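For readers unfamiliar with the complexity parameter, the following sketch uses the rpart package directly (it ships with R) to compare pruning levels. It is an illustration under stated assumptions rather than the implementation of Optim.CART: the `kyphosis` data, the `minsplit` value, and the use of cross-validated error to choose the pruning level are choices made for this example.

```r
library(rpart)  # recommended package distributed with R

set.seed(2018)  # rpart's cross-validation is random

# Grow a deliberately large tree (cp = 0) on rpart's built-in kyphosis data
tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              method = "class", cp = 0, minsplit = 5)

# Each row of cptable is one candidate pruning level
print(tree$cptable)

# Choose the CP value with the lowest cross-validated error
best_row <- which.min(tree$cptable[, "xerror"])
best_cp  <- tree$cptable[best_row, "CP"]

# Prune the large tree back to that level
pruned <- prune(tree, cp = best_cp)
print(pruned)
```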
Optim.NN searches for the number of hidden layers that most improves the model precision. The function then searches for the threshold that gives the best possible result for your goal.
Optim.SVM tries different types of kernels to improve the precision. The function then searches for the threshold that gives the best possible result for your goal.
Kernels trained with this function:
If you find problems with the package, or there's anything that it doesn't do which you think it should, please submit them to https://github.com/economistgame/OptimClassifier/issues. In particular, let me know about optimizers and formats which you'd like supported, or if you have a workflow which might make sense for inclusion as a default convenience function.