knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(basemodels)
This is an introduction to the basemodels package. It provides R equivalents of the dummy classifier and dummy regressor from the 'Python' library 'scikit-learn', with some modifications. The aim is to help R users establish baseline performance for their classification and regression problems. The baseline models do not use any predictors to make predictions. They are useful when classes are imbalanced, in multi-class classification, and whenever users want to quickly compare their statistical and machine learning models against several baselines to see how much they improve on them.
We show a few examples here. First, we split the data into training and testing sets.
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index, ]
test_data <- iris[-index, ]
We can pass dummyClassifier as the method argument of the train() function in the caret package.
ctrl1 <- caret::trainControl(method = "none")

# Train a dummy classifier with caret
dummy_model <- caret::train(Species ~ .,
                            data = train_data,
                            method = dummyClassifier,
                            strategy = "stratified",
                            trControl = ctrl1)

# Make predictions using the trained dummy classifier
pred_vec <- predict(dummy_model, test_data)

# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)
For a classification problem, we can also call the dummy_classifier() function directly.
# Train a dummy classifier on the training labels
dummy_model <- dummy_classifier(train_data$Species,
                                strategy = "proportional",
                                random_state = 2024)

# Make predictions using the trained dummy classifier
pred_vec <- predict_dummy_classifier(dummy_model, test_data)

# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)
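To make concrete what a predictor-free classifier does, here is a minimal base R sketch of two such baselines computed by hand: one that draws labels at random according to the training class proportions, and one that always predicts the most frequent class. This is an illustration under our own assumptions, not the package's implementation. The object names (class_probs, pred_prop, pred_major) are hypothetical, and the random draws are not expected to match the output of dummy_classifier(), even with the same seed.

```r
# Recreate the train/test split used in this vignette
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index, ]
test_data <- iris[-index, ]

# Class proportions observed in the training labels (no predictors involved)
class_probs <- prop.table(table(train_data$Species))

# Baseline 1: draw one label per test row, with probability equal to the
# training class proportion (an illustrative stand-in for a proportional strategy)
set.seed(2024)
pred_prop <- factor(
  sample(names(class_probs),
         size = nrow(test_data),
         replace = TRUE,
         prob = as.numeric(class_probs)),
  levels = levels(train_data$Species)
)

# Baseline 2: always predict the most frequent training class
majority_class <- names(which.max(class_probs))
pred_major <- factor(rep(majority_class, nrow(test_data)),
                     levels = levels(train_data$Species))

# Accuracy of each hand-rolled baseline on the test set
acc_prop <- mean(pred_prop == test_data$Species)
acc_major <- mean(pred_major == test_data$Species)
c(proportional = acc_prop, majority = acc_major)
```

Any real model should beat both numbers. If it does not, the predictors are adding little beyond what the label distribution alone already gives.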
For a regression problem, we can use the dummy_regressor() function.
# Train a dummy regressor that always predicts the training median
reg_model <- dummy_regressor(train_data$Sepal.Length, strategy = "median")

# Make predictions using the trained dummy regressor
y_hat <- predict_dummy_regressor(reg_model, test_data)

# Find mean squared error
mean((test_data$Sepal.Length - y_hat)^2)
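For intuition, a constant regression baseline can also be computed by hand in base R. The sketch below is an illustration under our own assumptions, not the package's code: it predicts the training median (and, for comparison, the training mean) for every test row and reports the mean squared error of each. The helper mse() and the other object names are hypothetical.

```r
# Recreate the train/test split used in this vignette
set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index, ]
test_data <- iris[-index, ]

# Constant predictions: one value, learned from the training response only,
# repeated for every test row
median_hat <- rep(median(train_data$Sepal.Length), nrow(test_data))
mean_hat <- rep(mean(train_data$Sepal.Length), nrow(test_data))

# Hypothetical helper: mean squared error
mse <- function(y, y_hat) mean((y - y_hat)^2)

mse_median <- mse(test_data$Sepal.Length, median_hat)
mse_mean <- mse(test_data$Sepal.Length, mean_hat)
c(median = mse_median, mean = mse_mean)
```

A regression model that uses predictors should have a clearly lower test error than either constant.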
The dummyRegressor method can likewise be passed as the method argument of the train() function in the caret package.
ctrl1 <- caret::trainControl(method = "none")

# Train a dummy regressor with caret
reg_model <- caret::train(Sepal.Length ~ .,
                          data = train_data,
                          method = dummyRegressor,
                          strategy = "median",
                          trControl = ctrl1)

# Make predictions using the trained dummy regressor
y_hat <- predict(reg_model, test_data)

# Find mean squared error
mean((test_data$Sepal.Length - y_hat)^2)