Introduction to basemodels"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(basemodels)

Introduction

This is an introduction to the package basemodels. This package provides equivalent functions for the dummy classifier and regressor used in 'Python' 'scikit-learn' library with some modifications. We aim to help R users easily identify baseline performance for their classification and regression problems. Our baseline models do not use any predictors to make predictions. They are useful in cases of class imbalance, multi-class classification, and when users want to quickly compare their statistical and machine learning models with several baseline models to see how much they have improved.

Examples

We show a few examples here. First, we split the data into training and testing sets.

set.seed(2023)
index <- sample(1:nrow(iris), nrow(iris) * 0.8)
train_data <- iris[index,]
test_data <- iris[-index,]

We can use the dummyClassifier method for the train() function in caret package.

ctrl1 <- caret::trainControl(method = "none")
# Train a dummy classifier with caret
dummy_model <- caret::train(Species ~ ., 
                            data = train_data,
                            method = dummyClassifier,
                            strategy = "stratified",
                               trControl = ctrl1)

# Make predictions using the trained dummy classifier
pred_vec <- predict(dummy_model, test_data)

# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)

For a classification problem, we can use the dummy_classifier() function.

dummy_model <- dummy_classifier(train_data$Species, strategy = "proportional", random_state = 2024)

# Make predictions using the trained dummy classifier
pred_vec <- predict_dummy_classifier(dummy_model, test_data)

# Evaluate the performance of the dummy classifier
conf_matrix <- caret::confusionMatrix(pred_vec, test_data$Species)
print(conf_matrix)

For a regression problem, we can use the dummy_regressor() function.

# Make predictions using the trained dummy regressor
reg_model <- dummy_regressor(train_data$Sepal.Length, strategy = "median")
y_hat <- predict_dummy_regressor(reg_model, test_data)
# Find mean squared error
mean((test_data$Sepal.Length-y_hat)^2)

The dummyRegressor method can be used for the train() function in caret package.

ctrl1 <- caret::trainControl(method = "none")
# Train a dummy regressor with caret
reg_model <- caret::train(Sepal.Length ~ ., data = train_data,
                               method = dummyRegressor,
                               strategy = "median",
                               trControl = ctrl1)
y_hat <- predict(reg_model, test_data)
# Find mean squared error
mean((test_data$Sepal.Length-y_hat)^2)


Try the basemodels package in your browser

Any scripts or data that you put into this service are public.

basemodels documentation built on Aug. 9, 2023, 9:09 a.m.