Dmitriy-Fedorov/abcrlda: Asymptotically Bias-Corrected Regularized Linear Discriminant Analysis

Introduction

This package offers methods to perform asymptotically bias-corrected regularized linear discriminant analysis (ABC_RLDA) for cost-sensitive binary classification. The bias correction is an estimate of the bias term added to regularized linear discriminant analysis (RLDA) that minimizes the overall risk. By default, the two misclassification costs are equal and set to 0.5; however, the package also offers the option to set them to predetermined values or, alternatively, to treat them as hyperparameters to tune.

In some applications, the cost of making a mistake differs between the classes. For example, for some diseases it is more important to decrease the number of false negatives (illness is not detected), even at the expense of more false positives.

Quick Start

This section gives examples of basic usage to provide a general sense of the package. For illustration, the iris dataset is used. First, we load the abcrlda package and the iris dataset.

library(abcrlda)
data(iris)

abcrlda is designed for binary classification, so we limit the training data to two species (versicolor and virginica), with 50 observations each.

train_data <- iris[which(iris[, ncol(iris)] == "versicolor" |
                         iris[, ncol(iris)] == "virginica"), 1:4]
train_label <- factor(iris[which(iris[, ncol(iris)] == "versicolor" |
                                 iris[, ncol(iris)] == "virginica"), 5])

After that, we are ready to fit the model and make predictions on the training data (resubstitution).

model <- abcrlda(train_data, train_label)
predict(model, train_data)

The error rates and risk of the fitted model can be evaluated with this function:

risk_calculate(model, train_data, train_label)
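
To make the per-class error rates behind such an evaluation concrete, the following base R sketch computes them from a vector of true labels and a vector of predicted labels. The helper class_error_rates and the label vectors are hypothetical and made up for illustration; they are not part of abcrlda and are not output of the model above.

```r
# Hypothetical helper (not part of abcrlda): per-class error rates
# computed from true and predicted labels.
class_error_rates <- function(truth, predicted) {
  truth <- factor(truth)
  stopifnot(nlevels(truth) == 2, length(truth) == length(predicted))
  wrong <- as.character(predicted) != as.character(truth)
  # The mean of a logical vector within each true class is the share
  # of misclassified observations in that class.
  tapply(wrong, truth, mean)
}

# Made-up labels for illustration only (assumption, not model output)
truth <- factor(c("versicolor", "versicolor", "versicolor", "versicolor",
                  "virginica", "virginica", "virginica", "virginica"))
predicted <- c("versicolor", "versicolor", "versicolor", "virginica",
               "virginica", "virginica", "versicolor", "versicolor")

err <- class_error_rates(truth, predicted)
print(err)
```

With real predictions, the second argument would be the result of predict(model, train_data).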

Controlling priorities

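Cost is one of the most important hyperparameters of abcrlda. It controls how the misclassification errors of each class are weighted in the overall risk during model fitting, so the error rate for the more important class tends to be smaller. The risk R is the cost-weighted sum of the error rates e_0 and e_1 of the two classes:

R = e_0 * C_10 + e_1 * C_01

Here C_10 is the cost of assigning label 1 to an observation whose true label is 0, and C_01 is the cost of assigning label 0 to an observation whose true label is 1.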
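
As a worked illustration of the risk formula, the base R sketch below evaluates R for the same pair of error rates under two cost settings. The function name risk and the error rates are assumptions chosen for illustration, not values produced by the package.

```r
# Risk as the cost-weighted sum of per-class error rates:
#   R = e_0 * C_10 + e_1 * C_01
# 'risk' is a hypothetical helper, not an abcrlda function.
risk <- function(e0, e1, cost) {
  stopifnot(length(cost) == 2)
  e0 * cost[1] + e1 * cost[2]
}

# Illustrative error rates (assumption, not model output)
e0 <- 0.10
e1 <- 0.04

r_equal  <- risk(e0, e1, c(0.5, 0.5))    # equal costs
r_skewed <- risk(e0, e1, c(0.75, 0.25))  # errors on class 0 weighted more
print(c(equal = r_equal, skewed = r_skewed))
```

With a larger C_10, the same error rate on class 0 contributes more to R, which is why minimizing the risk pushes e_0 down.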

model <- abcrlda(train_data, train_label, cost = c(0.75, 0.25))


test_versicolor <- iris[which(iris[, ncol(iris)] == "versicolor"), 1:4]
test_virginica <- iris[which(iris[, ncol(iris)] == "virginica"), 1:4]

r <- predict(model, test_versicolor)
sum(r != "versicolor") / nrow(test_versicolor)  # resubstitution error for first class

r <- predict(model, test_virginica)
sum(r != "virginica") / nrow(test_virginica)  # resubstitution error for second class

The effect of swapping the costs can be seen in the error rates:

model <- abcrlda(train_data, train_label, cost = c(0.25, 0.75))


test_versicolor <- iris[which(iris[, ncol(iris)] == "versicolor"), 1:4]
test_virginica <- iris[which(iris[, ncol(iris)] == "virginica"), 1:4]

r <- predict(model, test_versicolor)
sum(r != "versicolor") / nrow(test_versicolor)  # resubstitution error for first class

r <- predict(model, test_virginica)
sum(r != "virginica") / nrow(test_virginica)  # resubstitution error for second class

Effects of cost values on classification

Cost values can be provided in different ways. There is a short form for normalized cost values:

model <- abcrlda(train_data, train_label, gamma = 0.5, cost = 0.75)
a <- predict(model, train_data)

This is equivalent to:

# same params but more explicit
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = c(0.75, 0.25))
b <- predict(model, train_data)

Scaling the costs by a common factor does not change the predictions made by the model:

# same class costs ratio
model <- abcrlda(train_data, train_label, gamma = 0.5, cost = c(3, 1))
c <- predict(model, train_data)

# all these models give the same predictions
all(a == b & a == c & b == c)
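
The equivalence of the three calls is consistent with only the ratio of the two costs mattering. The base R sketch below shows that the three cost specifications reduce to the same normalized pair. The helper normalize_cost is hypothetical: it illustrates the arithmetic only and is not a description of how abcrlda handles costs internally.

```r
# Hypothetical helper (not part of abcrlda): bring a cost specification
# to a pair of values that sums to one.
normalize_cost <- function(cost) {
  if (length(cost) == 1) {
    # short form: a single value in (0, 1), the other cost is its complement
    stopifnot(cost > 0, cost < 1)
    cost <- c(cost, 1 - cost)
  }
  stopifnot(length(cost) == 2, all(cost > 0))
  cost / sum(cost)
}

short_form <- normalize_cost(0.75)
explicit   <- normalize_cost(c(0.75, 0.25))
scaled     <- normalize_cost(c(3, 1))
print(rbind(short_form, explicit, scaled))
```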

Grid Search for optimal Hyperparameters

This package contains a function that performs a grid search to estimate the optimal hyperparameters (gamma and cost) within a specified space, based on either double asymptotic risk estimation or cross-validation. Double asymptotic risk estimation is more efficient to compute because it uses a closed-form risk estimate. For further details, refer to the article in the reference section.

# search over normalized cost values using the double asymptotic risk estimator
cost_range <- seq(0.1, 0.9, by = 0.2)
gamma_range <- c(0.1, 1, 10, 100, 1000)

gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "estimator")
model <- abcrlda(train_data, train_label,
                 gamma = gs$gamma, cost = gs$cost)

# search over pairs of costs (each matrix row is one pair) using cross-validation
cost_range <- matrix(1:10, ncol = 2)
gamma_range <- c(0.1, 1, 10, 100, 1000)

gs <- grid_search(train_data, train_label,
                  range_gamma = gamma_range,
                  range_cost = cost_range,
                  method = "cross")
model <- abcrlda(train_data, train_label,
                 gamma = gs$gamma, cost = gs$cost)
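
The mechanics of a grid search can be sketched in base R as follows: every combination of gamma and cost is enumerated, a risk value is computed for each, and the combination with the smallest value is kept. This is a generic illustration, not the implementation of grid_search. The function toy_risk is a made-up placeholder standing in for whichever risk estimate is used (the double asymptotic estimator or cross-validation).

```r
# Placeholder objective (assumption): stands in for an estimated risk.
# It has no statistical meaning and is smallest at gamma = 10, cost = 0.3.
toy_risk <- function(gamma, cost) {
  (log10(gamma) - 1)^2 + (cost - 0.3)^2
}

gamma_range <- c(0.1, 1, 10, 100, 1000)
cost_range  <- seq(0.1, 0.9, by = 0.2)

# One row per (gamma, cost) combination
grid <- expand.grid(gamma = gamma_range, cost = cost_range)

# Evaluate the objective for every combination
grid$risk <- mapply(toy_risk, grid$gamma, grid$cost)

# Keep the combination with the smallest value
best <- grid[which.min(grid$risk), ]
print(best)
```

The cost of this approach grows with the product of the two range lengths, which is why a closed-form risk estimate is cheaper per grid point than refitting the model in every cross-validation fold.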

Dmitriy-Fedorov/abcrlda documentation built on June 2, 2020, 12:59 p.m.
