baseline_binomial: Create baseline evaluations for binary classification

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/baseline_wrappers.R

Description

\Sexpr[results=rd, stage=render]{lifecycle::badge("maturing")}

Create a baseline evaluation of a test set.

In modelling, a baseline is a result that is meaningful to compare the results from our models to. For instance, in classification, we usually want our results to be better than random guessing. E.g. if we have three classes, we can expect an accuracy of 33.33%, as for every observation we have 1/3 chance of guessing the correct class. So our model should achieve a higher accuracy than 33.33% before it is more useful to us than guessing.

While this expected value is often fairly straightforward to find analytically, it only represents what we can expect on average. In reality, it's possible to get far better results than that by guessing. baseline_binomial() finds the range of likely values by evaluating multiple sets of random predictions and summarizing them with a set of useful descriptors. Additionally, it evaluates a set of all 0 predictions and a set of all 1 predictions.

Usage

1
2
3
4
5
6
7
8
9
baseline_binomial(
  test_data,
  dependent_col,
  n = 100,
  metrics = list(),
  positive = 2,
  cutoff = 0.5,
  parallel = FALSE
)

Arguments

test_data

data.frame.

dependent_col

Name of dependent variable in the supplied test and training sets.

n

The number of sets of random predictions to evaluate. (Default is 100)

metrics

list for enabling/disabling metrics.

E.g. list("F1" = FALSE) would remove F1 from the results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the results. Default values (TRUE/FALSE) will be used for the remaining available metrics.

You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to enabling/disabling individual metrics, why f.i. list("all" = FALSE, "Accuracy" = TRUE) would return only the Accuracy metric.

The list can be created with binomial_metrics().

Also accepts the string "all".

positive

Level from dependent variable to predict. Either as character (preferable) or level index (1 or 2 - alphabetically).

E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as alphabetically, "dog" comes after "cat".

Note: For reproducibility, it's preferable to specify the name directly, as different locales may sort the levels differently.

Used when calculating confusion matrix metrics and creating ROC curves.

N.B. Only affects evaluation metrics, not the returned predictions.

cutoff

Threshold for predicted classes. (Numeric)

parallel

Whether to run the `n` evaluations in parallel. (Logical)

Remember to register a parallel backend first. E.g. with doParallel::registerDoParallel.

Details

Packages used:

ROC and AUC: pROC::roc

Value

list containing:

  1. a tibble with summarized results (called summarized_metrics)

  2. a tibble with random evaluations (random_evaluations)

....................................................................

Based on the generated test set predictions, a confusion matrix and ROC curve are used to get the following:

ROC:

AUC, Lower CI, and Upper CI

Note, that the ROC curve is only computed when AUC is enabled.

Confusion Matrix:

Balanced Accuracy, Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).

....................................................................

The Summarized Results tibble contains:

The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_0 is the evaluation when all predictions are 0. The row where Measure == All_1 is the evaluation when all predictions are 1.

The aggregated metrics.

....................................................................

The Random Evaluations tibble contains:

The non-aggregated metrics.

A nested tibble with the predictions and targets.

A list of ROC curve objects (if computed).

A nested tibble with the confusion matrix. The Pos_ columns tells you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class. I.e. the level you wish to predict.

A nested Process information object with information about the evaluation.

Name of dependent variable.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other baseline functions: baseline_gaussian(), baseline_multinomial(), baseline()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()

# Data is part of cvms
data <- participant.scores

# Set seed for reproducibility
set.seed(1)

# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]

# Create baseline evaluations
# Note: usually n=100 is a good setting

baseline_binomial(
  test_data = test_set,
  dependent_col = "diagnosis",
  n = 2
)

# Parallelize evaluations

# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)

# Make sure to uncomment the parallel argument
baseline_binomial(
  test_data = test_set,
  dependent_col = "diagnosis",
  n = 4
  #, parallel = TRUE  # Uncomment
)

cvms documentation built on June 17, 2021, 5:12 p.m.