evaluate: Evaluate your model's performance


View source: R/evaluate.R

Description

Lifecycle: maturing

Evaluate your model's predictions on a set of evaluation metrics.

Create ID-aggregated evaluations by multiple methods.

Currently supports regression and classification (binary and multiclass). See type.

Usage

evaluate(
  data,
  target_col,
  prediction_cols,
  type = "gaussian",
  id_col = NULL,
  id_method = "mean",
  models = NULL,
  apply_softmax = TRUE,
  cutoff = 0.5,
  positive = 2,
  metrics = list(),
  include_predictions = TRUE,
  parallel = FALSE
)

Arguments

data

Data frame with predictions, targets and (optionally) an ID column. Can be grouped with group_by.

Multinomial

When type is "multinomial", the predictions should be passed as one column per class with the probability of that class. The columns should have the name of their class, as they are named in the target column. E.g.:

class_1   class_2   class_3   target
0.269     0.528     0.203     class_2
0.368     0.322     0.310     class_3
0.375     0.371     0.254     class_2
...       ...       ...       ...
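
For illustration, a data frame in this format could be constructed as follows (the class and column names are arbitrary):

# Sketch: multinomial input with one probability column per class
predictions <- tibble::tibble(
  "class_1" = c(0.269, 0.368, 0.375),
  "class_2" = c(0.528, 0.322, 0.371),
  "class_3" = c(0.203, 0.310, 0.254),
  "target"  = c("class_2", "class_3", "class_2")
)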

Binomial

When type is "binomial", the predictions should be passed as one column with the probability of class being the second class alphabetically (1 if classes are 0 and 1). E.g.:

prediction target
0.769 1
0.368 1
0.375 0
... ...

Gaussian

When type is "gaussian", the predictions should be passed as one column with the predicted values. E.g.:

prediction   target
28.9         30.2
33.2         27.1
23.4         21.3
...          ...

target_col

Name of the column with the true classes/values in data.

When type is "multinomial", this column should contain the class names, not their indices.

prediction_cols

Name(s) of column(s) with the predictions.

When evaluating a classification task, the column(s) should contain the predicted probabilities.

type

Type of evaluation to perform:

"gaussian" for regression (like linear regression).

"binomial" for binary classification.

"multinomial" for multiclass classification.

id_col

Name of ID column to aggregate predictions by.

N.B. Current methods assume that the target class/value is constant within the IDs.

N.B. When aggregating by ID, some metrics (such as those from model objects) are excluded.

id_method

Method to use when aggregating predictions by ID. Either "mean" or "majority".

When type is "gaussian", only the "mean" method is available.

mean

The average prediction (value or probability) is calculated per ID and evaluated (see the sketch below). This method assumes that the target class/value is constant within the IDs.

majority

The most predicted class per ID is found and evaluated. In case of a tie, the winning classes share the probability (e.g. P = 0.5 each when there are two majority classes). This method assumes that the target class/value is constant within the IDs.
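
As an illustration, the "mean" method corresponds roughly to the following aggregation, where id, prediction, and target are placeholder column names (a sketch, not the internal implementation):

# Sketch: average the prediction per ID before evaluating
# (assumes the target is constant within each ID)
aggregated <- data %>%
  dplyr::group_by(id) %>%
  dplyr::summarise(
    prediction = mean(prediction),
    target = dplyr::first(target)
  )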

models

Unnamed list of fitted model(s) for calculating R^2 metrics and information criterion metrics. May only work for some types of models.

When only passing one model, remember to pass it in a list (e.g. list(m)).

N.B. When data is grouped, provide one model per group in the same order as the groups.

N.B. When aggregating by ID (i.e. when id_col is not NULL), it's not currently possible to pass model objects, as these would not be aggregated by the IDs.

N.B. Currently, Gaussian only.

apply_softmax

Whether to apply the softmax function to the prediction columns when type is "multinomial".

N.B. Multinomial models only.
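
For reference, the softmax function scales each row of raw prediction values to positive probabilities that sum to 1. A minimal sketch of what is applied to each row (the internal implementation may differ, e.g. in how numerical stability is handled):

# Softmax: exp(x) / sum(exp(x)) per row of prediction columns
softmax_row <- function(x) exp(x) / sum(exp(x))
softmax_row(c(1.2, 0.3, -0.5))  # approx. 0.629 0.256 0.115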

cutoff

Threshold for predicted classes. (Numeric)

N.B. Binomial models only.

positive

Level from dependent variable to predict. Either as character or level index (1 or 2 - alphabetically).

E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as alphabetically, "dog" comes after "cat".

Used when calculating confusion matrix metrics and creating ROC curves.

N.B. Only affects the evaluation metrics.

N.B. Binomial models only.
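
For illustration, with the levels "cat" and "dog" from above and cutoff = 0.5, predicted classes are derived roughly like this (a sketch, not the internal implementation):

# Probabilities are of the second class alphabetically ("dog")
probabilities <- c(0.2, 0.8, 0.5)
predicted_classes <- ifelse(probabilities > 0.5, "dog", "cat")
# positive = "dog" (or 2) then decides which class the
# confusion matrix metrics treat as the positive class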

metrics

List for enabling/disabling metrics.

E.g. list("RMSE" = FALSE) would remove RMSE from the results, and list("Accuracy" = TRUE) would add the regular accuracy metric to the classification results. Default values (TRUE/FALSE) will be used for the remaining metrics available.

Also accepts the string "all".

N.B. Currently, disabled metrics are still computed.
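
E.g., reusing the binomial example from the Examples section below:

# Add the regular accuracy metric to the default binomial metrics
evaluate(data = data, target_col = "diagnosis",
         prediction_cols = "binomial_predictions",
         type = "binomial",
         metrics = list("Accuracy" = TRUE))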

include_predictions

Whether to include the predictions in the output as a nested tibble. (Logical)

parallel

Whether to run evaluations in parallel, when data is grouped with group_by.

Details

Packages used:

Gaussian:

r2m : MuMIn::r.squaredGLMM

r2c : MuMIn::r.squaredGLMM

AIC : stats::AIC

AICc : MuMIn::AICc

BIC : stats::BIC

Binomial and Multinomial:

Confusion matrix and related metrics: caret::confusionMatrix

ROC and related metrics: pROC::roc

MCC: mltools::mcc
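
For reference, a rough sketch of how these functions are typically called; model, targets, probabilities, and predicted_classes are placeholder objects, and the arguments used internally by cvms may differ:

MuMIn::r.squaredGLMM(model)   # r2m, r2c
stats::AIC(model); MuMIn::AICc(model); stats::BIC(model)
pROC::roc(response = targets, predictor = probabilities)
caret::confusionMatrix(data = factor(predicted_classes),
                       reference = factor(targets), positive = "1")
mltools::mcc(preds = predicted_classes, actuals = targets)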

Value

--------------------------------------------------------------------------

Gaussian Results

--------------------------------------------------------------------------

Tibble containing the following metrics by default:

Average RMSE, MAE, r2m, r2c, AIC, AICc, and BIC.

N.B. Some of the metrics will only be returned if model objects were passed, and will be NA if they could not be extracted from the passed model objects.

Also includes:

A nested tibble with the Predictions and targets.

A nested tibble with the model Coefficients. The coefficients are extracted from the model object with broom::tidy() or coef() (with some restrictions on the output). If these attempts fail, a default coefficients tibble filled with NAs is returned.

--------------------------------------------------------------------------

Binomial Results

--------------------------------------------------------------------------

Tibble with the following evaluation metrics, based on a confusion matrix and a ROC curve fitted to the predictions:

ROC:

AUC, Lower CI, and Upper CI

Confusion Matrix:

Balanced Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).

Other available metrics (disabled by default, see metrics): Accuracy.

Also includes:

A nested tibble with the predictions and targets.

A nested tibble with the sensitivities and specificities from the ROC curve.

A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.
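
Assuming the nested columns are named Predictions, ROC, and Confusion Matrix (check names() on the output), they can be extracted like so, reusing the binomial example from the Examples section below:

result <- evaluate(data = data, target_col = "diagnosis",
                   prediction_cols = "binomial_predictions",
                   type = "binomial")

result$Predictions[[1]]         # nested predictions and targets
result$ROC[[1]]                 # nested sensitivities/specificities
result$`Confusion Matrix`[[1]]  # nested confusion matrix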

--------------------------------------------------------------------------

Multinomial Results

--------------------------------------------------------------------------

For each class, a one-vs-all binomial evaluation is performed. This creates a Class Level Results tibble containing the same metrics as the binomial results described above, along with the Support metric, which is simply a count of the class in the target column. These metrics are used to calculate the macro metrics in the output tibble. The nested class level results tibble is also included in the output tibble, and would usually be reported along with the macro and overall metrics.

The output tibble contains the macro and overall metrics. The metrics that share their name with the metrics in the nested class level results tibble are averages of those metrics (note: does not remove NAs before averaging). In addition to these, it also includes the Overall Accuracy metric.

Other available metrics (disabled by default, see metrics): Accuracy, Weighted Balanced Accuracy, Weighted Accuracy, Weighted F1, Weighted Sensitivity, Weighted Specificity, Weighted Pos Pred Value, Weighted Neg Pred Value, Weighted AUC, Weighted Lower CI, Weighted Upper CI, Weighted Kappa, Weighted MCC, Weighted Detection Rate, Weighted Detection Prevalence, and Weighted Prevalence.

Note that the "Weighted" metrics are weighted averages, weighted by the Support.

Also includes:

A nested tibble with the Predictions and targets.

A nested tibble with the multiclass Confusion Matrix.

Class Level Results

Besides the binomial evaluation metrics and the Support metric, the nested class level results tibble also contains:

A nested tibble with the sensitivities and specificities from the ROC curve.

A nested tibble with the Confusion Matrix from the one-vs-all evaluation. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class. In our case, 1 is the current class and 0 represents all the other classes together.

Author(s)

Ludvig Renbo Olsen, [email protected]

Examples

# Attach packages
library(cvms)
library(dplyr)

# Load data
data <- participant.scores

# Fit models
gaussian_model <- lm(age ~ diagnosis, data = data)
binomial_model <- glm(diagnosis ~ score, data = data, family = "binomial")

# Add predictions
# type = "response" makes the binomial predictions probabilities
data[["gaussian_predictions"]] <- predict(gaussian_model, data,
                                          type = "response")
data[["binomial_predictions"]] <- predict(binomial_model, data,
                                          type = "response")

# Gaussian evaluation
evaluate(data = data, target_col = "age",
         prediction_cols = "gaussian_predictions",
         models = list(gaussian_model),
         type = "gaussian")

# Binomial evaluation
evaluate(data = data, target_col = "diagnosis",
         prediction_cols = "binomial_predictions",
         type = "binomial")

# Multinomial

# Create a tibble with predicted probabilities
data_mc <- multiclass_probability_tibble(
    num_classes = 3, num_observations = 30,
    apply_softmax = TRUE, FUN = runif,
    class_name = "class_")

# Add targets
class_names <- paste0("class_", c(1,2,3))
data_mc[["target"]] <- sample(x = class_names,
                              size = 30, replace = TRUE)

# Multinomial evaluation
evaluate(data = data_mc, target_col = "target",
         prediction_cols = class_names,
         apply_softmax = FALSE, # already probabilities
         type = "multinomial")

# ID evaluation

# Gaussian ID evaluation
# Note that 'age' is the same for all observations
# of a participant
evaluate(data = data, target_col = "age",
         prediction_cols = "gaussian_predictions",
         id_col = "participant",
         type = "gaussian")

# Binomial ID evaluation
evaluate(data = data, target_col = "diagnosis",
         prediction_cols = "binomial_predictions",
         id_col = "participant",
         id_method = "mean", # alternatively: "majority"
         type = "binomial")

# Multinomial ID evaluation

# Add IDs and new targets (must be constant within IDs)
data_mc[["target"]] <- NULL
data_mc[["id"]] <- rep(1:6, each = 5)
id_classes <- tibble::tibble(
    "id" = 1:6,
    target = sample(x = class_names, size = 6, replace = TRUE)
)
data_mc <- data_mc %>%
    dplyr::left_join(id_classes, by = "id")

# Perform ID evaluation
evaluate(data = data_mc, target_col = "target",
         prediction_cols = class_names,
         id_col = "id",
         id_method = "mean", # alternatively: "majority"
         apply_softmax = FALSE, # already probabilities
         type = "multinomial")

# Training and evaluating a multinomial model with nnet

# Create a data frame with some predictors and a target column
class_names <- paste0("class_", 1:4)
data_for_nnet <- multiclass_probability_tibble(
    num_classes = 3, # Here, number of predictors
    num_observations = 30,
    apply_softmax = FALSE,
    FUN = rnorm,
    class_name = "predictor_") %>%
    dplyr::mutate(class = sample(
        class_names,
        size = 30,
        replace = TRUE))

# Train multinomial model using the nnet package
mn_model <- nnet::multinom(
    "class ~ predictor_1 + predictor_2 + predictor_3",
    data = data_for_nnet)

# Predict the targets in the dataset
# (we would usually use a test set instead)
predictions <- predict(mn_model, data_for_nnet,
                       type = "probs") %>%
    dplyr::as_tibble()

# Add the targets
predictions[["target"]] <- data_for_nnet[["class"]]

# Evaluate predictions
evaluate(data = predictions, target_col = "target",
         prediction_cols = class_names,
         apply_softmax = FALSE, # multinom already returns probabilities
         type = "multinomial")
