evaluate: Evaluate your model's performance

View source: R/evaluate.R

evaluate {cvms} R Documentation

Evaluate your model's performance

Description

[Maturing]

Evaluate your model's predictions on a set of evaluation metrics.

Create ID-aggregated evaluations by multiple methods.

Currently supports regression and classification (binary and multiclass). See `type`.

Usage

evaluate(
  data,
  target_col,
  prediction_cols,
  type,
  id_col = NULL,
  id_method = "mean",
  apply_softmax = FALSE,
  cutoff = 0.5,
  positive = 2,
  metrics = list(),
  include_predictions = TRUE,
  parallel = FALSE,
  models = deprecated()
)

Arguments

data

data.frame with predictions, targets and (optionally) an ID column. Can be grouped with group_by.

Multinomial

When `type` is "multinomial", the predictions can be passed in one of two formats.

Probabilities (Preferable)

One column per class with the probability of that class. Each column should be named after the class it represents, using the same class names as in the target column. E.g.:

 class_1  class_2  class_3  target
   0.269    0.528    0.203  class_2
   0.368    0.322    0.310  class_3
   0.375    0.371    0.254  class_2
     ...      ...      ...      ...

Classes

A single column of type character with the predicted classes. E.g.:

 prediction   target
    class_2  class_2
    class_1  class_3
    class_1  class_2
        ...      ...

Binomial

When `type` is "binomial", the predictions can be passed in one of two formats.

Probabilities (Preferable)

One column with the probability that the observation belongs to the second class alphabetically (the probability of 1 if the classes are 0 and 1). E.g.:

 prediction  target
      0.769       1
      0.368       1
      0.375       0
        ...     ...

Note: When alphabetically ordering the class labels, they are treated as type character, which is why e.g. "100" would come before "7".
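For instance, this can be checked with base R's sorting of character vectors (a quick illustration, not part of evaluate()):

sort(c("7", "100"))
#> [1] "100" "7"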

Classes

A single column of type character with the predicted classes. E.g.:

 prediction   target
    class_0  class_1
    class_1  class_1
    class_1  class_0
        ...      ...

Note: The prediction column will be converted to the probability 0.0 for the first class alphabetically and 1.0 for the second class alphabetically.
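As a minimal sketch of this conversion (an illustration, not the package's internal code):

predicted_classes <- c("class_0", "class_1", "class_1")
second_class <- sort(unique(predicted_classes))[[2]]
as.numeric(predicted_classes == second_class)
#> [1] 0 1 1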

Gaussian

When `type` is "gaussian", the predictions should be passed as one column with the predicted values. E.g.:

 prediction  target
       28.9    30.2
       33.2    27.1
       23.4    21.3
        ...     ...

target_col

Name of the column with the true classes/values in `data`.

When `type` is "multinomial", this column should contain the class names, not their indices.

prediction_cols

Name(s) of column(s) with the predictions.

Columns can be either numeric or character depending on which format is chosen. See `data` for the possible formats.

type

Type of evaluation to perform:

"gaussian" for regression (like linear regression).

"binomial" for binary classification.

"multinomial" for multiclass classification.

id_col

Name of ID column to aggregate predictions by.

N.B. Current methods assume that the target class/value is constant within the IDs.

N.B. When aggregating by ID, some metrics may be disabled.

id_method

Method to use when aggregating predictions by ID. Either "mean" or "majority".

When `type` is "gaussian", only the "mean" method is available.

mean

The average prediction (value or probability) is calculated per ID and evaluated. This method assumes that the target class/value is constant within the IDs.

majority

The most predicted class per ID is found and evaluated. In case of a tie, the winning classes share the probability (e.g. P = 0.5 each when two majority classes). This method assumes that the target class/value is constant within the IDs.
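As an illustration of the "mean" method (a dplyr sketch that assumes a single probability column and constant targets within IDs; not the package's internal code):

library(dplyr)

preds <- tibble::tibble(
  ID = c(1, 1, 2, 2),
  prediction = c(0.8, 0.6, 0.3, 0.4), # probability of the second class
  target = c("b", "b", "a", "a")      # constant within each ID
)

# "mean": average the predicted probability per ID before evaluating
# ("majority" would instead pick the most predicted class per ID)
preds %>%
  group_by(ID) %>%
  summarise(prediction = mean(prediction), target = first(target))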

apply_softmax

Whether to apply the softmax function to the prediction columns when `type` is "multinomial".

N.B. Multinomial models only.
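For intuition, a row-wise softmax can be sketched as follows (a minimal illustration, not the package's internal implementation):

softmax_row <- function(x) {
  e <- exp(x - max(x)) # subtract the max for numerical stability
  e / sum(e)
}

logits <- data.frame(class_1 = c(1.2, -0.3), class_2 = c(0.1, 0.8))
probs <- t(apply(logits, 1, softmax_row)) # each row now sums to 1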

cutoff

Threshold for predicted classes. (Numeric)

N.B. Binomial models only.
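As an illustration (not the package's internal code), a cutoff of 0.5 maps probabilities of the second class alphabetically to classes like this:

probs <- c(0.769, 0.368, 0.375)
classes <- sort(c("class_0", "class_1"))
ifelse(probs > 0.5, classes[[2]], classes[[1]])
#> [1] "class_1" "class_0" "class_0"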

positive

Level from dependent variable to predict. Either as character (preferable) or level index (1 or 2 - alphabetically).

E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as alphabetically, "dog" comes after "cat".

Note: For reproducibility, it's preferable to specify the name directly, as different locales may sort the levels differently.

Used when calculating confusion matrix metrics and creating ROC curves.

The Process column in the output can be used to verify this setting.

N.B. Only affects the evaluation metrics. Does NOT affect what the probabilities are of (always the second class alphabetically).

N.B. Binomial models only.
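E.g. (a sketch reusing the objects created in the Examples section below, where the diagnosis levels are 0 and 1):

evaluate(
  data = data, target_col = "diagnosis",
  prediction_cols = "binomial_predictions",
  type = "binomial",
  positive = "1" # equivalently: positive = 2
)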

metrics

list for enabling/disabling metrics.

E.g. list("RMSE" = FALSE) would remove RMSE from the regression results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the classification results. Default values (TRUE/FALSE) will be used for the remaining available metrics.

You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to enabling/disabling individual metrics, which is why e.g. list("all" = FALSE, "RMSE" = TRUE) would return only the RMSE metric.

The list can be created with gaussian_metrics(), binomial_metrics(), or multinomial_metrics().

Also accepts the string "all".
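For example (a sketch reusing the objects created in the Examples section below):

# Keep only the RMSE metric in a regression evaluation
evaluate(
  data = data, target_col = "age",
  prediction_cols = "gaussian_predictions",
  type = "gaussian",
  metrics = list("all" = FALSE, "RMSE" = TRUE)
)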

include_predictions

Whether to include the predictions in the output as a nested tibble. (Logical)

parallel

Whether to run evaluations in parallel, when `data` is grouped with group_by.
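A grouped, parallel evaluation could look like this sketch; it assumes a registered parallel backend, e.g. via the doParallel package, and reuses objects from the Examples section below:

library(doParallel)
registerDoParallel(2) # register a backend with two workers

data %>%
  dplyr::group_by(session) %>%
  evaluate(
    target_col = "diagnosis",
    prediction_cols = "binomial_predictions",
    type = "binomial",
    parallel = TRUE
  )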

models

Deprecated.

Details

Packages used:

Binomial and Multinomial:

ROC and AUC:

Binomial: pROC::roc

Multinomial: pROC::multiclass.roc

Value

--------------------------------------------------

Gaussian Results

--------------------------------------------------

tibble containing the following metrics by default:

Average RMSE, MAE, NRMSE(IQR), RRSE, RAE, RMSLE.

See the additional metrics (disabled by default) at ?gaussian_metrics.

Also includes:

A nested tibble with the Predictions and targets.

A nested Process information object with information about the evaluation.

--------------------------------------------------

Binomial Results

--------------------------------------------------

tibble with the following evaluation metrics, based on a confusion matrix and a ROC curve fitted to the predictions:

Confusion Matrix:

Balanced Accuracy, Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).

ROC:

AUC, Lower CI, and Upper CI

Note that the ROC curve is only computed when AUC is enabled. See `metrics`.

Also includes:

A nested tibble with the predictions and targets.

A list of ROC curve objects (if computed).

A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.

A nested Process information object with information about the evaluation.
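The nested columns can be pulled out with standard list indexing. E.g. (a sketch that assumes the output column names Predictions and Confusion Matrix, reusing objects from the Examples section below):

binom_res <- evaluate(
  data = data, target_col = "diagnosis",
  prediction_cols = "binomial_predictions",
  type = "binomial"
)

binom_res$Predictions[[1]]        # the nested predictions tibble
binom_res$`Confusion Matrix`[[1]] # the nested confusion matrix tibble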

--------------------------------------------------

Multinomial Results

--------------------------------------------------

For each class, a one-vs-all binomial evaluation is performed. This creates a Class Level Results tibble containing the same metrics as the binomial results described above (excluding Accuracy, MCC, AUC, Lower CI and Upper CI), along with a count of the class in the target column (Support). These metrics are used to calculate the macro metrics. The nested class level results tibble is also included in the output tibble, and could be reported along with the macro and overall metrics.

The output tibble contains the macro and overall metrics. The metrics that share their name with the metrics in the nested class level results tibble are averages of those metrics (note: does not remove NAs before averaging). In addition to these, it also includes the Overall Accuracy and the multiclass MCC.

Other available metrics (disabled by default, see `metrics`): Accuracy, multiclass AUC, Weighted Balanced Accuracy, Weighted Accuracy, Weighted F1, Weighted Sensitivity, Weighted Specificity, Weighted Pos Pred Value, Weighted Neg Pred Value, Weighted Kappa, Weighted Detection Rate, Weighted Detection Prevalence, and Weighted Prevalence.

Note that the "Weighted" average metrics are weighted by the Support.

When there is a large set of classes, consider keeping AUC disabled.

Also includes:

A nested tibble with the Predictions and targets.

A list of ROC curve objects when AUC is enabled.

A nested tibble with the multiclass Confusion Matrix.

A nested Process information object with information about the evaluation.

Class Level Results

Besides the binomial evaluation metrics and the Support, the nested class level results tibble also contains a nested tibble with the Confusion Matrix from the one-vs-all evaluation. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class. In these evaluations, 1 is the current class and 0 represents all the other classes together.
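The nested class level results can be inspected the same way. E.g. (a sketch that assumes the Class Level Results column name, with data_mc and class_names as created in the Examples section below):

mn_res <- evaluate(
  data = data_mc, target_col = "Target",
  prediction_cols = class_names,
  type = "multinomial"
)

class_level <- mn_res$`Class Level Results`[[1]]
class_level$`Confusion Matrix`[[1]] # one-vs-all confusion matrix for the first class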

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other evaluation functions: binomial_metrics(), confusion_matrix(), evaluate_residuals(), gaussian_metrics(), multinomial_metrics()

Examples


# Attach packages
library(cvms)
library(dplyr)

# Load data
data <- participant.scores

# Fit models
gaussian_model <- lm(age ~ diagnosis, data = data)
binomial_model <- glm(diagnosis ~ score, data = data, family = "binomial")

# Add predictions
data[["gaussian_predictions"]] <- predict(gaussian_model, data,
  type = "response",
  allow.new.levels = TRUE
)
data[["binomial_predictions"]] <- predict(binomial_model, data,
  allow.new.levels = TRUE
)

# Gaussian evaluation
evaluate(
  data = data, target_col = "age",
  prediction_cols = "gaussian_predictions",
  type = "gaussian"
)

# Binomial evaluation
evaluate(
  data = data, target_col = "diagnosis",
  prediction_cols = "binomial_predictions",
  type = "binomial"
)

#
# Multinomial
#

# Create a tibble with predicted probabilities and targets
data_mc <- multiclass_probability_tibble(
  num_classes = 3, num_observations = 45,
  apply_softmax = TRUE, FUN = runif,
  class_name = "class_",
  add_targets = TRUE
)

class_names <- paste0("class_", 1:3)

# Multinomial evaluation
evaluate(
  data = data_mc, target_col = "Target",
  prediction_cols = class_names,
  type = "multinomial"
)

#
# ID evaluation
#

# Gaussian ID evaluation
# Note that 'age' is the same for all observations
# of a participant
evaluate(
  data = data, target_col = "age",
  prediction_cols = "gaussian_predictions",
  id_col = "participant",
  type = "gaussian"
)

# Binomial ID evaluation
evaluate(
  data = data, target_col = "diagnosis",
  prediction_cols = "binomial_predictions",
  id_col = "participant",
  id_method = "mean", # alternatively: "majority"
  type = "binomial"
)

# Multinomial ID evaluation

# Add IDs and new targets (must be constant within IDs)
data_mc[["Target"]] <- NULL
data_mc[["ID"]] <- rep(1:9, each = 5)
id_classes <- tibble::tibble(
  "ID" = 1:9,
  "Target" = sample(x = class_names, size = 9, replace = TRUE)
)
data_mc <- data_mc %>%
  dplyr::left_join(id_classes, by = "ID")

# Perform ID evaluation
evaluate(
  data = data_mc, target_col = "Target",
  prediction_cols = class_names,
  id_col = "ID",
  id_method = "mean", # alternatively: "majority"
  type = "multinomial"
)

#
# Training and evaluating a multinomial model with nnet
#

# Only run if `nnet` is installed
if (requireNamespace("nnet", quietly = TRUE)) {

  # Create a data frame with some predictors and a target column
  class_names <- paste0("class_", 1:4)
  data_for_nnet <- multiclass_probability_tibble(
    num_classes = 3, # Here, number of predictors
    num_observations = 30,
    apply_softmax = FALSE,
    FUN = rnorm,
    class_name = "predictor_"
  ) %>%
    dplyr::mutate(Target = sample(
      class_names,
      size = 30,
      replace = TRUE
    ))

  # Train multinomial model using the nnet package
  mn_model <- nnet::multinom(
    "Target ~ predictor_1 + predictor_2 + predictor_3",
    data = data_for_nnet
  )

  # Predict the targets in the dataset
  # (we would usually use a test set instead)
  predictions <- predict(
    mn_model,
    data_for_nnet,
    type = "probs"
  ) %>%
    dplyr::as_tibble()

  # Add the targets
  predictions[["Target"]] <- data_for_nnet[["Target"]]

  # Evaluate predictions
  evaluate(
    data = predictions,
    target_col = "Target",
    prediction_cols = class_names,
    type = "multinomial"
  )
}

