evaluate | R Documentation
Evaluate your model's predictions on a set of evaluation metrics.
Create ID-aggregated evaluations by multiple methods.
Currently supports regression and classification
(binary and multiclass). See `type`.
evaluate(
data,
target_col,
prediction_cols,
type,
id_col = NULL,
id_method = "mean",
apply_softmax = FALSE,
cutoff = 0.5,
positive = 2,
metrics = list(),
include_predictions = TRUE,
parallel = FALSE,
models = deprecated()
)
data |
data.frame with predictions, targets and (optionally) an ID column. The expected prediction format depends on `type`:

Multinomial:
Probabilities (preferable): One column per class with the probability of that class. The columns should have the name of their class, as they are named in the target column.
Classes: A single column of type character with the predicted classes.

Binomial:
Probabilities (preferable): One column with the probability of the class being the second class alphabetically (1 if the classes are 0 and 1). Note: The alphabetical ordering is of the class labels as type character.
Classes: A single column of type character with the predicted classes. Note: The prediction column will be converted to probabilities of the second class alphabetically.

Gaussian:
A single column with the predicted values.
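As an illustration of the two multinomial formats (all column names and values here are made up):

```r
# Probability format: one column per class, named after the classes
# as they appear in the target column
probs <- data.frame(
  class_1 = c(0.2, 0.7),
  class_2 = c(0.5, 0.2),
  class_3 = c(0.3, 0.1),
  Target  = c("class_2", "class_1")
)

# Class format: a single character column with the predicted classes
classes <- data.frame(
  Prediction = c("class_2", "class_1"),
  Target     = c("class_2", "class_1")
)
```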
target_col |
Name of the column with the true classes/values in `data`. When `type` is "gaussian", the column should be numeric; otherwise, it should be character.
prediction_cols |
Name(s) of column(s) with the predictions. Columns can be either numeric or character depending on which format is chosen. See `data` for the possible formats.
type |
Type of evaluation to perform: "gaussian" for regression, "binomial" for binary classification, or "multinomial" for multiclass classification.
id_col |
Name of ID column to aggregate predictions by. N.B. Current methods assume that the target class/value is constant within the IDs. N.B. When aggregating by ID, some metrics may be disabled.
id_method |
Method to use when aggregating predictions by ID. Either "mean" or "majority".

mean: The average prediction (value or probability) is calculated per ID and evaluated. This method assumes that the target class/value is constant within the IDs.

majority: The most predicted class per ID is found and evaluated. In case of a tie, the winning classes share the probability (e.g. 1/2 each when two classes tie). This method assumes that the target class/value is constant within the IDs.
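The "mean" method can be sketched in base R (column names are illustrative):

```r
# Two predictions per ID; the target is constant within IDs
d <- data.frame(
  ID     = c(1, 1, 2, 2),
  prob   = c(0.2, 0.4, 0.9, 0.7),
  target = c(0, 0, 1, 1)
)

# Average the predicted probability per ID before evaluating
agg <- aggregate(cbind(prob, target) ~ ID, data = d, FUN = mean)
# per-ID mean probabilities: 0.3 and 0.8
```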
apply_softmax |
Whether to apply the softmax function to the prediction columns when `type` is "multinomial". N.B. Multinomial models only.
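For reference, a minimal row-wise softmax sketch (not cvms's internal implementation):

```r
# Softmax: exponentiate and normalize so each row sums to 1
softmax <- function(x) exp(x) / sum(exp(x))

scores <- matrix(c(1.2, 0.3, -0.5,
                   0.1, 2.0,  0.4),
                 nrow = 2, byrow = TRUE)
probs <- t(apply(scores, 1, softmax))
# each row of probs sums to 1
```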
cutoff |
Threshold for predicted classes. (Numeric) N.B. Binomial models only.
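As a sketch of how a cutoff maps probabilities to classes (class labels are made up; the probabilities are of the second class alphabetically):

```r
probs  <- c(0.1, 0.8, 0.6)
cutoff <- 0.5

# Above the cutoff -> second class alphabetically ("dog")
predicted <- ifelse(probs > cutoff, "dog", "cat")
# predicted: "cat" "dog" "dog"
```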
positive |
Level from dependent variable to predict. Either as character (preferable) or level index (1 or 2 - alphabetically).

E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as alphabetically "dog" comes after "cat".

Note: For reproducibility, it's preferable to specify the name directly, as different locales may sort the levels differently.

Used when calculating confusion matrix metrics and creating ROC curves.

N.B. Only affects the evaluation metrics. Does NOT affect what the probabilities are of (always the second class alphabetically).

N.B. Binomial models only.
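The level indices follow the alphabetical ordering of the labels, which can be checked directly:

```r
# factor() sorts levels alphabetically: "cat" is level 1, "dog" is level 2
levels(factor(c("dog", "cat")))  # "cat" "dog"

# So positive = "dog" and positive = 2 select the same class
```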
metrics |
List for enabling/disabling metrics. E.g. list("RMSE" = FALSE) would disable RMSE in the regression results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the classification results. Default values (TRUE/FALSE) are used for the remaining available metrics.

You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to the application of the other arguments in the list.

The list can be created with gaussian_metrics(), binomial_metrics(), or multinomial_metrics().

Also accepts the string "all".
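As a sketch, the "all" switch lets you enable only a small set of metrics:

```r
# Disable all metrics, then re-enable just RMSE
my_metrics <- list("all" = FALSE, "RMSE" = TRUE)

# Hypothetical usage:
# evaluate(data, target_col = "age",
#          prediction_cols = "pred", type = "gaussian",
#          metrics = my_metrics)
```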
include_predictions |
Whether to include the predictions in the output as a nested tibble. (Logical)
parallel |
Whether to run evaluations in parallel, when `data` is grouped with dplyr::group_by().
models |
Deprecated.
Packages used:

Binomial and Multinomial ROC and AUC:

Binomial: pROC::roc

Multinomial: pROC::multiclass.roc
—————————————————————-
Gaussian Results
—————————————————————-

A tibble containing the following metrics by default: Average RMSE, MAE, NRMSE(IQR), RRSE, RAE, and RMSLE.

See the additional metrics (disabled by default) at ?gaussian_metrics.

Also includes:

A nested tibble with the predictions and targets.

A nested Process information object with information about the evaluation.
—————————————————————-
Binomial Results
—————————————————————-

A tibble with the following evaluation metrics, based on a confusion matrix and a ROC curve fitted to the predictions:

Confusion Matrix: Balanced Accuracy, Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).

ROC: AUC, Lower CI, and Upper CI.

Note that the ROC curve is only computed if AUC is enabled. See `metrics`.

Also includes:

A nested tibble with the predictions and targets.

A list of ROC curve objects (if computed).

A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.

A nested Process information object with information about the evaluation.
—————————————————————-
Multinomial Results
—————————————————————-

For each class, a one-vs-all binomial evaluation is performed. This creates a Class Level Results tibble containing the same metrics as the binomial results described above (excluding Accuracy, MCC, AUC, Lower CI and Upper CI), along with a count of the class in the target column (Support). These metrics are used to calculate the macro-averaged metrics. The nested class level results tibble is also included in the output tibble, and could be reported along with the macro and overall metrics.

The output tibble contains the macro and overall metrics. The metrics that share their name with the metrics in the nested class level results tibble are averages of those metrics (note: NAs are not removed before averaging). In addition to these, it also includes the Overall Accuracy and the multiclass MCC.

Note: Balanced Accuracy is the macro-averaged metric, not the macro sensitivity as sometimes used!

Other available metrics (disabled by default, see `metrics`): Accuracy, multiclass AUC, Weighted Balanced Accuracy, Weighted Accuracy, Weighted F1, Weighted Sensitivity, Weighted Specificity, Weighted Pos Pred Value, Weighted Neg Pred Value, Weighted Kappa, Weighted Detection Rate, Weighted Detection Prevalence, and Weighted Prevalence.

Note that the "Weighted" average metrics are weighted by the Support.

When having a large set of classes, consider keeping AUC disabled.
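The difference between macro and Support-weighted averaging can be sketched in base R (the per-class values are made up):

```r
# Per-class F1 scores and class counts (Support)
f1      <- c(class_1 = 0.80, class_2 = 0.60, class_3 = 0.70)
support <- c(class_1 = 50,   class_2 = 30,   class_3 = 20)

macro_f1    <- mean(f1)                          # simple average: 0.7
weighted_f1 <- sum(f1 * support) / sum(support)  # Support-weighted: 0.72
```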
Also includes:

A nested tibble with the predictions and targets.

A list of ROC curve objects when AUC is enabled.

A nested tibble with the multiclass Confusion Matrix.

A nested Process information object with information about the evaluation.

Besides the binomial evaluation metrics and the Support, the nested class level results tibble also contains a nested tibble with the confusion matrix from the one-vs-all evaluation. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class. In our case, 1 is the current class and 0 represents all the other classes together.
Author: Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
Other evaluation functions:
binomial_metrics()
,
confusion_matrix()
,
evaluate_residuals()
,
gaussian_metrics()
,
multinomial_metrics()
# Attach packages
library(cvms)
library(dplyr)
# Load data
data <- participant.scores
# Fit models
gaussian_model <- lm(age ~ diagnosis, data = data)
binomial_model <- glm(diagnosis ~ score, data = data, family = "binomial")
# Add predictions
data[["gaussian_predictions"]] <- predict(gaussian_model, data,
type = "response",
allow.new.levels = TRUE
)
data[["binomial_predictions"]] <- predict(binomial_model, data,
type = "response", # probabilities, not link-scale values
allow.new.levels = TRUE
)
# Gaussian evaluation
evaluate(
data = data, target_col = "age",
prediction_cols = "gaussian_predictions",
type = "gaussian"
)
# Binomial evaluation
evaluate(
data = data, target_col = "diagnosis",
prediction_cols = "binomial_predictions",
type = "binomial"
)
#
# Multinomial
#
# Create a tibble with predicted probabilities and targets
data_mc <- multiclass_probability_tibble(
num_classes = 3, num_observations = 45,
apply_softmax = TRUE, FUN = runif,
class_name = "class_",
add_targets = TRUE
)
class_names <- paste0("class_", 1:3)
# Multinomial evaluation
evaluate(
data = data_mc, target_col = "Target",
prediction_cols = class_names,
type = "multinomial"
)
#
# ID evaluation
#
# Gaussian ID evaluation
# Note that 'age' is the same for all observations
# of a participant
evaluate(
data = data, target_col = "age",
prediction_cols = "gaussian_predictions",
id_col = "participant",
type = "gaussian"
)
# Binomial ID evaluation
evaluate(
data = data, target_col = "diagnosis",
prediction_cols = "binomial_predictions",
id_col = "participant",
id_method = "mean", # alternatively: "majority"
type = "binomial"
)
# Multinomial ID evaluation
# Add IDs and new targets (must be constant within IDs)
data_mc[["Target"]] <- NULL
data_mc[["ID"]] <- rep(1:9, each = 5)
id_classes <- tibble::tibble(
"ID" = 1:9,
"Target" = sample(x = class_names, size = 9, replace = TRUE)
)
data_mc <- data_mc %>%
dplyr::left_join(id_classes, by = "ID")
# Perform ID evaluation
evaluate(
data = data_mc, target_col = "Target",
prediction_cols = class_names,
id_col = "ID",
id_method = "mean", # alternatively: "majority"
type = "multinomial"
)
#
# Training and evaluating a multinomial model with nnet
#
# Only run if `nnet` is installed
if (requireNamespace("nnet", quietly = TRUE)) {
# Create a data frame with some predictors and a target column
class_names <- paste0("class_", 1:4)
data_for_nnet <- multiclass_probability_tibble(
num_classes = 3, # Here, number of predictors
num_observations = 30,
apply_softmax = FALSE,
FUN = rnorm,
class_name = "predictor_"
) %>%
dplyr::mutate(Target = sample(
class_names,
size = 30,
replace = TRUE
))
# Train multinomial model using the nnet package
mn_model <- nnet::multinom(
"Target ~ predictor_1 + predictor_2 + predictor_3",
data = data_for_nnet
)
# Predict the targets in the dataset
# (we would usually use a test set instead)
predictions <- predict(
mn_model,
data_for_nnet,
type = "probs"
) %>%
dplyr::as_tibble()
# Add the targets
predictions[["Target"]] <- data_for_nnet[["Target"]]
# Evaluate predictions
evaluate(
data = predictions,
target_col = "Target",
prediction_cols = class_names,
type = "multinomial"
)
}