View source: R/most_challenging.R
most_challenging | R Documentation |
Finds the data points that, overall, were the most challenging to predict, based on a prediction metric.
most_challenging(
  data,
  type,
  obs_id_col = "Observation",
  target_col = "Target",
  prediction_cols = ifelse(type == "gaussian", "Prediction", "Predicted Class"),
  threshold = 0.15,
  threshold_is = "percentage",
  metric = NULL,
  cutoff = 0.5
)
data |
Predictions can be passed as values, predicted classes or predicted probabilities:
N.B. Adds
Multinomial (type = "multinomial")
  Probabilities (Preferable)
    One column per class with the probability of that class. The columns should have the name of their class, as they are named in the target column. E.g. columns named "A", "B", "C" and "D" when those are the classes in the target column.
  Classes
    A single column with the predicted classes.
Binomial (type = "binomial")
  Probabilities (Preferable)
    One column with the probability of class being the second class alphabetically ("dog" if classes are "cat" and "dog").
    Note: The second class is determined by the alphabetical ordering of the class labels.
  Classes
    A single column with the predicted classes.
Gaussian (type = "gaussian")
  One column with the predicted values.
type |
Type of task used to get the predictions: "gaussian", "binomial" or "multinomial".
obs_id_col |
Name of column with observation IDs. This will be used to aggregate the performance of each observation.
target_col |
Name of column with the true classes/values in data.
prediction_cols |
Name(s) of column(s) with the predictions.
threshold |
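Threshold to filter observations by. Its meaning depends on type and threshold_is.
Gaussian
  threshold_is "percentage"
    (Approximate) percentage of the observations with the largest root mean square errors to return.
  threshold_is "score"
    Observations with a root mean square error larger than or equal to the threshold are returned.
Binomial, Multinomial
  threshold_is "percentage"
    (Approximate) percentage of the observations to return with:
      MAE, Cross Entropy: the highest error scores.
      Accuracy: the lowest accuracy scores.
  threshold_is "score"
    MAE, Cross Entropy: observations with an error score at or above the threshold are returned.
    Accuracy: observations with an accuracy at or below the threshold are returned.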
threshold_is |
Either "percentage" or "score".
metric |
The metric to use. If NULL, the default depends on the format of the prediction columns.
Binomial, Multinomial
  When one prediction column with predicted classes is passed, the default is "Accuracy".
  When one or more prediction columns with predicted probabilities are passed, the default is "MAE".
Gaussian
  Ignored. Always uses RMSE.
cutoff |
Threshold for predicted classes. (Numeric)
N.B. Binomial only.
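As an illustration of the binomial probability format and the cutoff argument, the following base R sketch converts probabilities to predicted classes. It is not cvms code: the data frame and its values are made up, and treating probabilities strictly above the cutoff as the second class is an assumption, since this page does not say how a probability exactly equal to the cutoff is handled.

```r
# Hypothetical binomial data: "Prediction" holds the probability of "dog",
# the second class alphabetically.
preds <- data.frame(
  Observation = 1:4,
  Target = c("cat", "dog", "dog", "cat"),
  Prediction = c(0.10, 0.80, 0.35, 0.60),
  stringsAsFactors = FALSE
)

classes <- sort(unique(preds$Target)) # alphabetical order of the labels
cutoff <- 0.5

# Assumption: probabilities above the cutoff become the second class
preds$`Predicted Class` <- ifelse(
  preds$Prediction > cutoff, classes[2], classes[1]
)

# Flag which predictions match the target
preds$Correct <- preds$`Predicted Class` == preds$Target
print(preds)
```

In this toy data, observations 3 and 4 end up misclassified, so they would be the ones an accuracy-based ranking treats as hardest.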
data.frame with the most challenging observations and their metrics.
`>=` / `<=` denotes the threshold as score.
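The score threshold for Accuracy can be pictured with a small base R sketch. It is only an illustration under assumptions, not the cvms implementation: the data are hypothetical, and per-observation accuracy is taken to be the share of correct predictions for each observation ID.

```r
# Hypothetical predictions: 3 observations, each predicted 4 times
preds <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  Target = rep(c("A", "B", "A"), each = 4),
  `Predicted Class` = c(
    "A", "A", "A", "A",
    "A", "B", "A", "A",
    "B", "B", "A", "B"
  ),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Per-observation accuracy: share of correct predictions for each ID
correct <- preds$`Predicted Class` == preds$Target
acc <- aggregate(correct, by = list(ID = preds$ID), FUN = mean)
names(acc)[2] <- "Accuracy"

# Score threshold: keep observations with accuracy <= threshold
threshold <- 0.25
challenging <- acc[acc$Accuracy <= threshold, ]
print(challenging)
```

With an error metric such as MAE or Cross Entropy the comparison would flip to `>=`, because higher scores are worse there.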
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
# Attach packages
library(cvms)
library(dplyr)
##
## Multinomial
##

# Find the most challenging data points (per classifier)
# in the predicted.musicians dataset
# which resembles the "Predictions" tibble from the evaluation results

# Passing predicted probabilities
# Observations with 30% highest MAE scores
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  prediction_cols = c("A", "B", "C", "D"),
  type = "multinomial",
  threshold = 0.30
)

# Observations with 25% highest Cross Entropy scores
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  prediction_cols = c("A", "B", "C", "D"),
  type = "multinomial",
  threshold = 0.25,
  metric = "Cross Entropy"
)

# Passing predicted classes
# Observations with 30% lowest Accuracy scores
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  prediction_cols = "Predicted Class",
  type = "multinomial",
  threshold = 0.30
)

# The 40% lowest-scoring on accuracy per classifier
predicted.musicians %>%
  dplyr::group_by(Classifier) %>%
  most_challenging(
    obs_id_col = "ID",
    prediction_cols = "Predicted Class",
    type = "multinomial",
    threshold = 0.40
  )

# Accuracy scores below 0.05
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  type = "multinomial",
  threshold = 0.05,
  threshold_is = "score"
)

##
## Binomial
##

# Subset the predicted.musicians
binom_data <- predicted.musicians %>%
  dplyr::filter(Target %in% c("A", "B")) %>%
  dplyr::rename(Prediction = B)

# Passing probabilities
# Observations with 30% highest MAE
most_challenging(
  binom_data,
  obs_id_col = "ID",
  type = "binomial",
  prediction_cols = "Prediction",
  threshold = 0.30
)

# Observations with 30% highest Cross Entropy
most_challenging(
  binom_data,
  obs_id_col = "ID",
  type = "binomial",
  prediction_cols = "Prediction",
  threshold = 0.30,
  metric = "Cross Entropy"
)

# Passing predicted classes
# Observations with 30% lowest Accuracy scores
most_challenging(
  binom_data,
  obs_id_col = "ID",
  type = "binomial",
  prediction_cols = "Predicted Class",
  threshold = 0.30
)

##
## Gaussian
##

set.seed(1)

# 10 observations, each predicted 3 times
df <- data.frame(
  "Observation" = rep(1:10, times = 3),
  "Target" = rnorm(n = 30, mean = 25, sd = 5),
  "Prediction" = rnorm(n = 30, mean = 27, sd = 7)
)

# The 20% highest RMSE scores
most_challenging(
  df,
  type = "gaussian",
  threshold = 0.2
)

# RMSE scores above 9
most_challenging(
  df,
  type = "gaussian",
  threshold = 9,
  threshold_is = "score"
)
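
To show roughly what the Gaussian percentage threshold selects, here is a base R sketch that ranks observations by their root mean square error. It is an illustration under assumptions rather than the cvms implementation: the helper names (`sq_err`, `rmse`, `n_keep`, `worst`) are made up, and rounding the percentage to a whole number of observations is one possible reading of "(Approximate) percentage".

```r
set.seed(1)

# Same shape as the Gaussian example data: 10 observations, 3 predictions each
df <- data.frame(
  Observation = rep(1:10, times = 3),
  Target = rnorm(n = 30, mean = 25, sd = 5),
  Prediction = rnorm(n = 30, mean = 27, sd = 7)
)

# Root mean square error per observation
sq_err <- (df$Target - df$Prediction)^2
rmse <- aggregate(
  sq_err,
  by = list(Observation = df$Observation),
  FUN = function(x) sqrt(mean(x))
)
names(rmse)[2] <- "RMSE"

# Assumption: keep round(threshold * number of observations) rows
threshold <- 0.2
n_keep <- round(threshold * nrow(rmse))
worst <- rmse[order(rmse$RMSE, decreasing = TRUE), ][seq_len(n_keep), ]
print(worst)
```

A score threshold would instead filter with `rmse$RMSE >= threshold`, which can return any number of observations.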