View source: R/prepare_performance_data.R
prepare_performance_data | R Documentation |
The prepare_performance_data function creates a Performance Data table over a sequence of cutoffs. Each row represents a cutoff and each column represents a performance metric. The function can also be used with more than one model in order to compare the performance of different models on the same population. In that case, pass a list of vectors of estimated probabilities, one for each model.
prepare_performance_data(
  probs,
  reals,
  by = 0.01,
  stratified_by = "probability_threshold"
)
probs | a list of vectors of estimated probabilities (one for each model or one for each population) |
reals | a list of vectors of binary outcomes (one for each population) |
by | number: the increment of the sequence of cutoffs (default 0.01) |
stratified_by | Performance metrics can be stratified by Probability Threshold ("probability_threshold", the default) or alternatively by Predicted Positives Condition Rate ("ppcr", as used in the examples) |
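To make the roles of by and the cutoff rows concrete, here is a minimal base-R sketch that builds a table with one row per probability threshold. It is an illustration under stated assumptions, not the package's implementation: the simulated data, the helper name metrics_at, the column names, and the choice of a strict "greater than" comparison are all hypothetical.

```r
# Illustrative sketch only; not the rtichoke implementation.
set.seed(1)
probs <- runif(200)                            # hypothetical estimated probabilities
reals <- rbinom(200, size = 1, prob = probs)   # hypothetical binary outcomes

by <- 0.25                                     # coarse increment to keep the table short
thresholds <- seq(0, 1, by = by)               # 0, 0.25, 0.5, 0.75, 1

# Hypothetical helper: confusion-matrix counts and a few metrics at one cutoff.
# Assumption: an observation is predicted positive when its probability
# is strictly greater than the threshold.
metrics_at <- function(threshold, probs, reals) {
  predicted_positive <- probs > threshold
  TP <- sum(predicted_positive & reals == 1)
  FP <- sum(predicted_positive & reals == 0)
  FN <- sum(!predicted_positive & reals == 1)
  TN <- sum(!predicted_positive & reals == 0)
  data.frame(
    threshold = threshold,
    TP = TP, FP = FP, FN = FN, TN = TN,
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    ppcr = mean(predicted_positive)            # share of predicted positives
  )
}

# One row per cutoff, one column per metric.
performance_sketch <- do.call(
  rbind,
  lapply(thresholds, metrics_at, probs = probs, reals = reals)
)
performance_sketch
```

A smaller by gives a finer grid of cutoffs and therefore more rows; the default of 0.01 in the usage above would correspond to 101 thresholds in this sketch.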
Sometimes, instead of applying a cutoff to the estimated probability, it is required to select a fixed percentile of the estimated probabilities. In medicine this is referred to as a "Risk Percentile" when the outcome stands for something negative in essence, such as a severe disease or death. Suppose we want to see the model performance for the top 5% of patients at risk in some well-defined population. In this case the user should change the parameter stratified_by from the default "probability_threshold" to "ppcr" (the value used in the examples below), and the result will be a similar Performance Data, only this time each row will represent a rounded percentile.
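The "top 5% at risk" idea can be sketched in base R as follows. This is a hedged illustration rather than the package's own code: the simulated data and the variable names (ppcr, n_flagged, flagged, implied_threshold) are assumptions made for the example.

```r
# Illustrative sketch only; not the rtichoke implementation.
set.seed(1)
probs <- runif(1000)                            # hypothetical estimated probabilities
reals <- rbinom(1000, size = 1, prob = probs)   # hypothetical binary outcomes

ppcr <- 0.05                                    # flag the top 5% at risk

# Rank observations from highest to lowest probability and flag the top share.
# ties.method = "first" guarantees exactly n_flagged observations are flagged.
n_flagged <- round(ppcr * length(probs))
flagged <- rank(-probs, ties.method = "first") <= n_flagged

# The probability cutoff implied by this percentile for this population.
implied_threshold <- min(probs[flagged])

c(
  implied_threshold = implied_threshold,
  observed_ppcr = mean(flagged),                # share of predicted positives
  ppv = mean(reals[flagged] == 1)               # observed event rate among flagged
)
```

The same percentile can imply different probability cutoffs in different populations, which is why stratifying by the share of predicted positives can be more convenient than a fixed threshold when resources limit how many patients can be treated.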
# You can prepare Performance Data for one model
prepare_performance_data(
  probs = list(example_dat$estimated_probabilities),
  reals = list(example_dat$outcome)
)
prepare_performance_data(
  probs = list(example_dat$estimated_probabilities),
  reals = list(example_dat$outcome),
  stratified_by = "ppcr"
)
# Several Models
prepare_performance_data(
  probs = list(
    "First Model" = example_dat$estimated_probabilities,
    "Second Model" = example_dat$random_guess
  ),
  reals = list(example_dat$outcome)
)
prepare_performance_data(
  probs = list(
    "First Model" = example_dat$estimated_probabilities,
    "Second Model" = example_dat$random_guess
  ),
  reals = list(example_dat$outcome),
  stratified_by = "ppcr"
)
# Several Populations
prepare_performance_data(
  probs = list(
    "train" = example_dat %>%
      dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(estimated_probabilities),
    "test" = example_dat %>%
      dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(estimated_probabilities)
  ),
  reals = list(
    "train" = example_dat %>%
      dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(outcome),
    "test" = example_dat %>%
      dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(outcome)
  )
)
prepare_performance_data(
  probs = list(
    "train" = example_dat %>%
      dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(estimated_probabilities),
    "test" = example_dat %>%
      dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(estimated_probabilities)
  ),
  reals = list(
    "train" = example_dat %>%
      dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(outcome),
    "test" = example_dat %>%
      dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(outcome)
  ),
  stratified_by = "ppcr"
)