prepare_performance_data: Prepare Performance Data
In uriahf/rtichoke: rtichoke

View source: R/prepare_performance_data.R

prepare_performance_data

R Documentation

Prepare Performance Data

Description

The prepare_performance_data function makes a Performance Data that is made of different cutoffs. Each row represents a cutoff and each column stands for a performance metric. It is possible to use this function for more than one model in order to compare different models performance for the same population. In this case the user should use a list that is made of vectors of estimated probabilities, one for each model.

Usage

prepare_performance_data(
  probs,
  reals,
  by = 0.01,
  stratified_by = "probability_threshold"
)

Arguments

`probs`	a list of vectors of estimated probabilities (one for each model or one for each population)
`reals`	a list of vectors of binary outcomes (one for each population)
`by`	number: increment of the sequence.
`stratified_by`	Performance Metrics can be stratified by Probability Threshold or alternatively by Predicted Positives Condition Rate

Details

Sometime instead of using a cutoff for the estimated probability it is required to enforce a symmetry between the percentiles of the probabilities, in medicine it is referred as "Risk Percentile" when the outcome stands for something negative in essence such as a severe disease or death: Let's say that we want to see the model performance for the top 5% patients at risk for some well defined population, in this case the user should change the parameter stratified_by from the default "probability_threshold" to "predicted_positives" and the results will be similar Performance Data, only this time each row will represent some rounded percentile.

Examples

# You can prepare Performance Data for one model

prepare_performance_data(
  probs = list(example_dat$estimated_probabilities),
  reals = list(example_dat$outcome)
)

prepare_performance_data(
  probs = list(example_dat$estimated_probabilities),
  reals = list(example_dat$outcome),
  stratified_by = "ppcr"
)

# Several Models

prepare_performance_data(
  probs = list(
    "First Model" = example_dat$estimated_probabilities,
    "Second Model" = example_dat$random_guess
  ),
  reals = list(example_dat$outcome)
)

prepare_performance_data(
  probs = list(
    "First Model" = example_dat$estimated_probabilities,
    "Second Model" = example_dat$random_guess
  ),
  reals = list(example_dat$outcome),
  stratified_by = "ppcr"
)


# Several Populations

prepare_performance_data(
  probs = list(
    "train" = example_dat %>%
      dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(estimated_probabilities),
    "test" = example_dat %>% dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(estimated_probabilities)
  ),
  reals = list(
    "train" = example_dat %>% dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(outcome),
    "test" = example_dat %>% dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(outcome)
  )
)

prepare_performance_data(
  probs = list(
    "train" = example_dat %>%
      dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(estimated_probabilities),
    "test" = example_dat %>% dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(estimated_probabilities)
  ),
  reals = list(
    "train" = example_dat %>% dplyr::filter(type_of_set == "train") %>%
      dplyr::pull(outcome),
    "test" = example_dat %>% dplyr::filter(type_of_set == "test") %>%
      dplyr::pull(outcome)
  ),
  stratified_by = "ppcr"
)

uriahf/rtichoke documentation built on Nov. 22, 2023, 1:30 a.m.