example-interfaces.md
In tidyposterior: Bayesian Analysis to Compare Models using Resampling Statistics

' ## Input formats

'

' There are several ways to give resapling results to the `perf_mod()` function. To #' illustrate, here are some example objects using 10-fold cross-validation for a #' simple two-class problem:

'

' ```r

' library(tidymodels)

' library(tidyposterior)

' library(workflowsets)

'

' data(two_class_dat, package = "modeldata")

'

' set.seed(100)

' folds <- vfold_cv(two_class_dat)

' ```

'

' We can define two different models (for simplicity, with no tuning parameters).

'

' ```r

' logistic_reg_glm_spec <-

' logistic_reg() %>%

' set_engine('glm')

'

' mars_earth_spec <-

' mars(prod_degree = 1) %>%

' set_engine('earth') %>%

' set_mode('classification')

' ```

'

' For tidymodels, the [tune::fit_resamples()] function can be used to estimate #' performance for each model/resample:

'

' ```r

' rs_ctrl <- control_resamples(save_workflow = TRUE)

'

' logistic_reg_glm_res <-

' logistic_reg_glm_spec %>%

' fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

'

' mars_earth_res <-

' mars_earth_spec %>%

' fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

' ```

'

' From these, there are several ways to pass the results to `perf_mod()`.

'

' ### Data Frame as Input

'

' The most general approach is to have a data frame with the resampling labels (i.e., #' one or more id columns) as well as columns for each model that you would like to #' compare.

'

' For the model results above, [tune::collect_metrics()] can be used along with some #' basic data manipulation steps:

'

' ```r

' logistic_roc <-

' collect_metrics(logistic_reg_glm_res, summarize = FALSE) %>%

' dplyr::filter(.metric == "roc_auc") %>%

' dplyr::select(id, logistic = .estimate)

'

' mars_roc <-

' collect_metrics(mars_earth_res, summarize = FALSE) %>%

' dplyr::filter(.metric == "roc_auc") %>%

' dplyr::select(id, mars = .estimate)

'

' resamples_df <- full_join(logistic_roc, mars_roc, by = "id")

' resamples_df

' ```

'

' ```

' ## # A tibble: 10 x 3

' ## id logistic mars

' ##

' ## 1 Fold01 0.908 0.875

' ## 2 Fold02 0.904 0.917

' ## 3 Fold03 0.924 0.938

' ## 4 Fold04 0.881 0.881

' ## 5 Fold05 0.863 0.864

' ## 6 Fold06 0.893 0.889

' ## # … with 4 more rows

' ```

'

' We can then give this directly to `perf_mod()`:

'

' ```r

' set.seed(101)

' roc_model_via_df <- perf_mod(resamples_df, refresh = 0)

' tidy(roc_model_via_df) %>% summary()

' ```

'

' ```

' ## # A tibble: 2 x 4

' ## model mean lower upper

' ##

' ## 1 logistic 0.892 0.879 0.906

' ## 2 mars 0.888 0.875 0.902

' ```

'

' ### rsample Object as Input

'

' Alternatively, the result columns can be merged back into the original `rsample` #' object. The up-side to using this method is that `perf_mod()` will know exactly #' which model formula to use for the Bayesian model:

'

' ```r

' resamples_rset <-

' full_join(folds, logistic_roc, by = "id") %>%

' full_join(mars_roc, by = "id")

'

' set.seed(101)

' roc_model_via_rset <- perf_mod(resamples_rset, refresh = 0)

' tidy(roc_model_via_rset) %>% summary()

' ```

'

' ```

' ## # A tibble: 2 x 4

' ## model mean lower upper

' ##

' ## 1 logistic 0.892 0.879 0.906

' ## 2 mars 0.888 0.875 0.902

' ```

'

' ### Workflow Set Object as Input

'

' Finally, for tidymodels, a workflow set object can be used. This is a collection of #' models/preprocessing combinations in one object. We can emulate a workflow set using #' the existing example results then pass that to `perf_mod()`:

'

' ```r

' example_wset <-

' as_workflow_set(logistic = logistic_reg_glm_res, mars = mars_earth_res)

'

' set.seed(101)

' roc_model_via_wflowset <- perf_mod(example_wset, refresh = 0)

' tidy(roc_model_via_rset) %>% summary()

' ```

'

' ```

' ## # A tibble: 2 x 4

' ## model mean lower upper

' ##

' ## 1 logistic 0.892 0.879 0.906

' ## 2 mars 0.888 0.875 0.902

' ```

'

' ### caret resamples object

'

' The `caret` package can also be used. An equivalent set of models are created:

'

' ```r

' library(caret)

'

' set.seed(102)

' logistic_caret <- train(Class ~ ., data = two_class_dat, method = "glm",

' trControl = trainControl(method = "cv"))

'

' set.seed(102)

' mars_caret <- train(Class ~ ., data = two_class_dat, method = "gcvEarth",

' tuneGrid = data.frame(degree = 1),

' trControl = trainControl(method = "cv"))

' ```

'

' Note that these two models use the same resamples as one another due to setting the #' seed prior to calling `train()`. However, these are different from the tidymodels #' results used above (so the final results will be different).

'

' `caret` has a `resamples()` function that can collect and collate the resamples. #' This can also be given to `perf_mod()`:

'

' ```r

' caret_resamples <- resamples(list(logistic = logistic_caret, mars = mars_caret))

'

' set.seed(101)

' roc_model_via_caret <- perf_mod(caret_resamples, refresh = 0)

' tidy(roc_model_via_caret) %>% summary()

' ```

'

' ```

' ## # A tibble: 2 x 4

' ## model mean lower upper

' ##

' ## 1 logistic 0.821 0.801 0.842

' ## 2 mars 0.822 0.802 0.842

' ```

'

Any scripts or data that you put into this service are public.

tidyposterior documentation built on Oct. 12, 2023, 1:07 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

tidyposterior Bayesian Analysis to Compare Models using Resampling Statistics

man/rmd/example-interfaces.md In tidyposterior: Bayesian Analysis to Compare Models using Resampling Statistics

' ## Input formats

'

' There are several ways to give resapling results to the perf_mod() function. To #' illustrate, here are some example objects using 10-fold cross-validation for a #' simple two-class problem:

'

'

' ```r

' library(tidymodels)

' library(tidyposterior)

' library(workflowsets)

'

' data(two_class_dat, package = "modeldata")

'

' set.seed(100)

' folds <- vfold_cv(two_class_dat)

' ```

'

' We can define two different models (for simplicity, with no tuning parameters).

'

'

' ```r

' logistic_reg_glm_spec <-

' logistic_reg() %>%

' set_engine('glm')

'

' mars_earth_spec <-

' mars(prod_degree = 1) %>%

' set_engine('earth') %>%

' set_mode('classification')

' ```

'

' For tidymodels, the [tune::fit_resamples()] function can be used to estimate #' performance for each model/resample:

'

'

' ```r

' rs_ctrl <- control_resamples(save_workflow = TRUE)

'

' logistic_reg_glm_res <-

' logistic_reg_glm_spec %>%

' fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

'

' mars_earth_res <-

' mars_earth_spec %>%

' fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

' ```

'

' From these, there are several ways to pass the results to perf_mod().

'

' ### Data Frame as Input

'

' The most general approach is to have a data frame with the resampling labels (i.e., #' one or more id columns) as well as columns for each model that you would like to #' compare.

'

' For the model results above, [tune::collect_metrics()] can be used along with some #' basic data manipulation steps:

'

'

' ```r

' logistic_roc <-

' collect_metrics(logistic_reg_glm_res, summarize = FALSE) %>%

' dplyr::filter(.metric == "roc_auc") %>%

' dplyr::select(id, logistic = .estimate)

'

' mars_roc <-

' collect_metrics(mars_earth_res, summarize = FALSE) %>%

' dplyr::filter(.metric == "roc_auc") %>%

' dplyr::select(id, mars = .estimate)

'

' resamples_df <- full_join(logistic_roc, mars_roc, by = "id")

' resamples_df

' ```

'

' ```

' ## # A tibble: 10 x 3

' ## id logistic mars

' ##

' ## 1 Fold01 0.908 0.875

' ## 2 Fold02 0.904 0.917

' ## 3 Fold03 0.924 0.938

' ## 4 Fold04 0.881 0.881

' ## 5 Fold05 0.863 0.864

tidyposterior
Bayesian Analysis to Compare Models using Resampling Statistics

man/rmd/example-interfaces.md
In tidyposterior: Bayesian Analysis to Compare Models using Resampling Statistics

' There are several ways to give resapling results to the `perf_mod()` function. To #' illustrate, here are some example objects using 10-fold cross-validation for a #' simple two-class problem:

' From these, there are several ways to pass the results to `perf_mod()`.

' We can then give this directly to `perf_mod()`:

' Alternatively, the result columns can be merged back into the original `rsample` #' object. The up-side to using this method is that `perf_mod()` will know exactly #' which model formula to use for the Bayesian model:

' Finally, for tidymodels, a workflow set object can be used. This is a collection of #' models/preprocessing combinations in one object. We can emulate a workflow set using #' the existing example results then pass that to `perf_mod()`:

' The `caret` package can also be used. An equivalent set of models are created: