```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(bayesrules)
```
For Bayesian model evaluation, the bayesrules package has three functions, `prediction_summary()`, `classification_summary()`, and `naive_classification_summary()`, as well as their respective cross-validation counterparts `prediction_summary_cv()`, `classification_summary_cv()`, and `naive_classification_summary_cv()`.
| **Functions** | **Response** | **Model** |
|---|---|---|
| `prediction_summary()`, `prediction_summary_cv()` | Quantitative | rstanreg |
| `classification_summary()`, `classification_summary_cv()` | Binary | rstanreg |
| `naive_classification_summary()`, `naive_classification_summary_cv()` | Categorical | naiveBayes |
Given a set of observed data including a quantitative response variable y and an rstanreg model of y, `prediction_summary()` returns 4 measures of the posterior prediction quality:

1. Median absolute prediction error (`mae`) measures the typical difference between the observed y values and their posterior predictive medians (`stable = TRUE`) or means (`stable = FALSE`).
2. Scaled mae (`mae_scaled`) measures the typical number of absolute deviations (`stable = TRUE`) or standard deviations (`stable = FALSE`) that observed y values fall from their predictive medians (`stable = TRUE`) or means (`stable = FALSE`).
3. & 4. `within_50` and `within_90` report the proportion of observed y values that fall within their posterior prediction intervals, the probability levels of which are set by the user.
Although 50% and 90% are the defaults for the posterior prediction intervals, these probability levels can be changed with the `prob_inner` and `prob_outer` arguments.
The example below shows the 60% and 80% posterior prediction intervals.
```r
# Data generation
example_data <- data.frame(x = sample(1:100, 20))
example_data$y <- example_data$x * 3 + rnorm(20, 0, 5)

# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data, refresh = FALSE)

# Prediction Summary
prediction_summary(example_model, example_data,
                   prob_inner = 0.6, prob_outer = 0.80, stable = TRUE)
```
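Each of these summaries is derived from the model's posterior predictive distribution. The sketch below is an illustrative by-hand approximation (not the package's internal code) of how `mae` and an interval-coverage measure like `within_50` could be computed from `rstanarm::posterior_predict()` draws.

```r
# Posterior predictive draws: one column per observed case
draws <- rstanarm::posterior_predict(example_model, newdata = example_data)

# Posterior predictive median for each case
pred_median <- apply(draws, 2, median)

# Typical (median) absolute prediction error, analogous to mae with stable = TRUE
median(abs(example_data$y - pred_median))

# Proportion of observed y values inside their 60% posterior prediction intervals
lower_60 <- apply(draws, 2, quantile, 0.20)
upper_60 <- apply(draws, 2, quantile, 0.80)
mean(example_data$y >= lower_60 & example_data$y <= upper_60)
```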
Similarly, `prediction_summary_cv()` returns the 4 cross-validated measures of a model's posterior prediction quality for each fold as well as a pooled result. The `k` argument represents the number of folds to use for cross-validation.
```r
prediction_summary_cv(model = example_model, data = example_data,
                      k = 2, prob_inner = 0.6, prob_outer = 0.80)
```
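Conceptually, k-fold cross-validation refits the model on all but one fold and evaluates its predictions on the held-out fold. The following is a minimal hand-rolled sketch of that idea for `mae`; the random fold assignment and the simple averaging across folds are illustrative assumptions and need not match the package's internal fold construction or pooling.

```r
set.seed(84735)
k <- 2
folds <- sample(rep(1:k, length.out = nrow(example_data)))

fold_mae <- sapply(1:k, function(i) {
  train <- example_data[folds != i, ]
  test  <- example_data[folds == i, ]
  fit   <- rstanarm::stan_glm(y ~ x, data = train, refresh = FALSE)
  preds <- apply(rstanarm::posterior_predict(fit, newdata = test), 2, median)
  median(abs(test$y - preds))
})

fold_mae        # per-fold mae
mean(fold_mae)  # one simple way to pool across folds
```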
Given a set of observed data including a binary response variable y and an rstanreg model of y, the `classification_summary()` function returns summaries of the model's posterior classification quality. These summaries include a confusion matrix as well as estimates of the model's sensitivity, specificity, and overall accuracy. The `cutoff` argument represents the probability cutoff for classifying a new case as positive.
```r
# Data generation
x <- rnorm(20)
z <- 3 * x
prob <- 1 / (1 + exp(-z))
y <- rbinom(20, 1, prob)
example_data <- data.frame(x = x, y = y)

# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data,
                                    family = binomial, refresh = FALSE)

# Classification Summary
classification_summary(model = example_model, data = example_data, cutoff = 0.5)
```
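The cutoff is applied to each case's posterior probability of being positive. As a rough illustration (not necessarily the package's exact computation), one could estimate those probabilities with `rstanarm::posterior_epred()`, classify with the cutoff, and tabulate the results:

```r
# Posterior draws of each case's probability of being positive (y = 1)
prob_draws <- rstanarm::posterior_epred(example_model, newdata = example_data)

# Posterior mean probability per case
prob_positive <- colMeans(prob_draws)

# Classify with the 0.5 cutoff and compare to the observed y
y_hat <- as.numeric(prob_positive >= 0.5)
table(observed = example_data$y, predicted = y_hat)   # confusion matrix
mean(y_hat == example_data$y)                          # overall accuracy
```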
The `classification_summary_cv()` function returns the same measures, but based on cross-validated estimates. The `k` argument represents the number of folds to use for cross-validation.
```r
classification_summary_cv(model = example_model, data = example_data, k = 2, cutoff = 0.5)
```
Given a set of observed data including a categorical response variable y and a naiveBayes model of y, the `naive_classification_summary()` function returns summaries of the model's posterior classification quality. These summaries include a confusion matrix as well as an estimate of the model's overall accuracy.
```r
# Data
data(penguins_bayes, package = "bayesrules")

# naiveBayes model
example_model <- e1071::naiveBayes(species ~ bill_length_mm, data = penguins_bayes)

# Naive Classification Summary
naive_classification_summary(model = example_model, data = penguins_bayes, y = "species")
```
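For intuition, the confusion matrix compares each penguin's observed species with the species the naiveBayes model finds most probable. A minimal by-hand counterpart (illustrative only) using `e1071`'s `predict()` method:

```r
# Classify each penguin and tabulate predictions against the observed species
predicted_species <- predict(example_model, newdata = penguins_bayes)
table(observed = penguins_bayes$species, predicted = predicted_species)
mean(predicted_species == penguins_bayes$species, na.rm = TRUE)  # overall accuracy
```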
Similarly, `naive_classification_summary_cv()` returns the cross-validated confusion matrix. The `k` argument represents the number of folds to use for cross-validation.
```r
naive_classification_summary_cv(model = example_model, data = penguins_bayes,
                                y = "species", k = 2)
```