knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The hypothesis this note will eventually check is whether cross-validation can reduce overfitting in EBMA. For now, it only demonstrates overfitting on the demo example.

suppressPackageStartupMessages({
  library("dplyr")
  library("tidyr")
  library("forcats")
  library("EBMAforecast")
  library("EBMAhelper")
})

Continuous outcome demo: presidential forecasts

The EBMAforecast package includes two demo data sets that are used to illustrate how EBMA performs. The first consists of forecasts of vote share in US presidential elections.

These data consist of 6 different predictions for the incumbent party vote share in 15 US presidential elections, along with the actual vote share.

data(presidentialForecast)

dplyr::glimpse(presidentialForecast)

The original application [@montegomery:etal:2014] uses only one case, the 2012 election, as an out-of-sample test. I would like a more robust estimate of out-of-sample performance, so I am going to use cross-validation with 3 of the 15 cases left out for testing at each split. To avoid the random element of how splits are assigned, I am going to do this for all 455 distinct^[15 choose 3 = 455] calibration/test splits and run them all.
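
As a quick check on that count, base R's choose function gives the number of distinct splits:

# Number of distinct ways to hold out 3 of the 15 elections for testing
choose(15, 3)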

To assess performance, I will look at calibration- and test-period MAE and RMSE for the component models, EBMA, and a baseline simple-average ensemble. The relevant questions are whether EBMA overfits, i.e. whether its test error is notably worse than its calibration error, and how it compares out of sample to the component models and to the simple average.
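
For reference, here are the two error metrics written out as small helper functions. This is just an illustrative sketch; the loop below computes the same quantities inline rather than calling these hypothetical helpers.

# Error metric helpers (illustrative only; the loop below inlines these formulas)
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
mae  <- function(pred, obs) mean(abs(pred - obs))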

x_data <- presidentialForecast[, c(1:6)]
y_data <- presidentialForecast[, 7]

# All 455 distinct calibration/test splits (15 choose 3)
all_splits <- combn(1:15, 3, simplify = FALSE)

output <- vector("list", length(all_splits))
for (i in seq_along(all_splits)) {
  #cat(paste0(i, "."))
  #if (i %% 10 == 0) cat("\n")

  # Split 12/3 for calibration/test
  test_idx  <- all_splits[[i]]
  train_idx <- setdiff(1:15, all_splits[[i]])

  # Calibrate EBMA
  ebma_fit <- ebma(y = y_data[train_idx], x = x_data[train_idx, ],
                   model_type = "normal", 
                   useModelParams = FALSE, tol = 0.0001, const = 0,
                   method = "EM")

  # Save calibration/test RMSE/MAE for each model
  df  <- summary(ebma_fit, period="calibration", showCoefs=FALSE)@summaryData
  out <- data.frame(Period = "calib", Model = rownames(df), df[, c("rmse", "mae")])
  df  <- summary(ebma_fit, period="test", showCoefs=FALSE)@summaryData
  out <- rbind(out, data.frame(Period = "test", Model = rownames(df), df[, c("rmse", "mae")]))

  # Baseline: simple (equal-weight) average of the component forecasts
  simple_avg <- rowMeans(x_data)
  fit <- data.frame(Period = c("calib", "test"),
                    Model = "Simple avg", 
                    rmse = c(
                      sqrt(mean((simple_avg[train_idx] - y_data[train_idx])^2)),
                      sqrt(mean((simple_avg[test_idx]  - y_data[test_idx])^2))
                    ),
                    mae = c(
                      mean(abs(simple_avg[train_idx] - y_data[train_idx])),
                      mean(abs(simple_avg[test_idx]  - y_data[test_idx]))
                    ))

  out <- rbind(out, fit)
  out$index <- i

  output[[i]] <- out
}
output <- do.call(rbind, output)
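
Before summarizing, a quick look at the pooled per-split results; by construction, each row is one model-period combination for one of the 455 splits:

head(output)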

Here are the results, ordered from best to worst test RMSE.

# Mean RMSE/MAE across all splits, by model and period
output %>%
  tidyr::gather(stat, value, rmse:mae) %>%
  dplyr::group_by(Model, Period, stat) %>% 
  dplyr::summarize(mean = mean(value)) %>% 
  dplyr::ungroup() %>%
  dplyr::mutate(
    Period = forcats::fct_recode(Period, Calib = "calib", Test = "test"),
    stat = toupper(stat)
  ) %>%
  tidyr::unite(stat, Period, stat) %>%
  tidyr::spread(stat, mean) %>%
  dplyr::arrange(Test_RMSE) %>%
  dplyr::select(Model, ends_with("MAE"), ends_with("RMSE")) %>%
  knitr::kable(digits = 2)
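
Since the point here is the gap between calibration and test error, it can also help to tabulate that gap directly. The sketch below is an extra view, not part of the original analysis; it reuses the pooled output data frame from above and only looks at RMSE.

# Overfitting gap: mean test RMSE minus mean calibration RMSE, by model
output %>%
  dplyr::group_by(Model, Period) %>%
  dplyr::summarize(rmse = mean(rmse), .groups = "drop") %>%
  tidyr::spread(Period, rmse) %>%
  dplyr::mutate(RMSE_gap = test - calib) %>%
  dplyr::arrange(dplyr::desc(RMSE_gap)) %>%
  knitr::kable(digits = 2)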

References


