```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
The hypothesis this note is meant to eventually check is whether cross-validation can reduce overfitting in EBMA. For now, it only demonstrates that overfitting occurs in the demo example.
```r
suppressPackageStartupMessages({
  library("dplyr")
  library("tidyr")
  library("forcats")
  library("EBMAforecast")
  library("EBMAhelper")
})
```
The canonical EBMAforecast package includes two demo data sets that are used to illustrate the performance of EBMA. The first demo example consists of forecasts for vote share in US presidential elections.
These data consist of 6 different predictions for the incumbent party vote share in 15 US presidential elections, along with the actual vote share.
```r
data(presidentialForecast)
dplyr::glimpse(presidentialForecast)
```
The original application [@montegomery:etal:2014] uses only one case as an out-of-sample test, the 2012 election. I would like a more robust estimate of out-of-sample performance, so I am going to use cross-validation with 3 of the 15 cases left out for testing at each split. To avoid the random element of how splits are assigned, I am going to do this for all 455 distinct^[15 choose 3 = 455.] calibration/test splits and run them all.
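As a quick sanity check on the design (a minimal illustration only, separate from the analysis code below), base R's `choose()` and `combn()` give the number and form of these splits:

```r
# Number of distinct ways to hold out 3 of the 15 elections
choose(15, 3)
#> [1] 455

# Each split is a set of 3 test-case indices; the first two, in
# lexicographic order:
combn(1:15, 3, simplify = FALSE)[1:2]
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] 1 2 4
```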
To assess performance, I will look at calibration- and test-period MAE and RMSE for the component models, EBMA, and a baseline simple-average ensemble. The relevant question is how much each one's accuracy degrades from the calibration period to the held-out test cases, i.e. how much it overfits, and whether EBMA holds up better or worse than the simple average.
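For reference, both metrics are computed in the standard way. The `rmse()` and `mae()` helpers below are an illustrative sketch only (the names are mine, not from EBMAforecast or EBMAhelper); the loop further down computes the same quantities inline.

```r
# Illustrative helpers (not used below): root mean squared error and
# mean absolute error of predictions `pred` against observed values `obs`
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
mae  <- function(pred, obs) mean(abs(pred - obs))
```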
```r
x_data <- presidentialForecast[, c(1:6)]
y_data <- presidentialForecast[, 7]

# 15 choose 3 for train/test split; get all unique combinations
all_splits <- combn(1:15, 3, simplify = FALSE)

output <- vector("list", length(all_splits))
for (i in seq_along(all_splits)) {
  # cat(paste0(i, "."))
  # if (i %% 10 == 0) cat("\n")

  # Split 12/3 for calibration/test
  test_idx  <- all_splits[[i]]
  train_idx <- setdiff(1:15, all_splits[[i]])

  # Calibrate EBMA
  ebma_fit <- ebma(y = y_data[train_idx], x = x_data[train_idx, ],
                   model_type = "normal", useModelParams = FALSE,
                   tol = 0.0001, const = 0, method = "EM")

  # Save calibration/test RMSE/MAE for each model
  df  <- summary(ebma_fit, period = "calibration", showCoefs = FALSE)@summaryData
  out <- data.frame(Period = "calib", Model = rownames(df), df[, c("rmse", "mae")])
  df  <- summary(ebma_fit, period = "test", showCoefs = FALSE)@summaryData
  out <- rbind(out, data.frame(Period = "test", Model = rownames(df),
                               df[, c("rmse", "mae")]))

  # Just average
  simple_avg <- rowMeans(x_data)
  fit <- data.frame(
    Period = c("calib", "test"),
    Model  = "Simple avg",
    rmse = c(
      sqrt(mean((simple_avg[train_idx] - y_data[train_idx])^2)),
      sqrt(mean((simple_avg[test_idx]  - y_data[test_idx])^2))
    ),
    mae = c(
      mean(abs(simple_avg[train_idx] - y_data[train_idx])),
      mean(abs(simple_avg[test_idx]  - y_data[test_idx]))
    )
  )
  out <- rbind(out, fit)

  out$index <- i
  output[[i]] <- out
}
output <- do.call(rbind, output)
```
Here are the results, ordered from best to worst test RMSE.
```r
# Average RMSE/MAE fit by model
output %>%
  tidyr::gather(stat, value, rmse:mae) %>%
  dplyr::group_by(Model, Period, stat) %>%
  dplyr::summarize(mean = mean(value)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(
    Period = forcats::fct_recode(Period, Calib = "calib", Test = "test"),
    stat   = toupper(stat)
  ) %>%
  tidyr::unite(stat, Period, stat) %>%
  tidyr::spread(stat, mean) %>%
  dplyr::arrange(Test_RMSE) %>%
  dplyr::select(Model, ends_with("MAE"), ends_with("RMSE")) %>%
  knitr::kable(digits = 2)
```