as_covid_hub_forecasts: Reformat model outputs stored as a 'model_output_tbl' class...

View source: R/as_covid_hub_forecasts.R

as_covid_hub_forecasts    R Documentation

Description

Reformats model outputs stored as a model_output_tbl object (or similar) into a data.frame that follows the standards of the COVID-19 Forecasting Hub, so that it can be processed by covidHubUtils functions such as score_forecasts() or plot_forecasts(). The supplied model_output_tbl should have columns that correspond to reference dates, locations, horizons, and targets.

Usage

as_covid_hub_forecasts(
  model_outputs,
  model_id_col = "model_id",
  reference_date_col = "forecast_date",
  location_col = "location",
  horizon_col = "horizon",
  target_col = "target",
  output_type_col = "output_type",
  output_type_id_col = "output_type_id",
  value_col = "value",
  temp_res_col = "temporal_resolution",
  target_end_date_col = "target_end_date"
)

Arguments

model_outputs

an object of class model_output_tbl with component model outputs (e.g., predictions). Should have columns containing the following information: model name, reference date or target end date, location, horizon, target, temporal resolution, output type, output type id, and value. The temporal resolution may instead be included in the target column.

model_id_col

character string of the name of the column containing the model name(s) for the forecasts. Defaults to "model_id". Should be set to NULL if no such column exists, in which case a model_id column will be created and populated with the value "model_id".

reference_date_col

character string of the name of the column containing the reference dates for the forecasts. Defaults to "forecast_date". Should be set to NULL if no such column exists, in which case the column will be created using the following information: horizon, target end date, and temporal resolution.

location_col

character string of the name of the column containing the locations for the forecasts. Defaults to "location".

horizon_col

character string of the name of the column containing the horizons for the forecasts. Defaults to "horizon".

target_col

character string of the name of the column containing the targets for the forecasts. Defaults to "target". If temp_res_col is NULL, the target column in model_outputs is assumed to contain targets of the form "temporal_resolution target" or "temporal_resolution ahead target", such as "wk inc flu hosp" or "wk ahead inc flu hosp".

output_type_col

character string of the name of the column containing the output types for the forecasts. Defaults to "output_type".

output_type_id_col

character string of the name of the column containing the output type ids for the forecasts. Defaults to "output_type_id".

value_col

character string of the name of the column containing the values for the forecasts. Defaults to "value".

temp_res_col

character string of the name of the column containing the temporal resolutions for the forecasts. Defaults to "temporal_resolution". Should be set to NULL if no such column exists, in which case the column will be created from the existing target column.

target_end_date_col

character string of the name of the column containing the target end dates for the forecasts. Defaults to "target_end_date". Should be set to NULL if no such column exists, in which case the column will be created using the following information: horizon, forecast date, and temporal resolution.
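
The derivations described for temp_res_col, reference_date_col, and target_end_date_col can be illustrated with a minimal sketch. It is not the package's implementation: the helpers split_target(), derive_target_end_date(), and derive_reference_date(), and the days_per_step lookup, are hypothetical names introduced here, and the sketch assumes that a date is shifted by horizon times the number of days in one temporal resolution step. The package's actual date arithmetic may differ, for example by aligning dates to week boundaries.

```r
# Hypothetical helper (not part of covidHubUtils): split a combined target
# string such as "wk ahead inc flu hosp" into its two parts.
split_target <- function(target) {
  # Take the first whitespace-delimited token as the temporal resolution.
  temporal_resolution <- sub("^(\\S+)\\s+.*$", "\\1", target, perl = TRUE)
  # Drop that token, plus an optional "ahead" token, to leave the target.
  target_variable <- sub("^\\S+\\s+(ahead\\s+)?", "", target, perl = TRUE)
  data.frame(temporal_resolution, target_variable, stringsAsFactors = FALSE)
}

# Assumed mapping from temporal resolution to days per step.
days_per_step <- c(day = 1, wk = 7)

# Hypothetical helper: shift a reference date forward by the horizon.
derive_target_end_date <- function(reference_date, horizon, temporal_resolution) {
  as.Date(reference_date) +
    as.numeric(horizon) * unname(days_per_step[temporal_resolution])
}

# Hypothetical helper: shift a target end date backward by the horizon.
derive_reference_date <- function(target_end_date, horizon, temporal_resolution) {
  as.Date(target_end_date) -
    as.numeric(horizon) * unname(days_per_step[temporal_resolution])
}

parts <- split_target(c("wk ahead inc flu hosp", "wk inc flu hosp"))
end_date <- derive_target_end_date("2020-12-14", 2, "wk")
ref_date <- derive_reference_date(end_date, 2, "wk")
```

Under these assumptions both example target strings yield the temporal resolution "wk" and the target "inc flu hosp", and the two date helpers are inverses of each other, so either date column can be recovered from the other together with the horizon and temporal resolution.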

Value

a data.frame of reformatted model outputs with 10 columns: model, forecast_date, location, horizon, temporal_resolution, target_variable, target_end_date, type, quantile, value. All other columns are removed. The result may be fed into any of the covidHubUtils functions.

Examples

library(dplyr)
library(covidHubUtils)

# Load forecasts that are already in COVID-19 Forecasting Hub format
forecasts <- load_forecasts(
  models = c("COVIDhub-ensemble", "UMass-MechBayes"),
  dates = "2020-12-14",
  date_window_size = 7,
  locations = c("US"),
  targets = paste(1:4, "wk ahead inc death"),
  source = "zoltar"
)

# Alter the forecasts so that they are no longer in COVID-19 Forecasting Hub format
altered_forecasts <- forecasts |>
  dplyr::rename(model_id = model, output_type = type, output_type_id = quantile) |>
  dplyr::mutate(target_variable = "wk ahead inc death", horizon = as.numeric(horizon)) |>
  dplyr::select(-temporal_resolution)

# Convert back; with temp_res_col = NULL the temporal resolution is taken
# from the target column
formatted_forecasts <- as_covid_hub_forecasts(
  altered_forecasts,
  target_col = "target_variable",
  temp_res_col = NULL
) |>
  dplyr::mutate(horizon = as.character(horizon))

# The round trip should reproduce the original forecasts
testthat::expect_equal(formatted_forecasts, dplyr::select(forecasts, model:value))

reichlab/covidHubUtils documentation built on Feb. 6, 2024, 1:42 p.m.