load_truth: Load truth data under multiple target variables from multiple...
In reichlab/covidHubUtils: Utility functions for the COVID-19 forecast hub

load_truth

R Documentation

Load truth data under multiple target variables from multiple truth sources

Description

By default, for the US hub, the resulting data.frame contains weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (HealthData) at the county, state and national levels. For the ECDC hub, the default resulting data.frame contains weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (ECDC) for all European countries. For the FluSight hub, the default resulting data.frame contains weekly incident hospitalizations (HealthData) for all US locations.

Usage

load_truth(
  truth_source = NULL,
  target_variable = NULL,
  as_of = NULL,
  truth_end_date = NULL,
  temporal_resolution = NULL,
  locations = NULL,
  data_location = NULL,
  local_repo_path = NULL,
  hub = c("US", "ECDC", "FluSight")
)

Arguments

`truth_source`	character vector specifying where the truth data will be loaded from. Currently supported: `"JHU"`, `"NYTimes"`, `"HealthData"`, `"ECDC"` and `"OWID"`. If `NULL`, default for US hub is `c("JHU", "HealthData")`. If `NULL`, default for ECDC hub is `c("OWID")`. If `NULL`, default for FluSight hub is `c("HealthData")`.
`target_variable`	character vector specifying the target variables. It should be one or more of `"cum death"`, `"inc case"`, `"inc death"`, `"inc hosp"`, or `"inc flu hosp"` for the FluSight hub. If `NULL`, default for US hub is `c("inc case", "inc death", "inc hosp")`. If `NULL`, default for ECDC hub is `c("inc hosp")`. If `NULL`, default for FluSight hub is `c("inc flu hosp")`.
`as_of`	character vector of "as of" dates to use for querying truth data, in 'yyyy-mm-dd' format. For each spatial unit and temporal reporting unit, the last available data with an issue date on or before the given `as_of` date are returned. Currently only supported when `data_location` is `"covidData"`.
`truth_end_date`	date of the last truth data point to include, in 'yyyy-mm-dd' format. If `NULL`, defaults to the system date.
`temporal_resolution`	character specifying the temporal resolution to include. Currently supported: `"weekly"` and `"daily"`. If `NULL`, defaults to `"weekly"` for cases and deaths and `"daily"` for hospitalizations. Weekly `temporal_resolution` will not be applied to `"inc hosp"` and `"inc flu hosp"` when multiple target variables are specified. `"ECDC"` truth data is weekly by default; daily data is not available.
`locations`	a character vector of FIPS codes, CBSA codes or location names, such as "Hampshire County, MA", "Alabama", "United Kingdom". A US county location name must include the state abbreviation. Defaults to `NULL`, which includes all locations with available forecasts.
`data_location`	character specifying the location of truth data. Currently only supports `"local_hub_repo"`, `"remote_hub_repo"` and `"covidData"`. If `NULL`, default to `"remote_hub_repo"`.
`local_repo_path`	path to a local clone of the hub repository. Only used when `data_location` is `"local_hub_repo"`.
`hub`	character, which hub to use. Default is "US". Other options are "ECDC" and "FluSight".
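
The `as_of` rule described above (for each location and reporting date, take the latest data issued on or before the given date) can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the `issue_date` column name and the `select_as_of()` helper are assumptions made only for illustration.

```r
# Toy versioned truth data: each row is one reported value for a
# location and target_end_date, tagged with the date it was issued.
# The column name issue_date is an assumption made for this sketch.
versions <- data.frame(
  location = c("25", "25", "25", "36", "36"),
  target_end_date = as.Date(rep("2021-01-02", 5)),
  issue_date = as.Date(c("2021-01-03", "2021-01-10", "2021-01-17",
                         "2021-01-04", "2021-01-20")),
  value = c(100, 110, 115, 200, 230),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep, for each location and target_end_date,
# the most recent row issued on or before as_of.
select_as_of <- function(data, as_of) {
  as_of <- as.Date(as_of)
  eligible <- data[data$issue_date <= as_of, , drop = FALSE]
  # Sort so the latest eligible issue comes first within each group.
  eligible <- eligible[order(eligible$location,
                             eligible$target_end_date,
                             -as.numeric(eligible$issue_date)), , drop = FALSE]
  keys <- eligible[, c("location", "target_end_date")]
  result <- eligible[!duplicated(keys), , drop = FALSE]
  rownames(result) <- NULL
  result
}

# Location "25" resolves to the 2021-01-10 issue, "36" to the 2021-01-04 issue.
snapshot <- select_as_of(versions, "2021-01-12")
```

Rows issued after the `as_of` date are dropped before the latest version is chosen, so later revisions never leak into the snapshot.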

Details

"inc hosp" is only available from "HealthData", "ECDC" and "OWID"."inc flu hosp" is only available from "HealthData".
This function is not loading data for other target variables from "HealthData".
When loading data for multiple target variables for the US hub, temporal_resolution will be applied to all target variables but "inc hosp" and "inc flu hosp". In that case, the function will return daily incident COVID hospitalization counts and weekly incident Influenza hospitalization.
For the US hub, weekly temporal resolution will be applied to "inc hosp" if the user specifies "inc hosp" as the only target_variable. On the other hand, temporal_resolution will be applied to "inc hosp" in all cases for the ECDC hub.
When aggregating daily data, if there are not enough observations for a week, the corresponding weekly count will be NA in the resulting data frame.
as_of is only supported when data_location = "covidData". Otherwise, this function will issue a warning.
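
The handling of incomplete weeks can be sketched in base R. This is an illustration, not the package's code: the helper name `aggregate_to_weekly()`, the choice of weeks ending on Saturday, and the requirement of all seven daily observations are assumptions made for this example.

```r
# Hypothetical helper: sum daily counts into weeks ending on Saturday.
# Assumption for this sketch: a week needs all 7 daily observations;
# otherwise its weekly value is NA.
aggregate_to_weekly <- function(dates, values) {
  dates <- as.Date(dates)
  # as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday.
  days_to_saturday <- 6 - as.POSIXlt(dates)$wday
  week_end <- dates + days_to_saturday
  # Count non-missing observations and sum values within each week.
  n_obs <- as.vector(tapply(!is.na(values), week_end, sum))
  total <- tapply(values, week_end, sum, na.rm = TRUE)
  data.frame(
    target_end_date = as.Date(names(total)),
    value = ifelse(n_obs == 7, as.vector(total), NA_real_)
  )
}

# Ten days of data starting on a Sunday: one complete week plus three days.
daily_dates <- seq(as.Date("2021-01-03"), by = "day", length.out = 10)
daily_values <- rep(5, 10)
weekly <- aggregate_to_weekly(daily_dates, daily_values)
```

Here the week ending 2021-01-09 has seven observations and sums to 35, while the week ending 2021-01-16 has only three and is reported as NA.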

Value

data.frame with columns model, target_variable, target_end_date, location, value, location_name and population, plus extra columns in these cases:

If hub = "US", it returns extra columns geo_type, geo_value, abbreviation and full_location_name.
If truth_source = "ECDC", this function returns an extra column week_start. However, when target_variable is only "inc hosp", no extra columns are appended to the resulting data frame.

Examples

library(covidHubUtils)

# load for US
load_truth(
  truth_source = c("JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp")
)

# load for ECDC
load_truth(
  truth_source = c("JHU"),
  target_variable = c("inc case", "inc death"),
  hub = "ECDC"
)
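
The arguments documented above can be combined in other ways. The sketch below wraps three such calls in functions so that nothing is downloaded until one of them is called. The wrapper names are hypothetical, and whether a given source and target combination is available through `"covidData"` or a local clone is an assumption to check against your installation.

```r
# Hypothetical wrappers around load_truth(); defining them downloads nothing.

# Influenza hospitalizations from the FluSight hub.
load_flu_truth <- function(truth_end_date = NULL) {
  covidHubUtils::load_truth(
    truth_source = "HealthData",
    target_variable = "inc flu hosp",
    truth_end_date = truth_end_date,
    hub = "FluSight"
  )
}

# Versioned US truth data as of a given date; as_of requires covidData.
load_us_deaths_as_of <- function(as_of) {
  covidHubUtils::load_truth(
    truth_source = "JHU",
    target_variable = "inc death",
    as_of = as_of,
    data_location = "covidData",
    hub = "US"
  )
}

# Truth data read from a local clone of the hub repository.
load_local_cases <- function(local_repo_path, locations = NULL) {
  covidHubUtils::load_truth(
    truth_source = "JHU",
    target_variable = "inc case",
    locations = locations,
    data_location = "local_hub_repo",
    local_repo_path = local_repo_path,
    hub = "US"
  )
}
```

For example, `load_flu_truth(truth_end_date = "2022-12-31")` would request FluSight truth data up to that date, provided covidHubUtils is installed and the remote hub repository is reachable.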

reichlab/covidHubUtils documentation built on Feb. 6, 2024, 1:42 p.m.