View source: R/load_truth.R

load_truth R Documentation

Load truth data under multiple target variables from multiple truth sources

Description

By default, for the US hub, the resulting data.frame contains data for weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (HealthData) at the county, state and national levels. For the ECDC hub, the default resulting data.frame contains data for weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (ECDC) for all European countries. For the FluSight hub, the default resulting data.frame contains data for weekly incident hospitalizations (HealthData) for all US locations.

Usage

load_truth(
  truth_source = NULL,
  target_variable = NULL,
  as_of = NULL,
  truth_end_date = NULL,
  temporal_resolution = NULL,
  locations = NULL,
  data_location = NULL,
  local_repo_path = NULL,
  hub = c("US", "ECDC", "FluSight")
)

Arguments

truth_source

character vector specifying where the truths will be loaded from: currently supports "JHU", "NYTimes", "HealthData", "ECDC" and "OWID". If NULL, default for US hub is c("JHU", "HealthData"). If NULL, default for ECDC hub is c("OWID"). If NULL, default for FluSight hub is c("HealthData").

target_variable

character vector specifying the target variables. It should be one or more of "cum death", "inc case", "inc death", "inc hosp". If NULL, default for US hub is c("inc case", "inc death", "inc hosp"). If NULL, default for ECDC hub is c("inc hosp"). If NULL, default for FluSight hub is c("inc flu hosp").

as_of

character vector of "as of" dates to use for querying truths in format 'yyyy-mm-dd'. For each spatial unit and temporal reporting unit, the last available data with an issue date on or before the given as_of date are returned. This is currently only available when data_location is "covidData".
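The selection rule described for as_of can be illustrated with a small base R sketch. This is not the package's implementation: the helper select_as_of(), the issue_date column name and the toy values are all assumptions made for illustration.

```r
# Toy versioned data: the same observations reported under several issue dates.
# All values are made up for illustration.
versions <- data.frame(
  location        = c("01", "01", "01", "02", "02"),
  target_end_date = as.Date(c("2021-01-02", "2021-01-02", "2021-01-09",
                              "2021-01-02", "2021-01-02")),
  issue_date      = as.Date(c("2021-01-03", "2021-01-10", "2021-01-10",
                              "2021-01-03", "2021-01-17")),
  value           = c(100, 110, 120, 50, 55)
)

# Hypothetical helper, not part of covidHubUtils or covidData.
select_as_of <- function(versions, as_of) {
  as_of <- as.Date(as_of)
  # Keep only versions issued on or before the as_of date.
  eligible <- versions[versions$issue_date <= as_of, , drop = FALSE]
  # Sort so the most recent issue comes first within each unit.
  eligible <- eligible[order(eligible$location,
                             eligible$target_end_date,
                             -as.numeric(eligible$issue_date)), , drop = FALSE]
  # Keep the first (latest) row per location and target_end_date.
  keep <- !duplicated(eligible[, c("location", "target_end_date")])
  result <- eligible[keep, , drop = FALSE]
  rownames(result) <- NULL
  result
}

truth_as_of <- select_as_of(versions, "2021-01-10")
truth_as_of
```

In this toy data the version issued on 2021-01-17 is ignored for as_of = "2021-01-10", and the revision issued on 2021-01-10 replaces the earlier value for location "01".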

truth_end_date

date of the last available truth point to include, in 'yyyy-mm-dd' format. If NULL, defaults to the system date.

temporal_resolution

character specifying temporal resolution to include: currently supports "weekly" and "daily". If NULL, defaults to "weekly" for cases and deaths, "daily" for hospitalizations. Weekly temporal_resolution will not be applied to "inc hosp" and "inc flu hosp" when multiple target variables are specified. "ECDC" truth data is weekly by default. Daily level data is not available.

locations

a vector of strings of FIPS codes, CBSA codes or location names, such as "Hampshire County, MA", "Alabama", "United Kingdom". US county location names must include the state abbreviation. Defaults to NULL, which includes all locations with available forecasts.

data_location

character specifying the location of truth data. Currently only supports "local_hub_repo", "remote_hub_repo" and "covidData". If NULL, defaults to "remote_hub_repo".

local_repo_path

path to a local clone of the hub repository. Only used when data_location is "local_hub_repo".

hub

character, which hub to use. Default is "US". Other options are "ECDC" and "FluSight".

Details

  • "inc hosp" is only available from "HealthData", "ECDC" and "OWID". "inc flu hosp" is only available from "HealthData".

  • This function does not load data for other target variables from "HealthData".

  • When loading data for multiple target variables for the US hub, temporal_resolution will be applied to all target variables but "inc hosp" and "inc flu hosp". In that case, the function will return daily incident COVID hospitalization counts and weekly incident influenza hospitalization counts.

  • For the US hub, weekly temporal resolution will be applied to "inc hosp" if the user specifies "inc hosp" as the only target_variable. On the other hand, temporal_resolution will be applied to "inc hosp" in all cases for the ECDC hub.

  • When aggregating daily data, if there are not enough observations for a week, the corresponding weekly count is NA in the resulting data frame.

  • as_of is only supported when data_location = "covidData". Otherwise, this function will issue a warning.
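The weekly aggregation rule in the list above (incomplete weeks become NA) can be sketched in base R. This is an illustrative sketch, not the package's code: the helper aggregate_to_weekly() is hypothetical, and it assumes weeks run Sunday through Saturday and that a week needs all 7 daily observations to be reported.

```r
# Toy daily counts: one full Sunday-to-Saturday week followed by a partial week.
# All values are made up for illustration.
daily <- data.frame(
  date  = seq(as.Date("2021-01-03"), as.Date("2021-01-13"), by = "day"),
  value = c(1, 2, 3, 4, 5, 6, 7, 10, 20, 30, 40)
)

# Hypothetical helper, not part of covidHubUtils.
# Assumption: a week ends on Saturday and requires `days_required` observations.
aggregate_to_weekly <- function(daily, days_required = 7) {
  weekday <- as.integer(format(daily$date, "%w"))  # 0 = Sunday, 6 = Saturday
  week_end <- daily$date + (6 - weekday)           # Saturday ending each week
  # Count non-missing observations and sum values within each week.
  n_obs <- tapply(daily$value, week_end, function(x) sum(!is.na(x)))
  total <- tapply(daily$value, week_end, sum)
  weekly <- data.frame(
    target_end_date = as.Date(names(total)),
    value = as.numeric(total)
  )
  # Weeks without enough daily observations are reported as NA.
  weekly$value[as.integer(n_obs) < days_required] <- NA
  weekly
}

weekly <- aggregate_to_weekly(daily)
weekly
```

Here the week ending 2021-01-09 has seven observations and is summed, while the week ending 2021-01-16 has only four and is set to NA.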

Value

data.frame with columns model, target_variable, target_end_date, location, value, location_name and population, plus extra columns in the following cases:

  • If hub = "US", it returns extra columns geo_type, geo_value, abbreviation and full_location_name.

  • If truth_source = "ECDC", this function returns an extra column week_start. However, when target_variable is only "inc hosp", no extra columns are appended to the resulting data frame.

Examples

library(covidHubUtils)

# load for US
load_truth(
  truth_source = c("JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp")
)

# load for ECDC
load_truth(
  truth_source = c("JHU"),
  target_variable = c("inc case", "inc death"),
  hub = "ECDC"
)

reichlab/covidHubUtils documentation built on Feb. 6, 2024, 1:42 p.m.