get.DICE.data: Retrieve all available data from the DICE database.

View source: R/data_fxns.R

Description

get.DICE.data retrieves all the information for the model and fit regions from the DICE database. DICE currently holds Google Flu Trends (GFT) and Centers for Disease Control (CDC) data for the United States, and dengue data for a large number of countries. The function assumes that finer-resolution data (fit_level) may be used to create a forecast for a larger area (mod_level).

Usage

get.DICE.data(data_source = "cdc", mod_level = 2, mod_name = c(NAME_2 =
  "US"), fit_names = "all", fit_level = 3, RegState = NULL, year = 2015,
  nperiodsFit = 52, model = 4, isingle = 0, db_opts = list(DICE_db =
  "predsci", CDC_server = TRUE), disease = "flu", epi_model = 1,
  method = "mech", all_years_flag = T, all_cad_clim = T)

Arguments

data_source

Describes the data source for the incidence data. Default is 'cdc' (for disease = 'flu'). It can be selected by source_key (integer) or source abbreviation (string). Most disease/location combinations have only one data source, in which case it may be easier to set data_source=NULL. When multiple data sources exist, however, setting data_source=NULL will essentially choose from the available sources at random. To choose a data source through a graphical interface, see: predsci.com/id_data/. Looking up the disease and location there returns a list of data sources that can be entered into DICE. Alternatively, all country/disease/data_source combinations are listed in the ‘Data Sources Table’ tab at the same URL. To access the list of sources directly from an R prompt, see the examples below.

mod_level

An integer describing the spatial level of the model data (default value is 2). Levels: 0-Global, 1-Continent, 2-Country, 3-Region, 4-State, 5-County, 6-City. DICE currently has data at levels 2-4 for CDC and GFT.

mod_name

Named vector of strings specifying the model-level spatial patch. If is.null(mod_name), the code reverts to using RegState (see the RegState entry below). To specify New York state, set mod_name=c(NAME_2="United States", NAME_3="R1", NAME_4="New York"). Here NAME_X is either the full name or the abbreviation of the level-X patch, so replacing 'United States' with 'US' or 'R1' with 'Region 1' gives the same result. The entries of mod_name should run from NAME_2, ..., NAME_n, where mod_level=n.
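
The naming rule for mod_name can be checked before calling get.DICE.data. The following is a minimal sketch in base R; check_mod_name is a hypothetical helper, not part of DICE, and it only checks the shape of the vector, not whether the patch exists in the database.

```r
# Hypothetical helper (not part of DICE): TRUE when mod_name is a named
# character vector whose names run NAME_2, ..., NAME_n with n == mod_level.
check_mod_name <- function(mod_name, mod_level) {
  stopifnot(mod_level >= 2)
  expected <- paste0("NAME_", seq(2, mod_level))
  is.character(mod_name) && identical(names(mod_name), expected)
}

ny <- c(NAME_2 = "US", NAME_3 = "R1", NAME_4 = "New York")
check_mod_name(ny, mod_level = 4)   # TRUE
check_mod_name(ny, mod_level = 3)   # FALSE: one entry too many for level 3
```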

fit_level

An integer describing the spatial level of the fits used to construct the model-level profile/forecast (Default value is 3, must be >= mod_level).
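
As a rough illustration of the constraint between the two levels, the sketch below encodes the level codes listed under mod_level and checks that fit_level >= mod_level. The names spatial_levels and levels_ok are hypothetical and not part of DICE.

```r
# Level codes as listed in the mod_level entry.
spatial_levels <- c("Global", "Continent", "Country", "Region",
                    "State", "County", "City")
names(spatial_levels) <- 0:6

# Hypothetical helper (not part of DICE): TRUE when both levels are valid
# codes and the fit level is at least as fine as the model level.
levels_ok <- function(mod_level, fit_level) {
  all(c(mod_level, fit_level) %in% 0:6) && fit_level >= mod_level
}

levels_ok(2, 3)         # TRUE: country-level model built from region fits
levels_ok(4, 3)         # FALSE: fits would be coarser than the model
spatial_levels[["3"]]   # "Region"
```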

RegState

A single element that determines which region at mod_level is to be modeled. Its format depends on the model level: for mod_level=2, a 3-letter ISO3 country code; for mod_level=3, an integer identifying the HHS region; for mod_level=4, a 2-letter state code.
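
The format rules above can be expressed as a small check. This is only a sketch: regstate_ok is a hypothetical helper, not part of DICE, and the range 1-10 for HHS regions is taken from the mention of 10 HHS regions elsewhere on this page.

```r
# Hypothetical helper (not part of DICE): checks the RegState format
# described for mod_level 2, 3 and 4; any other level returns FALSE.
regstate_ok <- function(RegState, mod_level) {
  if (length(RegState) != 1) return(FALSE)
  switch(as.character(mod_level),
    "2" = is.character(RegState) && grepl("^[A-Za-z]{3}$", RegState),
    "3" = is.numeric(RegState) && RegState %in% 1:10,
    "4" = is.character(RegState) && grepl("^[A-Za-z]{2}$", RegState),
    FALSE)
}

regstate_ok("usa", 2)   # TRUE: 3-letter country code
regstate_ok(9, 3)       # TRUE: HHS region number
regstate_ok("CA", 4)    # TRUE: 2-letter state code
regstate_ok("CA", 2)    # FALSE: wrong length for a country code
```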

year

A number - the starting year of the flu season (default value is 2015, as shown in Usage). DICE currently has data for years 2003-2015 for CDC and 2003-2014 for GFT.

nperiodsFit

A number - the number of data periods the user wants to include in the fit. (Default is to include all available data)

model

A number - the model number (currently we support models 1-5 for flu and 4-5 for dengue). Default is model 4.

isingle

Integer: 0 = couple the fit-level regions/patches, 1 = do NOT couple. Default is 0 (couple).

db_opts

A list of database options. $DICE_db determines which SQL database the data is retrieved from: 'predsci' is the default SQL database, and 'BSVE' is in development. Additional flags are for outside sources of data (currently only the CDC Influenza-Like Illness (ILI) server is supported: $CDC_server=TRUE).
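
One way to change a single database option while keeping the others is to start from the defaults shown in Usage. This is a sketch in base R; default_db_opts and my_db_opts are illustrative names, and whether CDC_server=FALSE suits a given data source is an assumption to verify.

```r
# Defaults copied from the Usage section.
default_db_opts <- list(DICE_db = "predsci", CDC_server = TRUE)

# Override one field and keep the rest unchanged.
my_db_opts <- utils::modifyList(default_db_opts, list(CDC_server = FALSE))
str(my_db_opts)
```

The resulting list can then be passed as db_opts = my_db_opts.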

disease

String - disease name. Options for modeling are: flu, dengue, yellow_fever, ebola, zika, cholera, chik, plague. To graphically explore the data see: predsci.com/id_data/. A full list of diseases in the DICE database can be found from an R prompt by following one of the examples below.
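
A disease string can be validated against the options listed above before it is passed on. The sketch below uses base R's match.arg; dice_diseases and check_disease are hypothetical names, and the vector is simply the list from this entry, not a query of the database.

```r
# Modeling options as listed in this entry (not read from the database).
dice_diseases <- c("flu", "dengue", "yellow_fever", "ebola",
                   "zika", "cholera", "chik", "plague")

# Hypothetical helper (not part of DICE): returns the matched name,
# or raises an error if the string is not one of the listed options.
check_disease <- function(disease) {
  match.arg(disease, dice_diseases)
}

check_disease("dengue")
```

Note that match.arg also accepts unambiguous prefixes, so a stricter check would compare with %in% instead.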

epi_model

Numeric: 1 = SIR (default), 2 = SEIR. Used to build a filename for output.

method

String either 'mech' for compartmental mechanistic models or 'stat' for SARIMA models. Used to build a filename for output

all_years_flag

TRUE/FALSE, grab all years of incidence data in addition to the specified season.

all_cad_clim

TRUE/FALSE, grab all years of climate data that are available.

fit_names

A character vector indicating which fit regions to use. If fit_names='all', then DICE uses all child regions of the model region with level equal to fit_level. Alternatively, fit_names can specify a subset of the fit regions, which is used to construct an aggregate representation of the model region. For example, if mod_name=c(NAME_2="US"), mod_level=2, fit_level=3, and fit_names=c("R1", "R2", "R3"), DICE will create an Atlantic super-region to model (as opposed to using all 10 HHS regions). Similarly, if mod_name=c(NAME_2="US"), mod_level=2, fit_level=4, and fit_names=c("WA", "OR", "CA"), DICE will create and model a super-state of Pacific states.
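
The Atlantic super-region described above can be written out as an argument list. This is a sketch only: the argument values follow the text of this entry, atlantic_args is an illustrative name, and the call itself is left commented out because it needs DICE installed and access to the database.

```r
# Arguments for an Atlantic super-region built from HHS regions 1-3.
atlantic_args <- list(
  data_source = "cdc",
  mod_level   = 2,
  mod_name    = c(NAME_2 = "US"),
  fit_level   = 3,
  fit_names   = c("R1", "R2", "R3"),
  year        = 2015
)

# With DICE installed and the database reachable, the call would be:
# mydata <- do.call(DICE::get.DICE.data, atlantic_args)
```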

Value

mydata - a list with all available data and auxiliary information for both the model and fit data sets.

For both data sets we provide the percent weighted ILI, the number of cases, the weekly averaged specific humidity, precipitation and temperature, and the school vacation schedule. For dengue, most of the data is monthly and almost all of it is number of cases. We also provide averaged specific humidity, precipitation and temperature on the same cadence as the dengue data.

The auxiliary information, for both data sets, includes the populations, the lon/lat values and all the names describing the region.

Examples

require(DICE)
# Get national and regional CDC mydata
get.DICE.data(data_source = 'cdc', mod_level = 2, fit_level = 3, RegState = 'usa', year = 2015, nperiodsFit = 45, model = 5)

# Get Region9 and state GFT mydata
get.DICE.data(data_source = 'gft', mod_level = 3, fit_level = 4, RegState = 9, year = 2013, nperiodsFit = 45, model = 5)

# Create a 'west coast' region from California, Oregon, Washington
get.DICE.data(data_source = 'gft', mod_level = 3, fit_level = 4, RegState = c('CA','OR','WA'), year = 2013, nperiodsFit = 45, model = 5)

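# Dengue data
# (year corrected from 1010 to 2010; sql_db dropped because it is not an
#  argument in Usage, so the default db_opts is used instead)
get.DICE.data(mod_level = 3, fit_level = 3, year = 2010, nperiodsFit = 12, model = 4, isingle = 0,
             disease = 'dengue', RegState = 'BR')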

# -- Diseases and data sources -------
# Access the database and list all available diseases:
library(DICE)
library(DBI)  # provides the dbReadTable() and dbDisconnect() generics
myDB = OpenCon()
data_sources = dbReadTable(conn = myDB, name = "data_sources")
unique(data_sources$disease)
# Then list all data sources
str(data_sources)
data_sources$source_abbv
dbDisconnect(myDB)

predsci/DICE documentation built on Aug. 9, 2019, 9:41 a.m.