diceData: The MASTER DICE DATA SET


Description

diceData is a dataset structured to organize global Influenza-Like Illness (ILI) data at multiple spatial scales.

Usage


Format

An object of class list of length 13.

Details

The following is a brief description of each list entry:

$attr

A dataframe that associates each population with a continent, country, region, etc. It also includes latitude/longitude coordinates for the population-density-weighted centroid of each region. Example NAME_X values for the United States are listed at the end of this section.

$ili

A dataframe that contains Google Flu Trends (GFT) %ILI data for flu seasons starting in 2003 through 2014. Data are weekly and cover the United States at the national, regional, and state levels. Each column corresponds to a specific nation/region/state. Each row corresponds to a specific week.

$sh

A dataframe that contains Specific Humidity (SH) data for each region and week. Hourly/Daily SH data on a spatial grid has been averaged spatially and temporally to give a single value for each region-week. Each column corresponds to a specific nation/region/state. Each row corresponds to a specific week. SH is in units kg/kg.

$school

A dataframe with approximated school schedules for each region and week. Values range from 0 to 1. 0 indicates all schools are in-session for the entire week. 1 indicates that all schools are out-of-session for the entire week. Each column corresponds to a specific nation/region/state. Each row corresponds to a specific week.
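
To make the 0-to-1 convention concrete, here is a small sketch in R using made-up weekly values (not taken from diceData$school). Reading intermediate values as the fraction of the week that schools are out of session is an assumption; the description above only defines the two endpoints.

```r
# Toy weekly school values for one region (illustrative numbers only)
school <- c(0, 0, 0.4, 1, 1, 0.2, 0)

# Assumed interpretation: fraction of each week that schools are in session
in_session <- 1 - school

# Weeks in which all schools are out of session for the entire week
closed_weeks <- which(school == 1)
closed_weeks
```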

$pop

A dataframe with census-reported populations by year. Each column corresponds to a specific nation/region/state. Each row corresponds to a specific year. Values are number of residents.

$CDCili

A dataframe containing Centers for Disease Control and Prevention (CDC) %ILI data. It currently contains data for flu seasons starting in 2003 through 2015. Each column corresponds to a specific nation/region/state. Each row corresponds to a specific week. Note: when attempting to access the current flu season, get.DICE.data()/get.subset() will first attempt to get CDC data directly from the CDC server.

$CDCbaseline

A dataframe containing CDC onset-values for each region and year. CDC onset values provide a quantitative method for determining when a flu season begins and ends. Weeks with %ILI > CDCbaseline are considered ‘flu weeks’ that are part of the flu season.
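
The ‘flu week’ rule can be illustrated with a short sketch. The %ILI values and the baseline below are made up for illustration and are not taken from diceData; in practice they would come from the matching region column of $CDCili and the matching region and year of $CDCbaseline.

```r
# Toy weekly %ILI values for one region (illustrative numbers only)
ili <- c(1.1, 1.4, 2.3, 3.0, 2.6, 1.9, 1.2)

# Hypothetical onset value for the same region and season
baseline <- 2.0

# A week is a 'flu week' when its %ILI exceeds the baseline
flu_week <- ili > baseline
flu_weeks_idx <- which(flu_week)

# First and last weeks above the baseline in this toy season
season_start <- min(flu_weeks_idx)
season_end <- max(flu_weeks_idx)
```

Taking the first and last flu weeks as the season boundaries is a simplification for this sketch; it ignores any dips below the baseline in between.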

An important aspect of how the data structure is organized is the idea of spatial ‘levels’. In the United States the levels are defined as: 0-Global, 1-Continent, 2-Country, 3-Region, 4-State, 5-County, 6-City. Levels smaller than ‘country’ may be defined differently in each country.

In diceData$attr, the NAME_X fields indicate the association at level X. For example, the United States has the following values: level=2, NAME_1='North.America', NAME_2='United.States', NAME_3='', NAME_4='', NAME_5='', NAME_6=''. An empty string '' indicates a NAME that is not applicable. Another example is San Diego: level=6, NAME_1='North.America', NAME_2='United.States', NAME_3='Region9', NAME_4='California', NAME_5='San.Diego.County', NAME_6='San.Diego'.
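
The relationship between level and the NAME_X fields can be sketched on a toy table holding only the two rows described above. Both attr_toy and the helper own_name() are hypothetical names introduced for illustration; they are not part of the DICE package, and the real diceData$attr has more columns than this.

```r
# Toy stand-in for diceData$attr with the United States and San Diego rows
attr_toy <- data.frame(
  level  = c(2, 6),
  NAME_1 = c("North.America", "North.America"),
  NAME_2 = c("United.States", "United.States"),
  NAME_3 = c("", "Region9"),
  NAME_4 = c("", "California"),
  NAME_5 = c("", "San.Diego.County"),
  NAME_6 = c("", "San.Diego"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each row's name at its own level,
# i.e. the value of NAME_<level>. Assumes level is between 1 and 6.
own_name <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    df[i, paste0("NAME_", df$level[i])]
  }, character(1))
}

own_name(attr_toy)
```

For a level-2 row the helper reads NAME_2, and for a level-6 row it reads NAME_6, so every NAME field above a row's own level is left as the empty string.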

Examples

# Load the DICE package
library(DICE)
# Load the dataset
data(DICE_dataset)

# View the attributes for CDC Region 1
diceData$attr[diceData$attr$level == 3 & diceData$attr$NAME_3 == 'Region1', ]

# View the first 20 weeks of GFT ILI data for all states in CDC Region 1
# (as.character() guards against NAME_4 being stored as a factor)
NamesInd <- as.character(diceData$attr$NAME_4[diceData$attr$level == 4 & diceData$attr$NAME_3 == 'Region1'])
NamesInd
diceData$ili[1:20, NamesInd]

predsci/DICE documentation built on Aug. 9, 2019, 9:41 a.m.