dfmip.hindcasts: DFMIP Hindcasts
In akeyel/dfmip: Disease Forecast Model Intercomparison Project

dfmip.hindcasts

R Documentation

Description

Generate hindcasts for a suite of models using common data inputs, and evaluate their accuracy using a standardized suite of validation metrics.

Usage

dfmip.hindcasts(
  forecast.targets,
  models.to.run,
  focal.years,
  human.data,
  mosq.data,
  weather.data,
  results.path,
  model.inputs = list(),
  population.df = "none",
  threshold = "default",
  percentage = "default",
  id.string = "",
  season_start_month = 7,
  weeks_in_season = 2,
  sample_frequency = 1,
  n.draws = 1000,
  point.estimate = 0,
  analysis.locations = "default",
  is.test = FALSE
)

Arguments

forecast.targets

The quantities for which hindcasts are to be made. Options are:

annual.human.cases	Number of human cases
seasonal.mosquito.MLE	Mosquito infection rate maximum likelihood estimate averaged over the entire season
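
For orientation, a mosquito infection rate MLE is commonly estimated from pooled test results by maximizing a binomial pool likelihood. The sketch below is a generic version of that estimator for a single set of pools. It is not taken from dfmip, the function name pooled.mle is hypothetical, and dfmip may compute or average the estimate differently.

```r
# Generic pooled-sample MLE of the infection rate (illustration only; not dfmip code).
# pool.size: number of mosquitoes in each pool
# positive:  1 if the pool tested positive, 0 otherwise
pooled.mle <- function(pool.size, positive) {
  # With no positive pools the likelihood is maximized at 0.
  if (all(positive == 0)) return(0)
  # With every pool positive the likelihood keeps increasing up to 1.
  if (all(positive == 1)) return(1)
  log.lik <- function(p) {
    # A pool of size n tests negative with probability (1 - p)^n
    sum(positive * log(1 - (1 - p)^pool.size) +
        (1 - positive) * pool.size * log(1 - p))
  }
  optimize(log.lik, interval = c(1e-8, 1 - 1e-8),
           maximum = TRUE, tol = 1e-10)$maximum
}

# Ten pools of 50 mosquitoes each, two of them positive (made-up data)
est <- pooled.mle(pool.size = rep(50, 10), positive = c(1, 1, rep(0, 8)))
```

When all pools have the same size n and k of m pools are positive, the estimate has the closed form 1 - (1 - k/m)^(1/n), which is a convenient check on the numerical result.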

models.to.run

A string vector of the models to run. Options are:

NULL.MODELS	Forecasts based on statewide incidence
ArboMAP	Development version ArboMAP forecasts, see details section.
ArboMAP.MOD	Modified version of ArboMAP model.
RF1_C	Random Forest model, climate inputs only (i.e. equivalent inputs to ArboMAP).
RF1_A	Random Forest model, all available inputs.

Note that these entries are case-sensitive. Models are run by keyword, so they always run in a fixed order (NULL.MODELS, ArboMAP, ArboMAP.MOD, RF1_C, RF1_A), regardless of the order specified in the models.to.run vector.
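
The effect of keyword-based dispatch can be sketched in base R as follows. This is an illustration of the behavior described above, not the package's implementation; the variable names are hypothetical, and how dfmip treats an unrecognized entry is not documented here.

```r
# Illustration only: keyword-based dispatch yields a fixed run order.
# This vector copies the order stated in this documentation.
run.order <- c("NULL.MODELS", "ArboMAP", "ArboMAP.MOD", "RF1_C", "RF1_A")

# A request in a different order, with one entry in the wrong case
models.to.run <- c("RF1_C", "NULL.MODELS", "arbomap")

# Matching is case-sensitive, so "arbomap" does not match "ArboMAP"
unrecognized <- setdiff(models.to.run, run.order)

# Requested models, in the order they would actually be run
effective.order <- run.order[run.order %in% models.to.run]
```

Checking for unrecognized entries before a long hindcast run is a cheap way to catch a misspelled or wrongly cased model name.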

focal.years

The years for which hindcasts will be made. Hindcasts will use all prior years as training data.
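
As a rough sketch of what "all prior years" means for each focal year, assuming the input data span the hypothetical years shown (the variable names are illustrative and not part of dfmip):

```r
# Illustration only: training years available to each focal year.
available.years <- 2004:2018        # hypothetical years present in the input data
focal.years <- c(2016, 2017, 2018)

# Each focal year is trained on every available year that precedes it
training.years <- lapply(focal.years, function(yr) {
  available.years[available.years < yr]
})
names(training.years) <- focal.years
```

Under this reading, later focal years have more training data than earlier ones, which is worth keeping in mind when comparing accuracy across years.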

human.data

Data on human cases of the disease. Must be formatted with two columns: location and date. The location column contains the spatial unit (typically county), while the date column contains the date of symptom onset for the human case. The date column must be in M/D/Y format, with forward slashes as delimiters. Behavior when these data are missing (e.g., when only a mosquito forecast is made with RF1) is not yet documented.
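
A minimal sketch of a data frame in this layout, with a quick check that the dates parse as M/D/Y. The county names and dates are made up for illustration.

```r
# Illustration only: human case data with the two required columns.
human.data <- data.frame(
  location = c("County A", "County A", "County B"),  # made-up spatial units
  date = c("7/14/2015", "8/2/2015", "8/21/2016"),    # M/D/Y with forward slashes
  stringsAsFactors = FALSE
)

# Dates that do not match the M/D/Y format parse to NA
parsed <- as.Date(human.data$date, format = "%m/%d/%Y")
stopifnot(!anyNA(parsed))

# Year of symptom onset for each case
case.year <- as.integer(format(parsed, "%Y"))
```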

mosq.data

Data on mosquito pools tested for the disease. Must be formatted with four columns: location (the spatial unit, e.g., county), col_date (the date the mosquitoes were tested), wnv_result (whether or not the pool was positive), and pool_size (the number of mosquitoes tested in the pool). A fifth column, species, is optional but is not used by the code.
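
A minimal sketch of a data frame in this layout, with a check that the required columns are present. The values are made up; coding wnv_result as 0/1 and writing col_date as M/D/Y are assumptions, since this page does not specify either.

```r
# Illustration only: mosquito pool data with the four required columns.
mosq.data <- data.frame(
  location = c("County A", "County A", "County B"),  # made-up spatial units
  col_date = c("7/1/2015", "7/8/2015", "7/8/2015"),  # assumed M/D/Y format
  wnv_result = c(0, 1, 0),                           # assumed coding: 1 = positive pool
  pool_size = c(50, 42, 17),                         # mosquitoes tested per pool
  stringsAsFactors = FALSE
)

# Confirm that every required column is present before running any models
required <- c("location", "col_date", "wnv_result", "pool_size")
missing.cols <- setdiff(required, names(mosq.data))
stopifnot(length(missing.cols) == 0)

# Number of positive pools per location
positive.pools <- tapply(mosq.data$wnv_result, mosq.data$location, sum)
```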

weather.data

Data on weather variables to be included in the analysis. See the read.weather.data function from ArboMAP for details about the data format; it is also a useful way to process one or more data files downloaded via Google Earth Engine.

results.path

The base path in which to place the modeling results. Some models will create sub-folders for model-specific results.

model.inputs

A keyed list of model-specific inputs. Keyed entry options are:

arbo.inputs	Inputs specific to the ArboMAP model. Documentation needed.
rf1.inputs	Inputs specific to the RF1 model, see `rf1.inputs`.

population.df

Census information for calculating incidence. Can be set to 'none' or omitted from the function call. Format instructions are still needed.

threshold

For continuous and discrete forecasts, the error threshold used to classify a forecast as "accurate". The default is +/- 1 human case for case forecasts and +/- 1 week for timing forecasts; otherwise the default is 0.

percentage

For continuous and discrete forecasts, if the prediction is within the specified percentage of the observed value, the forecast is considered accurate. The default is +/- 25 percent of the observed value.
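
The two accuracy criteria can be sketched as simple comparisons. This is an illustration under stated assumptions, not dfmip's validation code: the helper names are hypothetical, the percentage is expressed as a proportion (0.25 for 25 percent), and the boundaries are treated as inclusive.

```r
# Illustration only: classify forecasts as accurate under each criterion.
within.threshold <- function(predicted, observed, threshold) {
  abs(predicted - observed) <= threshold
}

within.percentage <- function(predicted, observed, percentage) {
  # percentage is a proportion of the observed value (assumed convention)
  abs(predicted - observed) <= percentage * abs(observed)
}

predicted <- c(4, 10, 2)   # made-up forecasts of annual human cases
observed <- c(5, 14, 0)    # made-up observed values

by.threshold <- within.threshold(predicted, observed, threshold = 1)
by.percentage <- within.percentage(predicted, observed, percentage = 0.25)
```

Under this reading, a percentage criterion can only be met by an exact match when the observed value is 0, which is one reason an absolute threshold is useful alongside it.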

id.string

An ID to include in the forecast ID for this hindcast run (e.g., state).

season_start_month

The first month of the mosquito season, as a number. E.g., July would be 7.

weeks_in_season

The number of weeks to sample.

sample_frequency

How frequently to sample (default 1 = weekly). Other options are not currently supported.
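
One way to picture how season_start_month, weeks_in_season, and sample_frequency might combine into a set of sample dates is sketched below. Starting on the first day of the month and stepping by 7 days per unit of sample_frequency are assumptions made for illustration; the package may define the sampling dates differently.

```r
# Illustration only: candidate sample dates for one season.
season_start_month <- 7   # July
weeks_in_season <- 2      # number of weeks to sample
sample_frequency <- 1     # 1 = weekly
year <- 2018              # hypothetical focal year

# Assumed start: the first day of the season's first month
season.start <- as.Date(sprintf("%d-%02d-01", year, season_start_month))

# One date per sampled week
sample.dates <- seq(season.start,
                    by = paste(7 * sample_frequency, "days"),
                    length.out = weeks_in_season)
```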

n.draws

The number of draws for the forecast distributions. Should generally be 1 if a point estimate is used; otherwise it should be large enough to adequately represent the variation in the underlying data.

point.estimate

Whether a single point estimate representing the mean value should be returned for forecast distributions. Otherwise, past years are sampled at random.
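
The interaction between n.draws and point.estimate can be sketched as follows. The helper draw.forecast and the input values are hypothetical, and treating point.estimate as a 0/1 flag follows the default of 0 shown in Usage; this is not the package's own sampling code.

```r
# Illustration only: point estimate versus random sampling of past years.
set.seed(1)
past.values <- c(3, 0, 7, 2, 5)  # made-up values observed in prior years

draw.forecast <- function(past.values, n.draws, point.estimate) {
  if (point.estimate == 1) {
    # Single summary value: the mean of past years, repeated n.draws times
    rep(mean(past.values), n.draws)
  } else {
    # Sample past years at random, with replacement.
    # Indexing by sample.int avoids sample()'s special case for length-1 input.
    past.values[sample.int(length(past.values), n.draws, replace = TRUE)]
  }
}

point <- draw.forecast(past.values, n.draws = 1, point.estimate = 1)
draws <- draw.forecast(past.values, n.draws = 1000, point.estimate = 0)
```

With a point estimate, additional draws only repeat the same value, which is why n.draws of 1 is generally enough in that case.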

analysis.locations

Locations to include in the analysis. This may include locations with no human cases that would otherwise be dropped from the modeling.

is.test

Default is FALSE (runs all models). If set to TRUE, saved results will be used for the Random Forest model. For testing purposes only.

Details

Forecast targets not yet supported, but in development:

human_incidence	Human cases divided by location population.
peak_mosquito_MLE	Peak mosquito infection rate (maximum likelihood estimate) during the season. The averaging period has not yet been specified.
number_positive_pools	The number of positive mosquito pools observed by location
human_cases_binary	Whether or not human cases will occur in a location
positive_pools_binary	Whether or not a location will have any positive mosquito pools
peak_timing	The week of the peak mosquito infection rate (whether to use week or day has not yet been decided)