dfmip.hindcasts: DFMIP Hindcasts

View source: R/dfmip.R

dfmip.hindcasts    R Documentation

DFMIP Hindcasts

Description

Generate hindcasts for a suite of models using common data inputs, and evaluate their accuracy using a standardized suite of validation metrics.

Usage

dfmip.hindcasts(
  forecast.targets,
  models.to.run,
  focal.years,
  human.data,
  mosq.data,
  weather.data,
  results.path,
  model.inputs = list(),
  population.df = "none",
  threshold = "default",
  percentage = "default",
  id.string = "",
  season_start_month = 7,
  weeks_in_season = 2,
  sample_frequency = 1,
  n.draws = 1000,
  point.estimate = 0,
  analysis.locations = "default",
  is.test = FALSE
)

Arguments

forecast.targets

The quantities for which hindcasts are to be made. Options are:

annual.human.cases Number of human cases.
seasonal.mosquito.MLE Mosquito infection rate maximum likelihood estimate, averaged over the entire season.

models.to.run

A string vector of the models to run. Options are:

NULL.MODELS Forecasts based on statewide incidence
ArboMAP Development version ArboMAP forecasts, see details section.
ArboMAP.MOD Modified version of ArboMAP model.
RF1_C Random Forest model, climate inputs only (i.e. equivalent inputs to ArboMAP).
RF1_A Random Forest model, all available inputs.

Note that these entries are case-sensitive. Models are selected by keyword, so they run in a fixed order (NULL.MODELS, ArboMAP, ArboMAP.MOD, RF1_C, RF1_A) regardless of the order given in the models.to.run vector.
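
The ordering behavior can be sketched as follows. This is an illustration only, not the package's internal code; the variable names fixed.order, run.order, and unrecognized are hypothetical, and how dfmip.hindcasts treats an unrecognized entry is not stated here.

```r
# Fixed execution order stated in the documentation
fixed.order <- c("NULL.MODELS", "ArboMAP", "ArboMAP.MOD", "RF1_C", "RF1_A")

# A request in arbitrary order, with one mis-cased entry
models.to.run <- c("RF1_A", "arbomap", "NULL.MODELS")

# Keep only recognized keywords, in the fixed order (matching is case-sensitive)
run.order <- fixed.order[fixed.order %in% models.to.run]

# Entries that match no keyword; worth checking before starting a long run
unrecognized <- setdiff(models.to.run, fixed.order)

print(run.order)
print(unrecognized)
```

Checking for unrecognized entries up front is a cheap way to catch capitalization mistakes such as "arbomap".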

focal.years

The years for which hindcasts will be made. Hindcasts will use all prior years as training data.
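
As a rough sketch of what "all prior years as training data" implies, the following lists the training years for each focal year. The available.years vector and the training list are hypothetical names used for illustration; the actual split is performed inside the package.

```r
# Years present in the input data (hypothetical example)
available.years <- 2010:2016

# Years to hindcast
focal.years <- c(2014, 2015)

# For each focal year, the training set is every available year before it
training <- lapply(focal.years, function(y) available.years[available.years < y])
names(training) <- focal.years

print(training)
```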

human.data

Data on human cases of the disease. Must be formatted with two columns: location and date. The location column contains the spatial unit (typically county), and the date column contains the date of symptom onset for each human case. Dates must be in M/D/Y format, with forward slashes as delimiters. #**# WHAT IF THESE DATA ARE MISSING? I.E. just making a mosquito forecast with RF1?
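
A minimal sketch of a data frame in this layout, with a check that the dates parse. The county names and dates are made up, and the use of four-digit years is an assumption, since the description only specifies M/D/Y with forward slashes.

```r
# Hypothetical human case data: one row per case
human.data <- data.frame(
  location = c("County A", "County A", "County B"),
  date = c("7/15/2015", "8/2/2015", "8/20/2016"),  # M/D/Y, four-digit year assumed
  stringsAsFactors = FALSE
)

# Parse the dates; any entry not in month/day/year form becomes NA
parsed.dates <- as.Date(human.data$date, format = "%m/%d/%Y")
stopifnot(!anyNA(parsed.dates))

print(human.data)
```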

mosq.data

Data on mosquito pools tested for the disease. Must be formatted with four columns: location (the spatial unit, e.g. county), col_date (the date the mosquitoes were tested), wnv_result (whether or not the pool was positive), and pool_size (the number of mosquitoes tested in the pool). A fifth column, species, is optional but is not used by the code.
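
A minimal sketch of a data frame with the four required columns. The values are made up, and two details are assumptions not stated above: that col_date uses the same M/D/Y format as the human data, and that wnv_result is coded 0 (negative) or 1 (positive).

```r
# Hypothetical mosquito pool data: one row per tested pool
mosq.data <- data.frame(
  location = c("County A", "County A", "County B"),
  col_date = c("7/10/2015", "7/17/2015", "7/10/2015"),  # date format is an assumption
  wnv_result = c(0, 1, 0),                               # 0/1 coding is an assumption
  pool_size = c(50, 42, 37),
  stringsAsFactors = FALSE
)

# Confirm that every required column is present before running a hindcast
required.columns <- c("location", "col_date", "wnv_result", "pool_size")
missing.columns <- setdiff(required.columns, names(mosq.data))
stopifnot(length(missing.columns) == 0)
```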

weather.data

Data on weather variables to be included in the analysis. See the read.weather.data function from ArboMAP for details about the data format; that function is also a useful way to process one or more data files downloaded via Google Earth Engine.

results.path

The base path in which to place the modeling results. Some models will create sub-folders for model-specific results.

model.inputs

A keyed list of model-specific inputs. Keyed entry options are:

arbo.inputs Inputs specific to the ArboMAP model. #**# DOCUMENTATION NEEDED
rf1.inputs Inputs specific to the RF1 model, see rf1.inputs.

population.df

Census information for calculating incidence. Can be set to 'none' or omitted from the function call #**# NEEDS FORMAT INSTRUCTIONS

threshold

For continuous and discrete forecasts, the error threshold used to classify a forecast as "accurate". The default is +/- 1 human case or +/- 1 week, depending on the forecast target; otherwise the default is 0.

percentage

For continuous and discrete forecasts, a forecast is considered accurate if the prediction is within the specified percentage of the observed value. The default is +/- 25 percent of the observed value.
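
The two accuracy criteria (threshold and percentage) can be sketched as simple checks. The function names below are hypothetical and are not part of dfmip. Expressing percentage in percent units (25 rather than 0.25) is also an assumption, and how the package combines the two criteria is not documented here.

```r
# Accurate if the absolute error is no larger than the threshold
within.threshold <- function(predicted, observed, threshold = 1) {
  abs(predicted - observed) <= threshold
}

# Accurate if the absolute error is no larger than the given percentage
# of the observed value (percentage in percent units, an assumption)
within.percentage <- function(predicted, observed, percentage = 25) {
  abs(predicted - observed) <= abs(observed) * percentage / 100
}

# Observed 10 human cases
print(within.threshold(11, 10))   # error of 1 against a threshold of 1
print(within.percentage(12, 10))  # error of 2 against an allowance of 2.5
print(within.percentage(13, 10))  # error of 3 against an allowance of 2.5
```

Note that a percentage criterion never passes a nonzero error when the observed value is 0, which is one reason an absolute threshold is useful alongside it.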

id.string

An ID to include in the forecast ID for this hindcast run (e.g., state)

season_start_month

The first month of the mosquito season, as a number. E.g., July would be 7.

weeks_in_season

The number of weeks to sample.

sample_frequency

How frequently to sample (default, 1 = weekly). #**# Other options are not currently supported
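
One way to picture how season_start_month, weeks_in_season, and sample_frequency fit together is as a sequence of weekly sample dates. This is a sketch under assumptions: that sampling begins on the first day of the start month, and that focal.year, season.start, and sample.dates are hypothetical names. The package's actual date handling may differ.

```r
season_start_month <- 7   # July
weeks_in_season <- 2      # number of weeks to sample
sample_frequency <- 1     # weekly; the only option currently supported
focal.year <- 2015        # hypothetical year

# Assumed season start: first day of the start month
season.start <- as.Date(sprintf("%d-%02d-01", focal.year, season_start_month))

# One sample date per week of the season
sample.dates <- seq(season.start,
                    by = paste(sample_frequency, "week"),
                    length.out = weeks_in_season)

print(sample.dates)
```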

n.draws

The number of draws for the forecast distributions. Should generally be 1 if a point estimate is used; otherwise it should be large enough to adequately represent the variation in the underlying data.

point.estimate

Whether to return a single point estimate, representing the mean value, for forecast distributions. Otherwise, past years are sampled at random.
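
The difference between a point estimate and random draws can be illustrated with a toy example. The case counts are made up, and this is not the package's code; it only shows why n.draws of 1 suits a point estimate while a larger value is needed to represent variation across years.

```r
# Hypothetical human case counts from four past years
past.cases <- c(3, 0, 5, 2)

# Point estimate: a single value, the mean of past years
point.forecast <- mean(past.cases)

# Distribution: sample past years at random, with replacement
set.seed(1)  # for reproducibility of the illustration
n.draws <- 1000
draw.forecast <- sample(past.cases, size = n.draws, replace = TRUE)

print(point.forecast)
print(table(draw.forecast))
```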

analysis.locations

Locations to include in the analysis. This may include locations with no human cases that would otherwise be dropped from the modeling.

is.test

Default is FALSE (runs all models). If set to TRUE, saved results will be used for the Random Forest model. For testing purposes only.

Details

Forecast targets not yet supported, but in development:

human_incidence Human cases divided by location population.
peak_mosquito_MLE Peak mosquito infection rate (maximum likelihood estimate) during the season (averaged over what time period?)
number_positive_pools The number of positive mosquito pools observed by location
human_cases_binary Whether or not human cases will occur in a location
positive_pools_binary Whether or not a location will have any positive mosquito pools
peak_timing The week (day?) of the peak mosquito infection rate

Value

Four objects #**# ADD DOCUMENTATION. Also consider a list to hold the 'other.outputs'


akeyel/dfmip documentation built on Sept. 3, 2022, 1:26 a.m.