dfmip.hindcasts | R Documentation |
Generate hindcasts for a suite of models using common data inputs, and evaluate their accuracy using a standardized suite of validation metrics
dfmip.hindcasts( forecast.targets, models.to.run, focal.years, human.data, mosq.data, weather.data, results.path, model.inputs = list(), population.df = "none", threshold = "default", percentage = "default", id.string = "", season_start_month = 7, weeks_in_season = 2, sample_frequency = 1, n.draws = 1000, point.estimate = 0, analysis.locations = "default", is.test = FALSE )
forecast.targets |
The quantities for which hindcasts are to be made. Options are:
models.to.run |
A string vector of the models to run. Options are:
Note that these entries are case-sensitive. Models are selected by keyword and always run in a fixed order (NULL.MODELS, ArboMAP, ArboMAP.MOD, RF1_C, RF1_A), regardless of the order specified in the models.to.run vector.
focal.years |
The years for which hindcasts will be made. Hindcasts will use all prior years as training data.
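As a rough illustration of "all prior years as training data", the sketch below splits a small table by year for each focal year. The `cases` data frame, its `year` column, and the `splits` list are hypothetical and are not part of dfmip; the package's internal splitting may differ.

```r
# Hypothetical example data: one row per human case, with a year column.
cases <- data.frame(
  location = c("A", "A", "B", "B", "A", "B"),
  year     = c(2015, 2016, 2016, 2017, 2018, 2018),
  stringsAsFactors = FALSE
)

focal.years <- c(2017, 2018)

# For each focal year, train on all strictly earlier years and
# hold out the focal year itself as the observations to predict.
splits <- lapply(focal.years, function(yr) {
  list(
    focal.year = yr,
    training   = cases[cases$year < yr, ],
    observed   = cases[cases$year == yr, ]
  )
})
names(splits) <- focal.years

# Number of training rows available for each focal year
sapply(splits, function(s) nrow(s$training))
```

Under this reading, later focal years see progressively more training data, so the earliest focal year should still leave enough prior years to fit the models.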
human.data |
Data on human cases of the disease. Must be formatted with two columns: location and date. The location column contains the spatial unit (typically county), and the date column contains the date of symptom onset for the human case. Dates must be in M/D/Y format, with forward slashes as delimiters. #**# WHAT IF THESE DATA ARE MISSING? I.E. just making a mosquito forecast with RF1?
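A minimal sketch of a data frame in the two-column layout described for human.data. The county names and dates are made up, and the zero-padded `%m/%d/%Y` format is an assumption about what "M/D/Y" permits.

```r
# Hypothetical case records; locations and onset dates are invented.
onset <- as.Date(c("2015-08-03", "2015-08-17", "2016-07-25"))

human.data <- data.frame(
  location = c("County A", "County B", "County A"),
  # Assumption: zero-padded month/day with a four-digit year satisfies "M/D/Y"
  date     = format(onset, "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Sanity check: every date parses back under the same format
parsed <- as.Date(human.data$date, format = "%m/%d/%Y")
stopifnot(!anyNA(parsed))
```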
mosq.data |
Data on mosquito pools tested for the disease. Must be formatted with four columns: location (the spatial unit, e.g. county), col_date (the date the mosquitoes were tested), wnv_result (whether or not the pool was positive), and pool_size (the number of mosquitoes tested in the pool). A fifth column, species, is optional and is not used by the code.
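A minimal sketch of the four required mosq.data columns, with a check that none is missing. The values are invented; the M/D/Y date format for col_date and the 0/1 coding of wnv_result are assumptions, since neither is stated here.

```r
# Hypothetical mosquito pool records
mosq.data <- data.frame(
  location   = c("County A", "County A", "County B"),
  col_date   = c("07/06/2015", "07/13/2015", "07/13/2015"),  # assumed M/D/Y
  wnv_result = c(0, 1, 0),    # assumed coding: 1 = positive, 0 = negative
  pool_size  = c(50, 42, 37),
  stringsAsFactors = FALSE
)

# Confirm the four documented columns are present before running hindcasts
required     <- c("location", "col_date", "wnv_result", "pool_size")
missing.cols <- setdiff(required, names(mosq.data))
stopifnot(length(missing.cols) == 0)
```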
weather.data |
Data on weather variables to be included in the analysis. See the read.weather.data function for details about data format. The read.weather.data function from ArboMAP is a useful way to process one or more data files downloaded via Google Earth Engine.
results.path |
The base path in which to place the modeling results. Some models will create sub-folders for model-specific results.
model.inputs |
A keyed list of model-specific inputs. Keyed entry options are:
population.df |
Census information for calculating incidence. Can be set to 'none' or omitted from the function call. #**# NEEDS FORMAT INSTRUCTIONS
threshold |
For continuous and discrete forecasts, a threshold of error to be used in classifying the forecast as "accurate". The default is +/- 1 human case or +/- 1 week; otherwise the default is 0.
percentage |
For continuous and discrete forecasts, if the prediction is within the specified percentage of the observed value, the forecast is considered accurate. The default is +/- 25 percent of the observed.
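The two accuracy rules described for threshold and percentage can be sketched as simple comparisons. The helper names `within.threshold` and `within.percentage` are hypothetical and not part of dfmip. Expressing the percentage as a proportion (0.25 for 25 percent) is an assumption, and how the package combines the two criteria is not stated here.

```r
# Absolute-error rule: accurate if the error is within +/- threshold
within.threshold <- function(predicted, observed, threshold = 1) {
  abs(predicted - observed) <= threshold
}

# Relative-error rule: accurate if the error is within the given
# proportion of the observed value (assumed: 0.25 means 25 percent)
within.percentage <- function(predicted, observed, percentage = 0.25) {
  abs(predicted - observed) <= percentage * abs(observed)
}

# Hypothetical predictions and observations (e.g. annual human cases)
predicted <- c(4, 10, 0)
observed  <- c(5, 14, 0)

threshold.accurate  <- within.threshold(predicted, observed)
percentage.accurate <- within.percentage(predicted, observed)
```

Note that when the observed value is 0, the relative rule in this sketch accepts only an exact prediction of 0.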
id.string |
An ID to include in the forecast ID for this hindcast run (e.g., state).
season_start_month |
The first month of the mosquito season, as a number. E.g., July would be 7.
weeks_in_season |
The number of weeks to sample.
sample_frequency |
How frequently to sample (default 1 = weekly). #**# Other options are not currently supported
n.draws |
The number of draws for the forecast distributions. Should generally be 1 if a point estimate is used; otherwise it should be large enough to adequately represent the variation in the underlying data.
point.estimate |
Whether a single point estimate, representing the mean value, should be returned for forecast distributions. Otherwise, past years are sampled at random.
analysis.locations |
Locations to include in the analysis. This may include locations with no human cases that would otherwise be dropped from the modeling.
is.test |
Default is FALSE (runs all models). If set to TRUE, saved results will be used for the Random Forest model. For testing purposes only. |
Forecast targets not yet supported, but in development:
human_incidence | Human cases divided by location population. |
peak_mosquito_MLE | Peak mosquito infection rate (maximum likelihood estimate) during the season (averaged over what time period?) |
number_positive_pools | The number of positive mosquito pools observed by location |
human_cases_binary | Whether or not human cases will occur in a location |
positive_pools_binary | Whether or not a location will have any positive mosquito pools |
peak_timing | The week (day?) of the peak mosquito infection rate |
Four objects #**# ADD DOCUMENTATION. Also consider a list to hold the 'other.outputs'