rf1: Run the rf1 model
In akeyel/rf1: Random Forest Workflow 1 for DFMIP


The RF1 model first fits a random forest, then excludes all variables with importance scores below the mean importance. (NOTE: This will be problematic if all input variables are relevant, because unless all importance scores are equal, at least one variable falls below the mean and is dropped.) It then further pares down the variable list using variance partitioning, retaining only variables that contribute uniquely to explaining variation in the unmeasured year (assessed via leave-one-year-out cross-validation). The RF1 model uses the randomForest package (Liaw & Wiener 2002), which implements the random forest method developed by Breiman (2001). The MLE calculations in MLE_IR.R were written by Williams and Moffitt (2005) and reformatted by A. Keyel. An arbitrary random seed is set so that results are repeatable.
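A minimal base-R sketch of the first pruning step, assuming the importance scores are already available as a named numeric vector (for example, one column of the matrix returned by randomForest::importance()). The helper filter_by_mean_importance and the example scores are hypothetical, and the sketch assumes that a variable whose score exactly equals the mean is retained.

```r
# Hypothetical helper: keep only variables whose importance is at or above
# the mean importance, mirroring the first RF1 pruning step described above.
filter_by_mean_importance <- function(importance_scores) {
  stopifnot(is.numeric(importance_scores), !is.null(names(importance_scores)))
  threshold <- mean(importance_scores)
  names(importance_scores)[importance_scores >= threshold]
}

# Made-up importance scores for four candidate predictors (mean = 6.5)
scores <- c(var_a = 12.0, var_b = 3.5, var_c = 9.0, var_d = 1.5)
kept <- filter_by_mean_importance(scores)
print(kept)  # [1] "var_a" "var_c"

# Even if all three variables are relevant, x2 is dropped (mean = 10)
kept_all_relevant <- filter_by_mean_importance(c(x1 = 10, x2 = 9, x3 = 11))
print(kept_all_relevant)  # [1] "x1" "x3"
```

The second call illustrates the NOTE above: the mean-importance threshold always discards the below-average variables, whether or not they carry signal.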
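The leave-one-year-out cross-validation mentioned above can be sketched in base R as follows. This illustrates only the splitting scheme, not RF1's variance partitioning: lm() is swapped in for the random forest so the example runs without extra packages, and the helper loyo_predictions and the toy data (columns year, tmean, cases) are hypothetical.

```r
# Hypothetical helper: leave-one-year-out cross-validation. For each year,
# fit on all other years and predict the held-out year. lm() stands in for
# the random forest so the sketch needs only base R.
loyo_predictions <- function(data, formula) {
  years <- sort(unique(data$year))
  preds <- numeric(nrow(data))
  for (yr in years) {
    held_out <- data$year == yr
    fit <- lm(formula, data = data[!held_out, , drop = FALSE])
    preds[held_out] <- predict(fit, newdata = data[held_out, , drop = FALSE])
  }
  preds
}

# Toy data: two records per year, with cases an exact linear function of tmean
df <- data.frame(year = rep(2015:2018, each = 2),
                 tmean = c(20, 22, 21, 23, 19, 24, 22, 25))
df$cases <- 3 * df$tmean - 10

pred <- loyo_predictions(df, cases ~ tmean)
rmse <- sqrt(mean((df$cases - pred)^2))  # out-of-year prediction error
```

Because every prediction comes from a model that never saw that year, the resulting error measures how well the predictors explain the unmeasured year, which is the criterion the description uses for retaining variables.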

rf1(forecast.targets, human.data, mosq.data, weather.data, weekinquestion,
  rf1.inputs, results.path, id.string, break.type = "seasonal",
  response.type = "continuous", quantile.model = 1, n.draws = 1000,
  bins = c(0, seq(1, 51, 1), 101, 151, 201, 1000),
  use.testing.objects = FALSE)

`forecast.targets`	A character vector of forecast targets. 'annual.human.cases' generates predictions of human cases, while 'seasonal.mosquito.MLE' generates mosquito predictions based on the seasonal maximum likelihood estimate (MLE) of the mosquito infection rate.
`human.data`	Data on human cases of the disease, with two columns: location and date. The location column contains the spatial unit (typically county), and date is the date of symptom onset for each human case.
`mosq.data`	Data on mosquito pools tested for the disease, with four columns: location (the spatial unit, e.g. county), col_date (the date the mosquitoes were tested), wnv_result (whether or not the pool tested positive), and pool_size (the number of mosquitoes in the pool). An optional fifth column, species, is not used by the code.
`weather.data`	Data on weather variables to be included in the analysis. See the read.weather.data function for details about the data format; this function, from ArboMAP, is a convenient way to process one or more data files downloaded via Google Earth Engine.
`weekinquestion`	The focal week for the forecast. For the Random Forest model, this is the last day of data used to make the forecast.
`rf1.inputs`	Inputs specific to the RF1 model; see `rf1.inputs`. If this model is not included, set this to 'none' or omit it from the function call.
`results.path`	The base path in which to place the modeling results. Some models create subfolders for model-specific results.
`id.string`	An ID string used to label results aggregated across all locations.
`break.type`	The temporal frequency to use for the data. The default is 'seasonal' which breaks the environmental data into January, February, March; April, May, June; July, August, September; October, November, December. Other options may be supported in the future.
`response.type`	Whether data should be treated as continuous (mosquito rates, number of cases) or binary (0 or 1).
`quantile.model`	Whether (1) or not (0) to use a quantile random forest for the final model output. All other calculations and model fitting use the standard randomForest package.
`n.draws`	The number of random realizations to draw for the RF1 Distributions output.
`bins`	Bin break points for the CDC forecast challenge.
`use.testing.objects`	An indicator. If TRUE, the analysis will not run, but will load previously saved outputs in order to expedite testing the formatting of the code outputs.
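The wnv_result and pool_size columns of mosq.data are what a pooled-sample infection rate estimate needs. The sketch below is a minimal base-R version of the standard pooled maximum likelihood estimator, the approach described by Williams and Moffitt (2005); it is not the code in MLE_IR.R. The helper pooled_mle is hypothetical and assumes wnv_result is coded 0/1 (or FALSE/TRUE) and that the diagnostic test is perfect.

```r
# Hypothetical helper: maximum likelihood estimate of the per-mosquito
# infection rate p from pooled results, assuming a perfect diagnostic test.
pooled_mle <- function(wnv_result, pool_size) {
  stopifnot(length(wnv_result) == length(pool_size), all(pool_size > 0))
  y <- as.numeric(wnv_result)
  if (all(y == 0)) return(0)  # no positive pools: MLE is 0
  if (all(y == 1)) return(1)  # every pool positive: MLE sits at the boundary
  # A pool of size m is negative with probability (1 - p)^m
  loglik <- function(p) {
    log_neg <- pool_size * log1p(-p)  # log P(pool negative)
    log_pos <- log(-expm1(log_neg))   # log P(pool positive), numerically stable
    sum(y * log_pos + (1 - y) * log_neg)
  }
  # The log-likelihood is concave in p, so a 1-D search finds the maximum
  optimize(loglik, interval = c(1e-12, 1 - 1e-12), maximum = TRUE,
           tol = 1e-12)$maximum
}

# Toy data: 10 pools of 50 mosquitoes, 2 of them positive
pools <- data.frame(wnv_result = c(1, 1, rep(0, 8)), pool_size = rep(50, 10))
p_hat <- pooled_mle(pools$wnv_result, pools$pool_size)
```

With equal pool sizes the estimate has a closed form, 1 - (1 - x/n)^(1/m) for x positive pools out of n pools of size m, which gives a quick check on the numerical result; with unequal pool sizes only the numerical search applies.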
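The seasonal breaks described for break.type amount to calendar quarters. A minimal sketch, assuming daily weather records with a date column; the helper season_of, the toy data, and the choice to average daily values within each season are illustrative assumptions rather than rf1's documented behavior.

```r
# Hypothetical helper: map dates to the seasonal breaks used by
# break.type = "seasonal" (Jan-Mar = 1, Apr-Jun = 2, Jul-Sep = 3, Oct-Dec = 4).
season_of <- function(dates) {
  month <- as.integer(format(as.Date(dates), "%m"))
  (month - 1L) %/% 3L + 1L
}

season_of(as.Date(c("2019-01-15", "2019-04-01", "2019-09-30", "2019-12-31")))
# [1] 1 2 3 4

# Toy daily records for one location; averaging within a season is assumed
weather <- data.frame(
  location = "A",
  date = as.Date(c("2019-01-10", "2019-02-10", "2019-07-10", "2019-08-10")),
  tmean = c(-5, -3, 24, 26)
)
weather$year <- as.integer(format(weather$date, "%Y"))
weather$season <- season_of(weather$date)
seasonal <- aggregate(tmean ~ location + year + season, data = weather,
                      FUN = mean)
```

Each row of the result is one location-year-season summary, the kind of predictor a seasonal break produces.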

Citations

Breiman, L. 2001. Random forests. Machine Learning 45: 5-32.

Keyel, A.C. et al. 2019. PLOS ONE 14(6): e0217854. https://doi.org/10.1371/journal.pone.0217854

Liaw, A. and M. Wiener. 2002. Classification and Regression by randomForest. R News 2: 18-22.

Williams, C. and C. Moffitt. 2005. Estimation of pathogen prevalence in pooled samples using maximum likelihood methods and open source software. Journal of Aquatic Animal Health 17: 386-391.

Four outputs are generated: the Results data frame, the Distributions data frame, the Bins data frame, and the model object results.
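One plausible way the Distributions and Bins outputs relate, assumed here rather than taken from the rf1 source: the n.draws realizations of a forecast are tabulated into the intervals defined by bins, giving the share of draws in each bin. This sketch treats bins as left-closed intervals [bins[i], bins[i + 1]), which is an assumption, and the helper bin_probabilities and the Poisson stand-in draws are hypothetical.

```r
# Hypothetical helper: share of forecast draws falling in each bin,
# assuming left-closed intervals [bins[i], bins[i + 1]).
bin_probabilities <- function(draws, bins) {
  idx <- findInterval(draws, bins)  # index of the bin containing each draw
  if (any(idx < 1 | idx >= length(bins))) {
    stop("some draws fall outside the bin range")
  }
  counts <- tabulate(idx, nbins = length(bins) - 1L)
  probs <- counts / length(draws)
  names(probs) <- paste0("[", head(bins, -1), ",", bins[-1], ")")
  probs
}

bins <- c(0, seq(1, 51, 1), 101, 151, 201, 1000)  # default from the Usage block
set.seed(1)
draws <- rpois(1000, lambda = 3)  # stand-in for n.draws = 1000 realizations
probs <- bin_probabilities(draws, bins)
```

With the default bins there are 55 intervals: single-case bins from 0 to 50, then wider bins for 51-100, 101-150, 151-200, and 201 or more.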

akeyel/rf1 documentation built on Dec. 28, 2020, 4:48 a.m.