rf1: Run the rf1 model

View source: R/rf1_main.R

Description

The RF1 model first fits a random forest, then excludes all variables with importance scores below the mean importance (NOTE: this will be problematic if all input variables are relevant). It then further pares down the variable list using variance partitioning, retaining only variables that contribute uniquely to explaining variation in the unmeasured year (via leave-one-year-out cross-validation). The RF1 model uses the randomForest package by Liaw & Wiener (2002), which implements the Random Forest method developed by Breiman (2001). MLE calculations from MLE_IR.R were written by Williams and Moffitt (2005) and reformatted by A. Keyel. Note that an arbitrary starting seed is set to ensure that results are repeatable.
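The mean-importance filter described above can be sketched in a few lines of base R. This is only an illustration of the thresholding idea, not the package's internal code: the variable names and importance values are made up, and it assumes that variables exactly at the mean are retained.

```r
# Illustrative importance scores; these names and values are made up for the
# example and are not output from the rf1 package.
importance.scores <- c(tmean_1 = 12.4, tmean_2 = 8.1, pr_1 = 3.2,
                       pr_2 = 2.9, rmean_3 = 9.7, vpd_3 = 1.5)

# Exclude variables whose importance falls below the mean importance
threshold <- mean(importance.scores)
retained <- names(importance.scores)[importance.scores >= threshold]
print(retained)

# The caveat from the description: when every variable is informative,
# a mean cutoff still discards the ones with below-average scores.
all.relevant <- c(a = 10, b = 11, c = 12, d = 13)
dropped <- names(all.relevant)[all.relevant < mean(all.relevant)]
print(dropped)
```

In the second case, two of four equally plausible predictors are dropped purely because of where the mean falls, which is the situation the NOTE warns about.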

Usage

rf1(forecast.targets, human.data, mosq.data, weather.data, weekinquestion,
  rf1.inputs, results.path, id.string, break.type = "seasonal",
  response.type = "continuous", quantile.model = 1, n.draws = 1000,
  bins = c(0, seq(1, 51, 1), 101, 151, 201, 1000),
  use.testing.objects = FALSE)

Arguments

forecast.targets

A vector specifying what to forecast. 'annual.human.cases' generates human case predictions, while 'seasonal.mosquito.MLE' provides options for mosquito predictions.

human.data

Data on human cases of the disease. Must be formatted with two columns: location and date. The location column contains the spatial unit (typically county), while the date corresponds to the date of the onset of symptoms for the human case.

mosq.data

Data on mosquito pools tested for the disease. Must be formatted with four columns: location (the spatial unit, e.g. county), col_date (the date the mosquitoes were tested), wnv_result (whether or not the pool was positive), and pool_size (the number of mosquitoes tested in the pool). A fifth column, species, is optional but is not used by the code.

weather.data

Data on weather variables to be included in the analysis. See the read.weather.data function for details about data format. The read.weather.data function from ArboMAP is a useful way to process one or more data files downloaded via Google Earth Engine.

weekinquestion

The focal week for the forecast. For the Random Forest model, this will be the last day used for making the forecast

rf1.inputs

Inputs specific to the RF1 model, see rf1.inputs. If the RF1 model is not included, this should be set to 'none' or omitted from the function call.

results.path

The base path in which to place the modeling results. Some models will create sub-folders for model specific results

id.string

An id to use for labeling the aggregations across all locations

break.type

The temporal frequency to use for the data. The default is 'seasonal' which breaks the environmental data into January, February, March; April, May, June; July, August, September; October, November, December. Other options may be supported in the future.

response.type

Whether data should be treated as continuous (mosquito rates, number of cases) or binary (0 or 1).

quantile.model

Whether (1) or not (0) to use a quantile random forest for the final model output. All other calculations and model fitting use the standard randomForest package.

n.draws

The number of random realizations to draw for the RF1.distributions object

bins

Bin break points for the CDC forecast challenge

use.testing.objects

An indicator. If TRUE, the analysis will not run, but will load previously saved outputs in order to expedite testing the formatting of the code outputs.
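As a rough illustration of the input formats described for human.data and mosq.data, the following base R sketch builds two small data frames with the required column names. All records are hypothetical, and the use of the Date class and of 0/1 coding for wnv_result are assumptions; the package may expect a different encoding. The last step sketches the quarterly grouping implied by break.type = "seasonal" and is not the package's own implementation.

```r
# Hypothetical example records; locations, dates, and results are made up.
human.data <- data.frame(
  location = c("County A", "County A", "County B"),
  date = as.Date(c("2015-07-14", "2015-08-02", "2015-08-20")),
  stringsAsFactors = FALSE
)

mosq.data <- data.frame(
  location = c("County A", "County A", "County B"),
  col_date = as.Date(c("2015-06-30", "2015-07-21", "2015-08-11")),
  wnv_result = c(0, 1, 0),   # assumed coding: 1 = positive pool
  pool_size = c(50, 42, 17),
  stringsAsFactors = FALSE
)

# Check that the columns named in the argument descriptions are present
stopifnot(all(c("location", "date") %in% names(human.data)))
stopifnot(all(c("location", "col_date", "wnv_result", "pool_size") %in%
                names(mosq.data)))

# Sketch of the 'seasonal' grouping: months 1-3 map to season 1,
# 4-6 to season 2, 7-9 to season 3, and 10-12 to season 4
month.number <- as.integer(format(mosq.data$col_date, "%m"))
mosq.data$season <- (month.number - 1) %/% 3 + 1
print(mosq.data)
```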

Details

Citations:

Breiman, L. 2001. Random forests. Machine Learning 45: 5-32.

Keyel, A.C. et al. 2019. PLOS ONE 14(6): e0217854. https://doi.org/10.1371/journal.pone.0217854

Liaw, A. and M. Wiener. 2002. Classification and Regression by randomForest. R News 2: 18-22.

Williams, C. and C. Moffitt. 2005. Estimation of pathogen prevalence in pooled samples using maximum likelihood methods and open source software. Journal of Aquatic Animal Health 17: 386-391.

Value

Four outputs are generated: The Results dataframe, the Distributions dataframe, the Bins dataframe, and the model object results
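To give a sense of how the default bins argument relates to a set of random draws, here is a minimal base R sketch that assigns simulated case counts to the default break points and converts the counts to probabilities. The Poisson draws are made up, and the choice of left-closed intervals is an assumption; the structure of the actual Bins dataframe returned by rf1 may differ.

```r
# Default break points from the rf1 usage
bins <- c(0, seq(1, 51, 1), 101, 151, 201, 1000)

# Made-up draws standing in for the n.draws random realizations
set.seed(1)
draws <- rpois(1000, lambda = 4)

# Assign each draw to a bin; intervals are assumed to be left-closed,
# e.g. [0,1), [1,2), ..., with the last interval closed on both ends
bin.index <- cut(draws, breaks = bins, right = FALSE, include.lowest = TRUE)

# Proportion of draws falling in each bin (empty bins are kept as 0)
bin.probabilities <- as.numeric(table(bin.index)) / length(draws)
names(bin.probabilities) <- levels(bin.index)
print(bin.probabilities[bin.probabilities > 0])
```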


akeyel/rf1 documentation built on Dec. 28, 2020, 4:48 a.m.