rfPoisson: rfPoisson
In fpechon/rfCountData: Random Forests for Count Data


rfPoisson implements Breiman's random forest algorithm for regression (based on Breiman and Cutler's original Fortran code), modified for use with Poisson data that have different observation periods.
More specifically, the best split is the one that maximises the decrease in Poisson deviance. An offset has also been introduced to accommodate different exposure times. The offset is the log of the exposure.
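
The split criterion described above can be illustrated with a small base-R sketch. It is not the package's internal code: it assumes the textbook Poisson deviance with a constant rate per node (total claims divided by total exposure), and the names `poisson_deviance`, `deviance_decrease` and the toy data are made up for illustration.

```r
# Poisson deviance of a node whose fitted rate is constant:
# lambda = sum(y) / sum(exposure), mu_i = lambda * exposure_i.
# (Assumed textbook definition, not taken from the package source.)
poisson_deviance <- function(y, offset) {
  exposure <- exp(offset)
  mu <- sum(y) / sum(exposure) * exposure
  # y * log(y / mu) is taken as 0 when y == 0
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

# Decrease in deviance obtained by splitting a node;
# `left` is a logical vector marking the observations sent to the left child.
deviance_decrease <- function(y, offset, left) {
  poisson_deviance(y, offset) -
    poisson_deviance(y[left], offset[left]) -
    poisson_deviance(y[!left], offset[!left])
}

# Toy data (hypothetical): claim counts, exposure in years, one predictor
y        <- c(0, 0, 1, 0, 2, 3, 1, 4)
exposure <- c(1, 0.5, 1, 1, 0.5, 1, 0.25, 1)
x        <- c(1, 2, 3, 4, 5, 6, 7, 8)

gain <- deviance_decrease(y, log(exposure), left = x <= 4)
print(gain)
```

Under this definition, a candidate split with a larger `gain` would be preferred, and observations with short exposure contribute less expected count to the node's fitted mean.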

rfPoisson(x, offset = NULL, y = NULL, xtest = NULL, ytest = NULL,
  offsettest = NULL, ntree = 500, mtry = max(floor(ncol(x)/3), 1),
  replace = TRUE, sampsize = if (replace) nrow(x) else ceiling(0.632 *
  nrow(x)), nodesize = 5000, maxnodes = NULL, importance = TRUE,
  nPerm = 1, do.trace = FALSE, keep.forest = !is.null(y) &&
  is.null(xtest), keep.inbag = FALSE, ...)
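
As a quick illustration of how the default expressions for `mtry` and `sampsize` in the signature above evaluate, here is a base-R sketch on a toy data frame. The data frame and the helper `default_sampsize` are hypothetical; only the two expressions are copied from the usage.

```r
# Toy predictor set: 10 rows, 4 columns (made up for illustration)
x <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)

# Default mtry: a third of the predictors, rounded down, but at least 1
default_mtry <- max(floor(ncol(x) / 3), 1)

# Default sampsize: all rows when sampling with replacement,
# otherwise about 63.2% of the rows, rounded up
default_sampsize <- function(x, replace) {
  if (replace) nrow(x) else ceiling(0.632 * nrow(x))
}

default_mtry                 # 1
default_sampsize(x, TRUE)    # 10
default_sampsize(x, FALSE)   # 7
```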

`x`	a data frame or a matrix of predictors.
`offset`	a vector of the same length as `y`, giving the log of the observation time (e.g. log of exposure). Default is 0, which corresponds to an exposure of 1.
`y`	a vector of Poisson responses.
`xtest`	a data frame or matrix (like `x`) containing predictors for the test set.
`ytest`	response for the test set.
`offsettest`	offset for the test set (like `offset`).
`ntree`	Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.
`mtry`	Number of variables randomly sampled as candidates at each split. Default is max(floor(p/3), 1), where p is the number of variables in `x`.
`replace`	Should sampling of cases be done with or without replacement?
`sampsize`	Size(s) of sample to draw.
`nodesize`	Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Default is 5000.
`maxnodes`	Maximum number of terminal nodes trees in the forest can have. If not given, trees are grown to the maximum possible (subject to limits by `nodesize`). If set larger than maximum possible, a warning is issued.
`importance`	Should importance of predictors be assessed? Default is `TRUE`.
`nPerm`	Not yet implemented. Number of times the OOB data are permuted per tree for assessing variable importance. Number larger than 1 gives slightly more stable estimate, but not very effective.
`do.trace`	If set to `TRUE`, give a more verbose output as `rfPoisson` is run. If set to some integer, then running output is printed for every `do.trace` trees.
`keep.forest`	If set to `FALSE`, the forest will not be retained in the output object. If `xtest` is given, defaults to `FALSE`.
`keep.inbag`	Should an `n` by `ntree` matrix be returned that keeps track of which samples are ‘in-bag’ in which trees (but not how many times, if sampling with replacement)?
`...`	other parameters passed to lower functions.

TBC

Florian Pechon, florian.pechon@uclouvain.be, based on the package randomForest by Andy Liaw (andy_liaw@merck.com) and Matthew Wiener (matthew_wiener@merck.com), which is in turn based on original Fortran code by Leo Breiman and Adele Cutler.

R package randomForest, https://cran.r-project.org/package=randomForest
Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.

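# Install CASdatasets from its own repository if it is not already available.
if (!requireNamespace("CASdatasets", quietly = TRUE)) {
  install.packages("CASdatasets", repos = "http://cas.uqam.ca/pub/R/", type = "source")
}
library(CASdatasets)
library(rfCountData)
data("freMTPLfreq")

# Fit a small forest on the first 10,000 policies; the offset is log(exposure).
train <- freMTPLfreq[1:10000, ]
m0 <- rfPoisson(x = train[, c("Region", "Power", "DriverAge")],
                offset = log(train$Exposure),
                y = train$ClaimNb,
                ntree = 20)

# Predict for the next 50 policies, supplying their own offset.
test <- freMTPLfreq[10001:10050, ]
predict(m0, newdata = test[, c("Region", "Power", "DriverAge")],
        offset = log(test$Exposure))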