rfPoisson
implements Breiman's random forest algorithm (based on
Breiman and Cutler's original Fortran code) for regression, modified
for use with Poisson data that have different observation periods.
More specifically, the best split is the one that maximises the decrease in the Poisson deviance. An offset
has also been introduced to accommodate different exposure times. The offset is the log of the exposure.
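The split criterion described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's internal code: the helper names (poisson_deviance, node_deviance, deviance_decrease) and the toy data are hypothetical, and each node is assumed to predict a single rate equal to total counts divided by total exposure.

```r
# Poisson deviance of observed counts y against expected counts mu.
# The term y * log(y / mu) is taken as 0 when y == 0.
poisson_deviance <- function(y, mu) {
  log_term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(log_term - (y - mu))
}

# Deviance of a node (assumption: every observation in the node shares
# one rate, estimated as total counts over total exposure).
node_deviance <- function(y, offset) {
  exposure <- exp(offset)            # offset is log(exposure)
  rate <- sum(y) / sum(exposure)
  poisson_deviance(y, rate * exposure)
}

# Decrease in deviance when rows with go_left == TRUE go to the left
# child and the remaining rows go to the right child.
deviance_decrease <- function(y, offset, go_left) {
  node_deviance(y, offset) -
    node_deviance(y[go_left], offset[go_left]) -
    node_deviance(y[!go_left], offset[!go_left])
}

# Toy data, made up for illustration: counts, exposure in years, one predictor.
y        <- c(0, 1, 0, 2, 0, 3)
exposure <- c(1.0, 0.5, 1.0, 0.5, 1.0, 1.0)
age      <- c(60, 25, 55, 22, 48, 19)

decrease <- deviance_decrease(y, log(exposure), go_left = age < 30)
decrease
```

Under this sketch, a candidate split is preferred when it yields a larger value of decrease; observations with short exposure contribute proportionally less expected count, which is the role the offset plays.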
rfPoisson(x, offset = NULL, y = NULL, xtest = NULL, ytest = NULL,
  offsettest = NULL, ntree = 500, mtry = max(floor(ncol(x)/3), 1),
  replace = TRUE, sampsize = if (replace) nrow(x) else ceiling(0.632 *
  nrow(x)), nodesize = 5000, maxnodes = NULL, importance = TRUE,
  nPerm = 1, do.trace = FALSE, keep.forest = !is.null(y) &&
  is.null(xtest), keep.inbag = FALSE, ...)
x |
a data frame or a matrix of predictors. |
offset |
a vector of the same length as y, corresponding to the log of the observation time (e.g. log of exposure). Default is 0. |
y |
a vector of Poisson responses |
xtest |
a data frame or matrix (like |
ytest |
response for the test set |
offsettest |
Offset for the test set (like |
ntree |
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. |
mtry |
Number of variables randomly sampled as candidates at each split. Default is p/3, where p is the number of variables in |
replace |
Should sampling of cases be done with or without replacement? |
sampsize |
Size(s) of sample to draw. |
nodesize |
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Default is 5000. |
maxnodes |
Maximum number of terminal nodes trees in the forest can have.
If not given, trees are grown to the maximum possible
(subject to limits by |
importance |
Should importance of predictors be assessed? Default is |
nPerm |
Not yet implemented. Number of times the OOB data are permuted per tree for assessing variable importance. A number larger than 1 gives a slightly more stable estimate, but is not very effective. |
do.trace |
If set to |
keep.forest |
If set to |
keep.inbag |
Should an |
... |
other parameters passed to lower functions. |
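The defaults for mtry and sampsize are expressions in the usage signature rather than fixed numbers. The short sketch below evaluates those same expressions on a stand-in predictor table; the table and the helper name sampsize_default are hypothetical and only serve to show what the defaults work out to.

```r
# Stand-in predictor table: 10 rows, 7 columns (values are arbitrary).
x <- as.data.frame(matrix(seq_len(70), nrow = 10, ncol = 7))

# Default mtry: a third of the predictors, rounded down, but at least 1.
mtry_default <- max(floor(ncol(x) / 3), 1)

# Default sampsize depends on whether sampling is done with replacement.
sampsize_default <- function(x, replace = TRUE) {
  if (replace) nrow(x) else ceiling(0.632 * nrow(x))
}

mtry_default                          # 2
sampsize_default(x, replace = TRUE)   # 10
sampsize_default(x, replace = FALSE)  # 7
```

With a single predictor column, the max(..., 1) guard keeps mtry at 1 instead of 0.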
Value: TBC
Note: TBC
Florian Pechon (florian.pechon@uclouvain.be), based on the package randomForest by Andy Liaw (andy_liaw@merck.com) and Matthew Wiener (matthew_wiener@merck.com), which is in turn based on the original Fortran code by Leo Breiman and Adele Cutler.
R package randomForest, https://cran.r-project.org/package=randomForest
Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf.
if (!require(CASdatasets)) install.packages("CASdatasets", repos = "http://cas.uqam.ca/pub/R/", type="source")
require(CASdatasets)
data("freMTPLfreq")
library(rfCountData)
m0 = rfPoisson(y = freMTPLfreq[1:10000,]$ClaimNb,
               offset = log(freMTPLfreq[1:10000,]$Exposure),
               x = freMTPLfreq[1:10000,c("Region", "Power", "DriverAge")],
               ntree = 20)
predict(m0, newdata = freMTPLfreq[10001:10050,c("Region", "Power", "DriverAge")],
        offset = log(freMTPLfreq[10001:10050,"Exposure"]))