Description
Forests based on maximum-likelihood estimation of parameters for specified distribution families, for example from the GAMLSS family (for generalized additive models for location, scale, and shape).
Usage

distforest(formula, data, subset, na.action = na.pass, weights,
  offset, cluster, family = NO(), strata,
  control = disttree_control(teststat = "quad", testtype = "Univ",
    mincriterion = 0, saveinfo = FALSE, minsplit = 20, minbucket = 7,
    splittry = 2, ...),
  ntree = 500L, fit.par = FALSE,
  perturb = list(replace = FALSE, fraction = 0.632),
  mtry = ceiling(sqrt(nvar)), applyfun = NULL, cores = NULL,
  trace = FALSE, ...)

## S3 method for class 'distforest'
predict(object, newdata = NULL,
  type = c("parameter", "response", "weights", "node"),
  OOB = FALSE, scale = TRUE, ...)
Arguments

formula: a symbolic description of the model to be fit. This should be of type y ~ x1 + x2, where y is the response variable and x1 and x2 are used as partitioning variables.
data: a data frame containing the variables in the model.
subset: an optional vector specifying a subset of observations to be used in the fitting process.
na.action: a function which indicates what should happen when the data contain missing values.
weights: an optional vector of weights to be used in the fitting process. Non-negative integer-valued weights are allowed, as well as non-negative real weights. Observations are sampled (with or without replacement) according to probabilities weights/sum(weights).
offset: an optional vector of offset values.
cluster: an optional factor indicating independent clusters. Highly experimental, use at your own risk.
family: specification of the response distribution. Either a gamlss.family object (such as the default NO()) or a family specification as accepted by disttree.
strata: an optional factor for stratified sampling.
control: a list with control parameters, see disttree_control.
ntree: number of trees to grow for the forest.
fit.par: logical. If TRUE, fitted and predicted values and predicted parameters are calculated for the learning data (together with the log-likelihood).
perturb: a list with arguments replace and fraction determining the type of resampling: replace = TRUE corresponds to the n-out-of-n bootstrap, while replace = FALSE draws a subsample of the given fraction without replacement.
mtry: number of input variables randomly sampled as candidates at each node for random-forest-like algorithms. Bagging, as a special case of a random forest without random input variable sampling, can be performed by setting mtry to the number of input variables (or to Inf).
applyfun: an optional lapply-style function with arguments function(X, FUN, ...), used for growing the individual trees and hence a natural hook for parallelization. The default is the basic lapply function, unless the cores argument is specified (see below).
cores: numeric. If set to an integer, the applyfun is set to mclapply with the desired number of cores (see the sketch after this argument list).
trace: a logical indicating if a progress bar shall be printed while the forest grows.
object: an object as returned by distforest.
newdata: an optional data frame containing test data.
type: a character string denoting the type of predicted value returned: "parameter" for the predicted distribution parameters, "response" for the fitted mean response, "weights" for the aggregated nearest neighbor weights, and "node" for the IDs of the terminal nodes each observation is assigned to in each tree.
OOB: a logical defining out-of-bag predictions (only if newdata = NULL).
scale: a logical indicating scaling of the nearest neighbor weights by the sum of weights in the corresponding terminal node of each tree. In the simple regression forest, predicting the conditional mean by nearest neighbor weights will be equivalent to (but slower than!) the aggregation of means.
...: arguments to be used to form the default control argument if it is not supplied directly.
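For illustration, several of these arguments can be combined in a single call. The following is a minimal sketch, assuming the disttree package is installed and attached, and a Unix-alike platform for the cores argument (which relies on mclapply); the values shown are illustrative only:

library("disttree")

## forest of 100 trees, each grown on a subsample drawn without
## replacement containing 63.2 percent of the observations;
## tree growing is spread over two cores
df <- distforest(dist ~ speed, data = cars,
  ntree = 100,
  perturb = list(replace = FALSE, fraction = 0.632),
  cores = 2)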
Details

Distributional regression forests are an application of model-based recursive partitioning (implemented in mob, ctree, and cforest) to parametric model fits based on the GAMLSS family of distributions. Distributional regression trees (see disttree) are fitted to each of the ntree perturbed samples of the learning sample. Most of the hyperparameters in disttree_control regulate the construction of the distributional regression trees.

Hyperparameters you might want to change are listed below (see the sketch after the list):
1. The number of randomly preselected variables, mtry, which defaults to the square root of the number of input variables (rounded up).

2. The number of trees, ntree. Use more trees if you have more variables.

3. The depth of the trees, regulated by mincriterion. Usually unstopped and unpruned trees are used in random forests. To grow large trees, set mincriterion to a small value.
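A sketch setting these three hyperparameters, with illustrative values only; mtry is passed directly to distforest() while mincriterion is set via disttree_control() (here mtry = 1 because cars has a single input variable):

df2 <- distforest(dist ~ speed, data = cars,
  ntree = 1000,   ## more trees
  mtry = 1,       ## variables preselected at each node
  control = disttree_control(teststat = "quad", testtype = "Univ",
    mincriterion = 0,      ## unstopped, large trees
    saveinfo = FALSE))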
The aggregation scheme works by averaging observation weights extracted from each of the ntree trees and NOT by averaging predictions directly, as in randomForest. See Schlosser et al. (2019), Hothorn et al. (2004), and Meinshausen (2006) for a description.

Predictions can be computed using predict. For observations with zero weights, predictions are computed from the fitted tree when newdata = NULL.
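To make the weight-based aggregation concrete, the following sketch extracts the nearest neighbor weights of a single new observation and forms the implied weighted mean of the learning responses; for a forest with the default NO() family this mirrors the aggregated mean prediction (a sketch under these assumptions, not a definitive recipe):

df <- distforest(dist ~ speed, data = cars)
nd1 <- data.frame(speed = 15)

## nearest neighbor weights of the new observation over the learning sample
## (coerced to a plain vector; one weight per learning observation)
w <- as.vector(predict(df, newdata = nd1, type = "weights"))

## weighted mean of the learning responses implied by these weights
sum(w * cars$dist) / sum(w)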
Value

An object of class distforest.
References

Breiman L (2001). Random Forests. Machine Learning, 45(1), 5–32.

Hothorn T, Lausen B, Benner A, Radespiel-Troeger M (2004). Bagging Survival Trees. Statistics in Medicine, 23(1), 77–91.

Hothorn T, Bühlmann P, Dudoit S, Molinaro A, Van der Laan MJ (2006a). Survival Ensembles. Biostatistics, 7(3), 355–373.
Hothorn T, Hornik K, Zeileis A (2006b). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.
Hothorn T, Zeileis A (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.
Meinshausen N (2006). Quantile Regression Forests. Journal of Machine Learning Research, 7, 983–999.
Schlosser L, Hothorn T, Stauffer R, Zeileis A (2019). Distributional Regression Forests for Probabilistic Precipitation Forecasting in Complex Terrain. arXiv 1804.02921, arXiv.org E-Print Archive. http://arxiv.org/abs/1804.02921v3
Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007). Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8, 25. http://www.biomedcentral.com/1471-2105/8/25
Strobl C, Malley J, Tutz G (2009). An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods, 14(4), 323–348.
Examples

## basic example: distributional regression forest for cars data
df <- distforest(dist ~ speed, data = cars)

## prediction of fitted mean and visualization
nd <- data.frame(speed = 4:25)
nd$mean <- predict(df, newdata = nd, type = "response")[["(fitted.response)"]]
plot(dist ~ speed, data = cars)
lines(mean ~ speed, data = nd)
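Predicted distribution parameters can be extracted as well. This continuation is a sketch that assumes the parameter predictions carry columns named mu and sigma, the parameters of the default NO() family:

## predicted parameters of the normal response distribution
pars <- predict(df, newdata = nd, type = "parameter")
head(pars)

## approximate 90 percent interval implied by the fitted normal parameters
nd$lo <- qnorm(0.05, mean = pars$mu, sd = pars$sigma)
nd$hi <- qnorm(0.95, mean = pars$mu, sd = pars$sigma)
lines(lo ~ speed, data = nd, lty = 2)
lines(hi ~ speed, data = nd, lty = 2)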