orf | R Documentation |
An implementation of the Ordered Forest estimator as developed
in Lechner & Okasa (2019). The Ordered Forest flexibly
estimates the conditional probabilities of models with ordered
categorical outcomes (so-called ordered choice models).
In addition to the usual machine learning functionality, the orf
package provides functions for estimating marginal effects and for
statistical inference on them, and thus produces output similar to
that of standard econometric models for ordered choice. The core
forest algorithm relies on the fast C++ forest implementation
from the ranger package (Wright & Ziegler, 2017).
orf(
  X,
  Y,
  num.trees = 1000,
  mtry = NULL,
  min.node.size = NULL,
  replace = FALSE,
  sample.fraction = NULL,
  honesty = TRUE,
  honesty.fraction = NULL,
  inference = FALSE,
  importance = FALSE
)
X | numeric matrix of features |
Y | numeric vector of outcomes |
num.trees | scalar, number of trees in a forest, i.e. bootstrap replications (default is 1000 trees) |
mtry | scalar, number of randomly selected features (default is the square root of the number of features, rounded up to the nearest integer) |
min.node.size | scalar, minimum node size, i.e. leaf size of a tree (default is 5 observations) |
replace | logical, if TRUE sampling with replacement, i.e. bootstrap, is used to grow the trees, otherwise subsampling without replacement is used (default is FALSE) |
sample.fraction | scalar, subsampling rate (default is 1 for bootstrap and 0.5 for subsampling) |
honesty | logical, if TRUE an honest forest is built using sample splitting (default is TRUE) |
honesty.fraction | scalar, share of observations belonging to the honest sample, which is not used for growing the forest (default is 0.5) |
inference | logical, if TRUE weight-based inference is conducted (default is FALSE) |
importance | logical, if TRUE a permutation-based variable importance measure is computed (default is FALSE) |
The Ordered Forest function, orf, estimates the conditional ordered choice
probabilities, i.e. P[Y=m|X=x]. Weight-based inference for the probability
predictions can be conducted as well. If inference is desired, the Ordered
Forest must be estimated with honesty and subsampling. If only prediction is
desired, estimation without honesty and with bootstrapping is recommended for
optimal prediction performance.
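The probabilities P[Y=m|X=x] can be read as differences of cumulative probabilities P[Y<=m|X=x], which is the idea behind the estimator in Lechner & Okasa (2019). The following is a rough sketch of that idea using ranger directly. It is not the internal code of the orf package: the simulated data, the variable names and the choice of 200 trees are assumptions made for illustration, and honesty and inference are left out.

```r
library(ranger)

# simulated ordered outcome with three classes (illustrative data only)
set.seed(1)
n <- 300
X <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
latent <- X[, 1] - 0.5 * X[, 2] + rnorm(n)
Y <- as.numeric(cut(latent, breaks = c(-Inf, -0.5, 0.5, Inf)))

classes <- sort(unique(Y))
M <- length(classes)

# one regression forest per binary indicator 1(Y <= m), m = 1, ..., M - 1;
# the out-of-bag predictions estimate P[Y <= m | X = x]
cum_probs <- sapply(classes[-M], function(m) {
  fit <- ranger(x = X, y = as.numeric(Y <= m), num.trees = 200)
  fit$predictions
})

# add the trivial bounds P[Y <= 0] = 0 and P[Y <= M] = 1
cum_probs <- cbind(0, cum_probs, 1)

# class probabilities as differences of adjacent cumulative probabilities
probs <- cum_probs[, -1, drop = FALSE] - cum_probs[, -(M + 1), drop = FALSE]

# set negative differences to zero and rescale each row to sum to one
probs[probs < 0] <- 0
probs <- probs / rowSums(probs)

head(probs)
```

In practice, orf(X, Y) is the intended entry point; the sketch only illustrates why the output is a matrix with one probability column per outcome class.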
To estimate the Ordered Forest, the user must supply the data in the form of
a matrix of covariates X and a vector of outcomes Y to the orf function.
These are the only inputs without defaults and must be specified by the user.
Further optional arguments include the classical forest hyperparameters:
the number of trees, num.trees, the number of randomly selected features,
mtry, and the minimum leaf size, min.node.size.
The forest building scheme is regulated by the replace argument:
bootstrapping if replace = TRUE, subsampling if replace = FALSE.
For subsampling, the sample.fraction argument regulates the subsampling
rate. Further, an honest forest is estimated if the honesty argument is set to
TRUE, which is also the default. Similarly, the fraction of the sample used
for the honest estimation is regulated by the honesty.fraction argument.
The default setting conducts a 50:50 sample split, which is also generally
advised for optimal performance. The inference procedure of the Ordered Forest
is based on the forest weights and is controlled by the inference argument.
Note that such weight-based inference is computationally demanding because the
forest weights must be estimated, so longer computation times are to be
expected. Lastly, the importance argument turns the permutation-based variable
importance on and off.
orf is compatible with standard R commands such as predict, margins, plot,
summary and print.
For further details, see examples below.
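As a hedged illustration of the commands listed above, the following sketch fits an Ordered Forest on part of the example data and passes the result to each of them. The hold-out split of the first 100 observations is an arbitrary assumption for illustration, all methods are called with their default settings only, and the predictions element of the prediction object is assumed to mirror the one documented for the orf object.

```r
library(orf)

# load example data and separate response and covariates
data(odata)
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

# hold out a few observations to predict (arbitrary split for illustration)
test_idx <- 1:100
orf_fit <- orf(X[-test_idx, ], Y[-test_idx])

# inspect the fitted object
print(orf_fit)
summary(orf_fit)
plot(orf_fit)

# predict class probabilities for the held-out observations
orf_pred <- predict(orf_fit, newdata = X[test_idx, ])
head(orf_pred$predictions)

# estimate marginal effects with default settings
orf_marg <- margins(orf_fit)
summary(orf_marg)
```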
An object of type orf with the following elements:
forests | saved forests trained for |
info | info containing forest inputs and data used |
predictions | predicted values for class probabilities |
variances | variances of predicted values |
importance | weighted measure of permutation-based variable importance |
accuracy | out-of-bag (oob) measures for mean squared error and ranked probability score |
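The elements listed above can be accessed with the usual list syntax. The sketch below assumes the example data shipped with the package and turns on importance so that the corresponding element is filled; it does not show expected values, since these depend on the random forest draws.

```r
library(orf)

# load example data and separate response and covariates
data(odata)
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

# fit with variable importance switched on
orf_fit <- orf(X, Y, importance = TRUE)

# names of the stored elements
names(orf_fit)

# predicted class probabilities, one column per outcome class
head(orf_fit$predictions)

# accuracy measures (mean squared error and ranked probability score)
orf_fit$accuracy

# permutation-based variable importance
orf_fit$importance
```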
Gabriel Okasa
Lechner, M., & Okasa, G. (2019). Random Forest Estimation of the Ordered Choice Model. arXiv preprint arXiv:1907.02436. https://arxiv.org/abs/1907.02436
Goller, D., Knaus, M. C., Lechner, M., & Okasa, G. (2021). Predicting Match Outcomes in Football by an Ordered Forest Estimator. A Modern Guide to Sports Economics. Edward Elgar Publishing, 335-355. doi: 10.4337/9781789906530.00026
Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1-17. doi: 10.18637/jss.v077.i01
summary.orf, plot.orf, predict.orf, margins.orf
## Ordered Forest
library(orf)

# load example data
data(odata)

# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

# estimate Ordered Forest with default parameters
orf_fit <- orf(X, Y)

# estimate Ordered Forest with own tuning parameters
orf_fit <- orf(X, Y, num.trees = 2000, mtry = 3, min.node.size = 10)

# estimate Ordered Forest with bootstrapping and without honesty
orf_fit <- orf(X, Y, replace = TRUE, honesty = FALSE)

# estimate Ordered Forest with subsampling and with honesty
orf_fit <- orf(X, Y, replace = FALSE, honesty = TRUE)

# estimate Ordered Forest with subsampling and with honesty
# with own tuning for subsample fraction and honesty fraction
orf_fit <- orf(X, Y, replace = FALSE, sample.fraction = 0.5,
               honesty = TRUE, honesty.fraction = 0.5)

# estimate Ordered Forest with subsampling and with honesty and with inference
# (for inference, subsampling and honesty are required)
orf_fit <- orf(X, Y, replace = FALSE, honesty = TRUE, inference = TRUE)

# estimate Ordered Forest with simple variable importance measure
orf_fit <- orf(X, Y, importance = TRUE)

# estimate Ordered Forest with all custom settings
orf_fit <- orf(X, Y, num.trees = 2000, mtry = 3, min.node.size = 10,
               replace = TRUE, sample.fraction = 1,
               honesty = FALSE, honesty.fraction = 0,
               inference = FALSE, importance = FALSE)