orf | R Documentation

An implementation of the Ordered Forest estimator as developed
in Lechner & Okasa (2019). The Ordered Forest flexibly
estimates the conditional probabilities of models with ordered
categorical outcomes (so-called ordered choice models).
In addition to what common machine learning algorithms offer, the `orf`
package provides functions for estimating marginal effects as well
as statistical inference thereof, and thus provides output similar
to that of standard econometric models for ordered choice. The core
forest algorithm relies on the fast C++ forest implementation
from the `ranger` package (Wright & Ziegler, 2017).
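The construction behind the estimator, as described in Lechner & Okasa (2019), turns the ordered outcome with M categories into M-1 binary indicators 1(Y <= m), estimates each cumulative probability P(Y <= m | X) with a regression forest, and differences adjacent estimates to obtain the class probabilities P(Y = m | X). The base-R sketch below shows only the differencing step, assuming the cumulative predictions are already available as a matrix (the numbers are made up). `cum_to_class_probs` is a hypothetical helper, not part of the orf API, and the truncation of negative differences followed by renormalization is assumed here to mirror the paper's handling of non-monotone estimates.

```r
# Hypothetical helper (not exported by orf): convert estimated cumulative
# probabilities P(Y <= m | X), m = 1, ..., M - 1, into class probabilities
# P(Y = m | X), m = 1, ..., M.
cum_to_class_probs <- function(cum) {
  cum <- as.matrix(cum)
  # pad with P(Y <= 0) = 0 on the left and P(Y <= M) = 1 on the right
  padded <- cbind(0, cum, 1)
  # adjacent differences give P(Y = m | X)
  probs <- padded[, -1, drop = FALSE] - padded[, -ncol(padded), drop = FALSE]
  # separately estimated cumulative probabilities need not be monotone,
  # so negative differences are truncated at zero ...
  probs[probs < 0] <- 0
  # ... and each row is rescaled to sum to one
  probs / rowSums(probs)
}

# two observations, three outcome categories (M = 3), made-up predictions
cum_hat <- rbind(c(0.2, 0.7),
                 c(0.5, 0.4))   # second row is not monotone
p_hat <- cum_to_class_probs(cum_hat)
print(round(p_hat, 3))
```

The first row yields 0.2, 0.5 and 0.3 directly; in the second row the negative middle difference is set to zero before the row is rescaled.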

orf( X, Y, num.trees = 1000, mtry = NULL, min.node.size = NULL, replace = FALSE, sample.fraction = NULL, honesty = TRUE, honesty.fraction = NULL, inference = FALSE, importance = FALSE )

`X` |
numeric matrix of features |

`Y` |
numeric vector of outcomes |

`num.trees` |
scalar, number of trees in a forest, i.e. bootstrap replications (default is 1000 trees) |

`mtry` |
scalar, number of randomly selected features (default is the square root of the number of features, rounded up to the nearest integer) |

`min.node.size` |
scalar, minimum node size, i.e. leaf size of a tree (default is 5 observations) |

`replace` |
logical, if TRUE, sampling with replacement (i.e. bootstrapping) is used to grow the trees; otherwise subsampling without replacement is used (default is FALSE) |

`sample.fraction` |
scalar, subsampling rate (default is 1 for bootstrap and 0.5 for subsampling) |

`honesty` |
logical, if TRUE, an honest forest is built using sample splitting (default is TRUE) |

`honesty.fraction` |
scalar, share of observations assigned to the honest sample, which is not used for growing the forest (default is 0.5) |

`inference` |
logical, if TRUE, weight-based inference is conducted (default is FALSE) |

`importance` |
logical, if TRUE, a permutation-based variable importance measure is computed (default is FALSE) |
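The documented defaults for the `NULL` arguments can be restated as a small helper. This is a hypothetical sketch (`resolve_orf_defaults` is not an orf function) that only encodes the rules listed above; how `orf` resolves these values internally may differ in detail.

```r
# Hypothetical helper restating the documented defaults of orf();
# it is not part of the orf package.
resolve_orf_defaults <- function(X, mtry = NULL, min.node.size = NULL,
                                 replace = FALSE, sample.fraction = NULL,
                                 honesty.fraction = NULL) {
  # square root of the number of features, rounded up
  if (is.null(mtry)) mtry <- ceiling(sqrt(ncol(X)))
  # minimum leaf size of 5 observations
  if (is.null(min.node.size)) min.node.size <- 5
  # 1 for bootstrapping, 0.5 for subsampling
  if (is.null(sample.fraction)) sample.fraction <- if (replace) 1 else 0.5
  # 50:50 split between training and honest sample
  if (is.null(honesty.fraction)) honesty.fraction <- 0.5
  list(mtry = mtry, min.node.size = min.node.size,
       sample.fraction = sample.fraction, honesty.fraction = honesty.fraction)
}

X <- matrix(0, nrow = 100, ncol = 10)   # 10 features
d_sub  <- resolve_orf_defaults(X)                   # subsampling (default)
d_boot <- resolve_orf_defaults(X, replace = TRUE)   # bootstrapping
str(d_sub)
```

With 10 features the default `mtry` is `ceiling(sqrt(10))`, i.e. 4.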

The Ordered Forest function, `orf`, estimates the conditional ordered choice
probabilities, i.e. P[Y=m|X=x]. Additionally, weight-based inference for
the probability predictions can be conducted. If inference is desired,
the Ordered Forest must be estimated with honesty and subsampling.
If only prediction is desired, estimation without honesty and with bootstrapping
is recommended for optimal prediction performance.
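Weight-based inference rests on writing a forest prediction as a weighted average of outcomes: the weight of observation i for a query point x is its share of the leaf that x falls into, averaged over all trees. The toy sketch below computes such weights from made-up leaf assignments; `forest_weights`, `leaf_train` and `leaf_x` are hypothetical names, and the variance formula that orf builds on top of these weights is not reproduced here. Because the weights must be accumulated over every tree for every prediction point, the sketch also hints at why inference is computationally demanding.

```r
# Hypothetical sketch: forest weights from leaf memberships.
# leaf_train: n x B matrix, leaf id of each (honest-sample) observation per tree
# leaf_x:     length-B vector, leaf id that the query point x falls into per tree
# Assumes x's leaf contains at least one observation in every tree.
forest_weights <- function(leaf_train, leaf_x) {
  B <- ncol(leaf_train)
  w <- numeric(nrow(leaf_train))
  for (b in seq_len(B)) {
    in_leaf <- leaf_train[, b] == leaf_x[b]   # observations sharing x's leaf
    w <- w + in_leaf / sum(in_leaf)           # equal share within that leaf
  }
  w / B                                       # average over trees
}

# toy example: 4 observations, 2 trees
leaf_train <- cbind(c(1, 1, 2, 2),
                    c(1, 2, 2, 2))
leaf_x <- c(1, 2)        # x lands in leaf 1 of tree 1 and leaf 2 of tree 2
y_ind <- c(1, 0, 1, 0)   # e.g. the binary indicator 1(Y <= m)
w <- forest_weights(leaf_train, leaf_x)
pred <- sum(w * y_ind)   # forest prediction as a weighted average
print(w)
print(pred)
```

The weights sum to one, so the prediction is a proper weighted mean of the indicator values.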

In order to estimate the Ordered Forest, the user must supply the data in the form of
a matrix of covariates `X` and a vector of outcomes `Y` to the `orf` function.
These data inputs are the only inputs that must be specified by the user, as they
have no defaults. Further optional arguments include the classical forest
hyperparameters such as the number of trees, `num.trees`, the number of randomly
selected features, `mtry`, and the minimum leaf size, `min.node.size`.
The forest building scheme is regulated by the `replace` argument, meaning
bootstrapping if `replace = TRUE` or subsampling if `replace = FALSE`.
For the case of subsampling, the `sample.fraction` argument regulates the subsampling
rate. Further, an honest forest is estimated if the `honesty` argument is set to
`TRUE`, which is also the default. Similarly, the fraction of the sample used
for the honest estimation is regulated by the `honesty.fraction` argument.
The default setting conducts a 50:50 sample split, which is also generally advised
for optimal performance. The inference procedure of the Ordered Forest is based on
the forest weights and is controlled by the `inference` argument. Note that
such weight-based inference is a computationally demanding exercise due to the estimation
of the forest weights, and as such, longer computation times are to be expected. Lastly,
the `importance` argument turns the permutation-based variable importance on and off.
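The sample handling governed by `honesty.fraction`, `replace` and `sample.fraction` can be pictured with the base-R sketch below. It is an illustration under assumptions, not orf's internal code: `honest_split` and `draw_tree_sample` are hypothetical helpers, and applying the per-tree draw to the training part of the split only is an assumption of this sketch.

```r
# Hypothetical helper: split n observations into a training part (used to
# grow the trees) and an honest part (used to estimate leaf values).
honest_split <- function(n, honesty.fraction = 0.5) {
  honest_idx <- sample.int(n, size = floor(n * honesty.fraction))
  list(train  = setdiff(seq_len(n), honest_idx),
       honest = sort(honest_idx))
}

# Hypothetical helper: draw the observations used by a single tree, either
# by subsampling (replace = FALSE) or by bootstrapping (replace = TRUE).
draw_tree_sample <- function(idx, replace = FALSE, sample.fraction = 0.5) {
  size <- floor(length(idx) * sample.fraction)
  idx[sample.int(length(idx), size = size, replace = replace)]
}

set.seed(1)
s <- honest_split(100)                         # default 50:50 split
tree_obs <- draw_tree_sample(s$train)          # subsample without replacement
boot_obs <- draw_tree_sample(s$train, replace = TRUE, sample.fraction = 1)
c(train = length(s$train), honest = length(s$honest),
  subsample = length(tree_obs), bootstrap = length(boot_obs))
```

Indexing `idx` through `sample.int()` avoids the `sample()` pitfall where a length-one index vector would be treated as a range.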

`orf` is compatible with standard `R` commands such as
`predict`, `margins`, `plot`, `summary` and `print`.
For further details, see the examples below.

An object of class `orf` with the following elements:

`forests` |
saved forests trained for |

`info` |
info containing forest inputs and data used |

`predictions` |
predicted values for class probabilities |

`variances` |
variances of predicted values |

`importance` |
weighted measure of permutation based variable importance |

`accuracy` |
out-of-bag (OOB) measures for the mean squared error and the ranked probability score |
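The two accuracy measures can be computed from a matrix of class-probability predictions as sketched below. The definitions follow a common convention (the mean squared error as a multi-class Brier score summed over categories, and the ranked probability score normalized by M-1); the exact normalization used by orf may differ, and the out-of-bag bookkeeping is not reproduced. `prob_accuracy` is a hypothetical helper and the inputs are made up.

```r
# Hypothetical helper: accuracy of class-probability predictions.
# probs: n x M matrix of predicted P(Y = m | X); y: outcomes coded 1, ..., M
prob_accuracy <- function(probs, y) {
  M <- ncol(probs)
  y_ind <- outer(y, seq_len(M), `==`) * 1         # one-hot indicators 1(Y = m)
  mse <- mean(rowSums((y_ind - probs)^2))         # multi-class Brier score
  cum_p <- t(apply(probs, 1, cumsum))             # predicted P(Y <= m | X)
  cum_y <- t(apply(y_ind, 1, cumsum))             # observed 1(Y <= m)
  rps <- mean(rowSums((cum_p - cum_y)^2) / (M - 1))
  c(mse = mse, rps = rps)
}

probs <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6))
y <- c(1, 3)
acc <- prob_accuracy(probs, y)
print(acc)
```

Unlike the mean squared error, the ranked probability score compares cumulative distributions, so it penalizes probability mass placed far from the observed category more heavily than mass placed in a neighbouring one.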

Gabriel Okasa

Lechner, M., & Okasa, G. (2019). Random Forest Estimation of the Ordered Choice Model. arXiv preprint arXiv:1907.02436. https://arxiv.org/abs/1907.02436

Goller, D., Knaus, M. C., Lechner, M., & Okasa, G. (2021). Predicting Match Outcomes in Football by an Ordered Forest Estimator. In A Modern Guide to Sports Economics (pp. 335-355). Edward Elgar Publishing. doi: 10.4337/9781789906530.00026

Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1-17. doi: 10.18637/jss.v077.i01

`summary.orf`, `plot.orf`, `predict.orf`, `margins.orf`

## Ordered Forest
require(orf)

# load example data
data(odata)

# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])

# estimate Ordered Forest with default parameters
orf_fit <- orf(X, Y)

# estimate Ordered Forest with own tuning parameters
orf_fit <- orf(X, Y, num.trees = 2000, mtry = 3, min.node.size = 10)

# estimate Ordered Forest with bootstrapping and without honesty
orf_fit <- orf(X, Y, replace = TRUE, honesty = FALSE)

# estimate Ordered Forest with subsampling and with honesty
orf_fit <- orf(X, Y, replace = FALSE, honesty = TRUE)

# estimate Ordered Forest with subsampling and with honesty
# with own tuning for subsample fraction and honesty fraction
orf_fit <- orf(X, Y, replace = FALSE, sample.fraction = 0.5,
               honesty = TRUE, honesty.fraction = 0.5)

# estimate Ordered Forest with subsampling and with honesty and with inference
# (for inference, subsampling and honesty are required)
orf_fit <- orf(X, Y, replace = FALSE, honesty = TRUE, inference = TRUE)

# estimate Ordered Forest with simple variable importance measure
orf_fit <- orf(X, Y, importance = TRUE)

# estimate Ordered Forest with all custom settings
orf_fit <- orf(X, Y, num.trees = 2000, mtry = 3, min.node.size = 10,
               replace = TRUE, sample.fraction = 1,
               honesty = FALSE, honesty.fraction = 0,
               inference = FALSE, importance = FALSE)
