knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(morf)
In this tutorial, we show how to use morf
to estimate conditional choice probabilities and marginal effects, and conduct inference about these statistical targets. For illustration purposes, we use the synthetic data set provided in the orf
package:
## Load data from orf package. set.seed(1986) library(orf) data(odata) y <- as.numeric(odata[, 1]) X <- as.matrix(odata[, -1])
The morf
function constructs a collection of forests, one for each category of y
(three in this case). We can then use the forests to predict out-of-sample conditional probabilities using the predict
method. By default, predict
returns a matrix with the predicted probabilities and a vector of predicted class labels (each observation is labelled to the highest-probability class).
## Training-test split. train_idx <- sample(seq_len(length(y)), floor(length(y) * 0.5)) y_tr <- y[train_idx] X_tr <- X[train_idx, ] y_test <- y[-train_idx] X_test <- X[-train_idx, ] ## Fit morf on training sample. Use default settings. forests <- morf(y_tr, X_tr) ## Summary of data and tuning parameters. summary(forests) ## Out-of-sample predictions. predictions <- predict(forests, X_test) head(predictions$probabilities) table(y_test, predictions$classification)
We can also implement honesty, which is a necessary condition to get asymptotically normal and consistent predictions. However, honesty generally comes at the expense of a larger mean squared error. Thus, if inference is not of interest we recommend adaptive forests. In the following, we set honesty = TRUE
to construct honest forests.
## Honest forests. honest_forests <- morf(y_tr, X_tr, honesty = TRUE) honest_predictions <- predict(honest_forests, X_test) ## Compare predictions with adaptive fit. cbind(head(predictions$probabilities), head(honest_predictions$probabilities))
To extract the weights induced by each forest and estimate the standard errors we set inference = TRUE
. This requires also to set honesty = TRUE
, although in principle we could estimate standard errors also for adaptive forests. The point of implementing honesty is that it allows us to use the estimated standard errors to construct valid confidence intervals for the conditional probabilities. We again stress that if we care only about prediction performance we should set honesty = FALSE
(this is the default). As a final remark, notice that the weights extraction considerably slows down the routine. However, we can increase the number of threads used to construct the forests to speed up the routine.
## Compute standard errors. honest_forests <- morf(y_tr, X_tr, honesty = TRUE, inference = TRUE, n.threads = 0) # Zero corresponds to the number of CPUs available. head(honest_forests$predictions$standard.errors)
The package implements a nonparametric estimator of marginal effects. This can be used by calling the marginal_effects
function, which allows the estimation of mean marginal effects, marginal effects at the mean, or marginal effects at the median, according to the eval
argument. In the following, we construct our forests in the training sample and use them to estimate the marginal effects at the mean in the test sample. We use a larger number of trees here to get more stable results.
## Fit morf on training sample. Use large number of trees. forests <- morf(y_tr, X_tr, n.trees = 4000) ## Marginal effects at the mean on test sample. me_atmean <- marginal_effects(forests, data = X_test, eval = "atmean") summary(me_atmean)
As before, we can set inference = TRUE
to estimate the standard errors. Again, this requires the use of honest forests and considerably slows down the routine.
##Honest forests. honest_forests <- morf(y_tr, X_tr, n.trees = 4000, honesty = TRUE) # Notice we do not need inference here! ## Compute standard errors. honest_me_atmean <- marginal_effects(honest_forests, data = X_test , eval = "atmean", inference = TRUE) honest_me_atmean$standard.errors honest_me_atmean$p.values # These are not corrected for multiple hypotheses testing! ## LATEX. print(honest_me_atmean, latex = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.