ocf: Ordered Correlation Forest

ocf	R Documentation

Ordered Correlation Forest

Description

Nonparametric estimator for ordered non-numeric outcomes. The estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class.
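One way to read "the conditional probability of a single class" for an ordered outcome is as a difference of adjacent cumulative indicators, since 1(Y = m) = 1(Y <= m) - 1(Y <= m - 1). The base R sketch below only illustrates that identity on a toy outcome vector. It is not a description of the package's internals, and the names `cum_ind` and `class_ind` are hypothetical.

```r
# Toy ordered outcome with three classes (hypothetical data).
Y <- c(1, 3, 2, 2, 1, 3, 3)
classes <- sort(unique(Y))

# Cumulative indicators 1(Y <= m), one column per class.
cum_ind <- sapply(classes, function(m) as.numeric(Y <= m))

# Differencing adjacent columns recovers the class indicators 1(Y == m).
class_ind <- cum_ind - cbind(0, cum_ind[, -ncol(cum_ind), drop = FALSE])

# Sample shares of each class; each row of class_ind has exactly one 1.
shares <- colMeans(class_ind)
print(shares)
```

Averaging each indicator column gives the unconditional class shares. A forest-based estimator would replace that plain average with a covariate-dependent one.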

Usage

ocf(
  Y = NULL,
  X = NULL,
  honesty = FALSE,
  honesty.fraction = 0.5,
  inference = FALSE,
  alpha = 0.2,
  n.trees = 2000,
  mtry = ceiling(sqrt(ncol(X))),
  min.node.size = 5,
  max.depth = 0,
  replace = FALSE,
  sample.fraction = ifelse(replace, 1, 0.5),
  n.threads = 1
)

Arguments

`Y`	Outcome vector.
`X`	Covariate matrix (no intercept).
`honesty`	Whether to grow honest forests.
`honesty.fraction`	Fraction of the sample assigned to the honest sample. Ignored if `honesty = FALSE`.
`inference`	Whether to extract weights and compute standard errors. Extracting the weights slows the routine considerably. `honesty = TRUE` is required for valid inference.
`alpha`	Controls the balance of each split. Each split leaves at least a fraction `alpha` of observations in the parent node on each side of the split.
`n.trees`	Number of trees.
`mtry`	Number of covariates randomly considered as split candidates at each node. Default is the square root of the number of covariates, rounded up.
`min.node.size`	Minimal node size.
`max.depth`	Maximal tree depth. A value of 0 corresponds to unlimited depth, 1 to "stumps" (one split per tree).
`replace`	If `TRUE`, grow trees on bootstrap samples (drawn with replacement). Otherwise, grow trees on random subsamples drawn without replacement.
`sample.fraction`	Fraction of observations to sample.
`n.threads`	Number of threads. Zero corresponds to the number of CPUs available.
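As a minimal sketch of how the default expressions in Usage evaluate, the base R snippet below reproduces `mtry` and `sample.fraction` for a hypothetical covariate matrix, and computes the lower bound that `alpha` implies for each side of a split. The matrix dimensions and the helper `default_sample_fraction` are assumptions made for illustration. Any rounding the package applies to the `alpha` bound is not stated on this page and is not modeled here.

```r
# Hypothetical covariate matrix: 50 observations, 10 covariates.
set.seed(1)
X <- matrix(rnorm(500), nrow = 50, ncol = 10)

# Default mtry: square root of the number of covariates, rounded up.
mtry <- ceiling(sqrt(ncol(X)))

# Default sample.fraction depends on whether sampling is with replacement.
# (Hypothetical helper that mirrors the expression in Usage.)
default_sample_fraction <- function(replace) ifelse(replace, 1, 0.5)

# With alpha = 0.2, each side of a split of a 50-observation parent node
# must keep at least alpha * 50 observations (no rounding applied here).
alpha <- 0.2
min_side <- alpha * nrow(X)

print(c(mtry = mtry, min_side = min_side))
```

Passing `replace = TRUE` without setting `sample.fraction` therefore uses the full sample size for each draw, while the default `replace = FALSE` uses half of the observations per tree.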

Value

Object of class `ocf`.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)

data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))

Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]

Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)

## ocf objects are compatible with generic S3 methods.
print(forests)
summary(forests)
predictions <- predict(forests, X_test)
head(predictions$probabilities)
table(Y_test, predictions$classification)

## Compute standard errors. This requires honest forests.
honest_forests <- ocf(Y_tr, X_tr, honesty = TRUE, inference = TRUE)
head(honest_forests$predictions$standard.errors)

## Marginal effects.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)
plot(me)

## Compute standard errors for the marginal effects. This requires honest forests.
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)
plot(honest_me)
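The confusion table in the examples above can be condensed into a single accuracy figure. The base R sketch below does this on toy inputs so that it runs without the package: `y_test` and `probabilities` are hypothetical stand-ins for `Y_test` and `predictions$probabilities`, and the toy classification is taken as the most probable class, which is an assumption of this sketch rather than something this page states.

```r
# Hypothetical true classes and predicted class probabilities (rows sum to 1).
y_test <- c(1, 2, 3, 2, 1, 3)
probabilities <- matrix(
  c(0.7, 0.2, 0.1,
    0.2, 0.5, 0.3,
    0.1, 0.3, 0.6,
    0.5, 0.3, 0.2,
    0.6, 0.3, 0.1,
    0.2, 0.5, 0.3),
  ncol = 3, byrow = TRUE
)

# Assumption: classify each observation into its most probable class.
classification <- max.col(probabilities, ties.method = "first")

# Confusion table (rows: truth, columns: prediction) and overall accuracy.
confusion <- table(y_test, classification)
accuracy <- mean(y_test == classification)

print(confusion)
print(accuracy)
```

With a fitted forest, the same two lines applied to `Y_test` and `predictions$classification` give the out-of-sample accuracy.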

ocf documentation built on April 4, 2025, 4:44 a.m.