ordered_ml | R Documentation |
Estimates conditional choice probabilities for ordered non-numeric outcomes.
ordered_ml(Y = NULL, X = NULL, learner = "forest", scale = TRUE)
Y |
Outcome vector. |
X |
Covariate matrix (no intercept). |
learner |
String, either "forest" or "l1". |
scale |
Logical, whether to scale the covariates. Ignored if learner is not "l1". |
Ordered machine learning expresses conditional choice probabilities as the difference between the cumulative probabilities of two adjacent classes, which in turn can be expressed as conditional expectations of binary variables:
p_m \left( X_i \right) = \mathbb{E} \left[ 1 \left( Y_i \leq m \right) | X_i \right] - \mathbb{E} \left[ 1 \left( Y_i \leq m - 1 \right) | X_i \right]
We can then estimate each expectation separately using any regression algorithm and take the difference between the m-th and the (m-1)-th estimated surfaces to estimate the conditional probabilities.
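The strategy above can be sketched in base R. This is an illustration under stated assumptions, not the code used by ordered_ml: logistic regression via glm stands in for "any regression algorithm", the data are simulated, and the final clipping and renormalization step is an assumption of this sketch that guards against negative differences when the separately estimated surfaces cross.

```r
## Simulated ordered outcome with three classes (hypothetical data).
set.seed(1)
n <- 500
x <- rnorm(n)
latent <- x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = FALSE)  # integer codes 1, 2, 3
M <- max(y)

## Estimate E[1(Y <= m) | X] separately for m = 1, ..., M - 1.
## Assumption: glm with a binomial family is the stand-in regression algorithm.
cum_probs <- sapply(seq_len(M - 1), function(m) {
  d <- data.frame(z = as.integer(y <= m), x = x)
  fit <- glm(z ~ x, family = binomial(), data = d)
  predict(fit, newdata = d, type = "response")
})

## Add the boundary surfaces: P(Y <= 0 | X) = 0 and P(Y <= M | X) = 1.
cum_probs <- cbind(0, cum_probs, 1)

## p_m(X) is the difference between the m-th and the (m-1)-th surfaces.
probs <- cum_probs[, -1, drop = FALSE] - cum_probs[, -(M + 1), drop = FALSE]

## Assumption of this sketch: clip negative differences and renormalize rows.
probs[probs < 0] <- 0
probs <- probs / rowSums(probs)

head(probs)
```

Each row of probs holds the estimated probabilities of the M classes for one observation.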
ordered_ml combines this strategy with either regression forests or penalized logistic regressions with an L1 penalty, according to the user-specified parameter learner.
If learner == "forest", then the orf function is called from an external package, as this estimator has already been proposed by Lechner and Okasa (2019).
If learner == "l1", the penalty parameters are chosen via 10-fold cross-validation and model.matrix is used to handle non-numeric covariates. Additionally, if scale == TRUE, the covariates are scaled to have zero mean and unit variance.
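The two preprocessing steps mentioned for the "l1" learner can be illustrated with base R alone. This is a sketch on a hypothetical toy data frame; it shows what model.matrix and scale do, and does not claim to reproduce how ordered_ml applies them internally.

```r
## Toy covariates with one non-numeric column (hypothetical data).
X <- data.frame(
  age = c(23, 35, 47, 52),
  region = factor(c("north", "south", "south", "west"))
)

## model.matrix expands the factor into dummy columns.
## The first column is the intercept, which is dropped here.
X_design <- model.matrix(~ ., data = X)[, -1, drop = FALSE]

## scale centers each column at zero and rescales it to unit variance.
X_scaled <- scale(X_design)

colnames(X_scaled)
colMeans(X_scaled)
apply(X_scaled, 2, sd)
```

The factor contributes one dummy column per non-reference level, so the design matrix has more columns than the original data frame whenever a factor has more than two levels.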
Object of class oml.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
multinomial_ml, ocf
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ordered machine learning on training sample using two different learners.
ordered_forest <- ordered_ml(Y_tr, X_tr, learner = "forest")
ordered_l1 <- ordered_ml(Y_tr, X_tr, learner = "l1")
## Predict out of sample.
predictions_forest <- predict(ordered_forest, X_test)
predictions_l1 <- predict(ordered_l1, X_test)
## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))