generate_ordered_data: Generate Ordered Data

View source: R/utils.R

Description

Generate a synthetic data set with an ordered non-numeric outcome, together with the true conditional probabilities and the covariates' true marginal effects at the mean.

Usage

generate_ordered_data(n)

Arguments

n

Sample size.

Details

First, a latent outcome is generated as follows:

Y_i^* = g ( X_i ) + \epsilon_i

with:

g ( X_i ) = X_i^T \beta

X_i := (X_{i, 1}, X_{i, 2}, X_{i, 3}, X_{i, 4}, X_{i, 5}, X_{i, 6})

X_{i, 1}, X_{i, 3}, X_{i, 5} \sim \mathcal{N} \left( 0, 1 \right)

X_{i, 2}, X_{i, 4}, X_{i, 6} \sim \textit{Bernoulli} \left( 0, 1 \right)

\beta = \left( 1, 1, 1/2, 1/2, 0, 0 \right)

\epsilon_i \sim \textit{Logistic} \left( 0, 1 \right)
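The latent-outcome step above can be sketched in base R as follows. This is an illustrative sketch, not the package's internal code. The success probability of the binary covariates (`p_binary`) is an assumption, since the notation above does not pin it down, and the variable names are hypothetical.

```r
# Sketch of the latent-outcome step (not the package's internal code).
set.seed(1986)
n <- 1000

# Continuous covariates X1, X3, X5 ~ N(0, 1).
x_cont <- matrix(rnorm(n * 3), nrow = n)

# Binary covariates X2, X4, X6.
# ASSUMPTION: success probability of 0.5 (not stated in the documentation).
p_binary <- 0.5
x_bin <- matrix(rbinom(n * 3, size = 1, prob = p_binary), nrow = n)

# Interleave the columns so that they follow the order X1, ..., X6.
X <- matrix(NA_real_, nrow = n, ncol = 6)
X[, c(1, 3, 5)] <- x_cont
X[, c(2, 4, 6)] <- x_bin
colnames(X) <- paste0("x", 1:6)

# Coefficients as given in the Details section.
beta <- c(1, 1, 1/2, 1/2, 0, 0)

# Latent outcome: linear index g(X_i) plus standard logistic noise.
g <- as.vector(X %*% beta)
y_star <- g + rlogis(n, location = 0, scale = 1)
```

Because the last two coefficients are zero, X5 and X6 enter the data as pure noise covariates.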

Second, the observed outcomes are obtained by discretizing the latent outcome into three classes using uniformly spaced threshold parameters.
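A minimal sketch of this discretization step, assuming two interior thresholds. The threshold values in `zeta` and the stand-in latent outcome are hypothetical and chosen only for illustration; the values used by the package are not stated here.

```r
# Sketch of the discretization step (threshold values are assumptions).
set.seed(1986)

# Stand-in latent outcome; any numeric vector works for this illustration.
y_star <- rlogis(1000)

# ASSUMPTION: two illustrative interior thresholds, giving three classes.
zeta <- c(-1, 1)

# Class m is assigned when zeta[m - 1] < y_star <= zeta[m],
# with -Inf and Inf as the outer bounds.
Y <- cut(y_star, breaks = c(-Inf, zeta, Inf), labels = FALSE)

# Class frequencies.
table(Y)
```

`cut()` with `labels = FALSE` returns integer class codes, which keeps the ordering of the classes explicit.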

Third, the conditional probabilities and the covariates' marginal effects at the mean are generated using standard textbook formulas. Marginal effects are approximated using a sample of 1,000,000 observations.
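For an ordered logit model, the textbook formulas are P(Y = m | x) = F(zeta_m - x'beta) - F(zeta_{m-1} - x'beta), with F the logistic CDF, and, for a continuous covariate j, the marginal effect beta_j [f(zeta_{m-1} - x'beta) - f(zeta_m - x'beta)], with f the logistic density. The sketch below implements these under stated assumptions: the thresholds `zeta`, the evaluation point `x_bar`, and all function names are hypothetical, and treating the marginal effect of a binary covariate as a 0-to-1 probability difference is one common convention, not necessarily the one used by the package.

```r
# Sketch of ordered-logit probabilities and marginal effects (illustrative only).
beta <- c(1, 1, 1/2, 1/2, 0, 0)
zeta <- c(-1, 1)  # ASSUMPTION: illustrative thresholds.

# Class probabilities at a covariate vector x.
class_probs <- function(x, beta, zeta) {
  index <- sum(x * beta)
  cdf <- c(0, plogis(zeta - index), 1)
  diff(cdf)
}

# Marginal effect of continuous covariate j on each class probability.
me_continuous <- function(x, j, beta, zeta) {
  index <- sum(x * beta)
  dens <- c(0, dlogis(zeta - index), 0)
  beta[j] * (dens[-length(dens)] - dens[-1])
}

# Marginal effect of binary covariate j: change in class probabilities
# when x_j switches from 0 to 1, other covariates held fixed.
me_discrete <- function(x, j, beta, zeta) {
  x1 <- x; x1[j] <- 1
  x0 <- x; x0[j] <- 0
  class_probs(x1, beta, zeta) - class_probs(x0, beta, zeta)
}

# ASSUMPTION: hypothetical covariate means used as the evaluation point.
x_bar <- c(0, 0.5, 0, 0.5, 0, 0.5)

probs <- class_probs(x_bar, beta, zeta)
me_x1 <- me_continuous(x_bar, 1, beta, zeta)
me_x2 <- me_discrete(x_bar, 2, beta, zeta)
```

Since the class probabilities sum to one, the marginal effects of any covariate sum to zero across classes, which is a quick sanity check on an implementation.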

Value

A list with three elements: sample, a data frame with the observed data; true_probs, a matrix of true conditional probabilities; and me_at_mean, a matrix of true marginal effects at the mean of the covariates.

Author(s)

Riccardo Di Francesco

References

  • Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

See Also

ocf

Examples

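## Generate synthetic data.
library(ocf)
set.seed(1986)

data <- generate_ordered_data(1000)

## Inspect the true conditional probabilities and the marginal effects at the mean.
head(data$true_probs)
data$me_at_mean

## Separate the outcome (first column) from the covariates.
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)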

ocf documentation built on April 4, 2025, 4:44 a.m.