cram_simulation: Cram Policy Simulation
In cramR: Cram Method for Efficient Simultaneous Learning and Evaluation

cram_simulation

R Documentation

Cram Policy Simulation

Description

This function performs the cram method (simultaneous learning and evaluation) on simulation data, for which the data generation process (DGP) is known. The data generation process for X can be given directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.

Usage

cram_simulation(
  X = NULL,
  dgp_X = NULL,
  dgp_D,
  dgp_Y,
  batch,
  nb_simulations,
  nb_simulations_truth = NULL,
  sample_size,
  model_type = "causal_forest",
  learner_type = "ridge",
  alpha = 0.05,
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  propensity = NULL
)

Arguments

`X`	Optional. A matrix or data frame of covariates for each sample inducing empirically the DGP for covariates.
`dgp_X`	Optional. A function to generate covariate data for simulations.
`dgp_D`	A vectorized function to generate binary treatment assignments for each sample.
`dgp_Y`	A vectorized function to generate the outcome variable for each sample given the treatment and covariates.
`batch`	Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.
`nb_simulations`	The number of simulations (Monte Carlo replicates) to run.
`nb_simulations_truth`	Optional. The number of additional simmulations (Monte Carlo replicates) beyond nb_simulations to use when calculating the true policy value difference (delta) and the true policy value (psi)
`sample_size`	The number of samples in each simulation.
`model_type`	The model type for policy learning. Options include `"causal_forest"`, `"s_learner"`, and `"m_learner"`. Default is `"causal_forest"`. Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.
`learner_type`	The learner type for the chosen model. Options include `"ridge"` for Ridge Regression, `"fnn"` for Feedforward Neural Network and `"caret"` for Caret. Default is `"ridge"`. if model_type is 'causal_forest', choose NULL, if model_type is 's_learner' or 'm_learner', choose between 'ridge', 'fnn' and 'caret'.
`alpha`	Significance level for confidence intervals. Default is 0.05 (95% confidence).
`baseline_policy`	A list providing the baseline policy (binary 0 or 1) for each sample. If `NULL`, defaults to a list of zeros with the same length as the number of rows in `X`.
`parallelize_batch`	Logical. Whether to parallelize batch processing (i.e. the cram method learns T policies, with T the number of batches. They are learned in parallel when parallelize_batch is TRUE vs. learned sequentially using the efficient data.table structure when parallelize_batch is FALSE, recommended for light weight training). Defaults to `FALSE`.
`model_params`	A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to `NULL`.
`custom_fit`	A custom, user-defined, function that outputs a fitted model given training data (allows flexibility). Defaults to `NULL`.
`custom_predict`	A custom, user-defined, function for making predictions given a fitted model and test data (allow flexibility). Defaults to `NULL`.
`propensity`	The propensity score model

Value

A list containing:

avg_proportion_treated: The average proportion of treated individuals across simulations.
avg_delta_estimate: The average delta estimate across simulations.
avg_delta_standard_error: The average standard error of delta estimates.
delta_empirical_bias: The empirical bias of delta estimates.
delta_empirical_coverage: The empirical coverage of delta confidence intervals.
avg_policy_value_estimate: The average policy value estimate across simulations.
avg_policy_value_standard_error: The average standard error of policy value estimates.
policy_value_empirical_bias: The empirical bias of policy value estimates.
policy_value_empirical_coverage: The empirical coverage of policy value confidence intervals.

Examples


set.seed(123)

# dgp_X <- function(n) {
#   data.table::data.table(
#     binary     = rbinom(n, 1, 0.5),
#     discrete   = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }

n <- 100

X_data <- data.table::data.table(
  binary     = rbinom(n, 1, 0.5),
  discrete   = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)


dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)

dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2,  # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4,  # Group 3: Negative benefit
           -1,
           0.1)  # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D))  # Outcome for untreated
  return(Y)
}

# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5

# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)

result$raw_results
result$interactive_table

cramR documentation built on Aug. 25, 2025, 1:12 a.m.