Quick Start with CRAM"

# Vignette setup: knitr chunk options and packages used throughout
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(pkgdown.max_print = Inf, width = 1000)
library(cramR)
library(data.table)
library(glmnet)
library(caret)

Introduction

The Cram package provides a unified framework for:

- cram_policy(): binary policy learning and evaluation
- cram_ml(): machine learning model learning and evaluation
- cram_bandit(): on-policy statistical evaluation of contextual bandit algorithms

This vignette walks through these three core modules.


Cram User file

For reproducible use cases, see the example script provided in the Cram GitHub repository:

View user_cram.R on GitHub


1. cram_policy(): Binary Policy Learning & Evaluation

Generate Simulated Data

generate_data <- function(n) {
  X <- data.table(
    binary = rbinom(n, 1, 0.5),                 # binary covariate
    discrete = sample(1:5, n, replace = TRUE),  # discrete covariate (1-5)
    continuous = rnorm(n)                       # continuous covariate
  )
  D <- rbinom(n, 1, 0.5)                        # randomized binary treatment
  # Heterogeneous treatment effect driven by the binary and discrete covariates
  treatment_effect <- ifelse(X$binary == 1 & X$discrete <= 2, 1,
                       ifelse(X$binary == 0 & X$discrete >= 4, -1, 0.1))
  # Observed outcome: treated units receive the effect plus noise
  Y <- D * (treatment_effect + rnorm(n)) + (1 - D) * rnorm(n)
  list(X = X, D = D, Y = Y)
}

set.seed(123)
data <- generate_data(1000)
X <- data$X; D <- data$D; Y <- data$Y

Run cram_policy() with causal forest

res <- cram_policy(
  X, D, Y,
  batch = 20,                                  # number of batches for the cram split
  model_type = "causal_forest",
  learner_type = NULL,                         # not needed for causal_forest
  baseline_policy = as.list(rep(0, nrow(X))),  # baseline: treat no one
  alpha = 0.05                                 # 95% confidence intervals
)
print(res)

Case of categorical target Y

Use caret and choose a classification method that outputs probabilities, i.e. set classProbs = TRUE in trainControl(). For example, with a random forest classifier:

model_params <- list(
  formula = Y ~ .,
  caret_params = list(
    method = "rf",
    trControl = trainControl(method = "none", classProbs = TRUE)
  )
)

Also note that all data inputs need to be of numeric type: a categorical Y should contain numeric values representing the class of each observation. There is no need to use the factor type with cram_policy().
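As a minimal sketch, such a specification can then be passed to cram_policy(). The model_type = "s_learner" and learner_type = "caret" values below are assumptions for illustration; check ?cram_policy for the supported values in your installed version.

# Sketch: caret-based learner inside cram_policy(). model_type and
# learner_type below are assumptions for illustration, not taken from
# the text above.
Y_class <- rbinom(nrow(X), 1, 0.5)  # numeric class labels (0/1), not factors
res_class <- cram_policy(
  X, D, Y_class,
  batch = 20,
  model_type = "s_learner",
  learner_type = "caret",
  model_params = model_params
)
print(res_class)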

Custom Models with cram_policy()

To plug in your own learner, set model_type to NULL and supply custom_fit and custom_predict.

custom_fit <- function(X, Y, D, n_folds = 5) {
  treated <- which(D == 1); control <- which(D == 0)
  # Fit separate cross-validated ridge regressions on each arm
  m1 <- cv.glmnet(as.matrix(X[treated, ]), Y[treated], alpha = 0, nfolds = n_folds)
  m0 <- cv.glmnet(as.matrix(X[control, ]), Y[control], alpha = 0, nfolds = n_folds)
  # Impute individual treatment effects on the opposite arm
  tau1 <- predict(m1, as.matrix(X[control, ]), s = "lambda.min") - Y[control]
  tau0 <- Y[treated] - predict(m0, as.matrix(X[treated, ]), s = "lambda.min")
  # Regress the imputed effects on covariates to obtain a CATE model
  tau <- c(tau0, tau1); X_all <- rbind(X[treated, ], X[control, ])
  final_model <- cv.glmnet(as.matrix(X_all), tau, alpha = 0)
  final_model
}

custom_predict <- function(model, X, D) {
  # Treat (1) whenever the estimated CATE is positive, otherwise do not (0)
  as.numeric(predict(model, as.matrix(X), s = "lambda.min") > 0)
}

res <- cram_policy(
  X, D, Y,
  batch = 20,
  model_type = NULL,
  custom_fit = custom_fit,
  custom_predict = custom_predict
)
print(res)

2. cram_ml(): ML Learning & Evaluation

Regression with cram_ml()

Specify a formula and caret_params conforming to the popular caret::train() interface, and set an individual-level loss under loss_name (here "se", the squared error).

set.seed(42)
data_df <- data.frame(
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), Y = rnorm(100)
)

caret_params <- list(
  method = "lm",
  trControl = trainControl(method = "none")
)

res <- cram_ml(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "se",
  caret_params = caret_params
)
print(res)

Classification with cram_ml()

All data inputs need to be of numeric type: a categorical Y should contain numeric values representing the class of each observation. There is no need to use the factor type with cram_ml().

Case 1: Predicting Class Labels

In this case, the model outputs hard predictions (labels, e.g. 0, 1, 2), and the metric used is classification accuracy: the proportion of correctly predicted labels.

set.seed(42)

# Generate binary classification dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rbinom(nrow(X_data), 1, 0.5)
data_df <- data.frame(X_data, Y = Y_data)

# Define caret parameters: predict labels (default behavior)
caret_params_rf <- list(
  method = "rf",
  trControl = trainControl(method = "none")
)

# Run CRAM ML with accuracy as loss
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "accuracy",
  caret_params = caret_params_rf,
  classify = TRUE
)

print(result)

Case 2: Predicting Class Probabilities

In this setup, the model outputs class probabilities, and the loss is evaluated using logarithmic loss (logloss), a standard metric for probabilistic classification.

set.seed(42)

# Generate binary classification dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rbinom(nrow(X_data), 1, 0.5)
data_df <- data.frame(X_data, Y = Y_data)

# Define caret parameters for probability output
caret_params_rf_probs <- list(
  method = "rf",
  trControl = trainControl(method = "none", classProbs = TRUE)
)

# Run CRAM ML with logloss as the evaluation loss
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "logloss",
  caret_params = caret_params_rf_probs,
  classify = TRUE
)

print(result)

In addition to using built-in learners via caret, cram_ml() also supports fully custom model workflows (see the sketch below). You can specify your own:

- fitting function (custom_fit)
- prediction function (custom_predict)
- per-observation loss function (custom_loss)

See the vignette "Cram ML" for more details.
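As a rough illustration, here is a minimal sketch of such a workflow. The exact signatures expected for custom_fit, custom_predict, and custom_loss are assumptions here; consult ?cram_ml and the "Cram ML" vignette for the authoritative interface.

# Minimal sketch of a custom cram_ml() workflow. The argument signatures
# below are assumptions for illustration, not the documented interface.
custom_fit <- function(data) lm(Y ~ ., data = data)                   # fit OLS on a data.frame with outcome Y
custom_predict <- function(model, data) predict(model, newdata = data)
custom_loss <- function(predictions, data) (predictions - data$Y)^2   # per-observation squared error

res_custom <- cram_ml(
  data = data_df,
  batch = 5,
  custom_fit = custom_fit,
  custom_predict = custom_predict,
  custom_loss = custom_loss
)
print(res_custom)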


3. cram_bandit(): Contextual Bandits for On-policy Statistical Evaluation

Specify:

- pi: an array of shape (T, T, K), where entry [j, t, a] is the probability that the policy learned after j batches assigns arm a in the context observed at time t (probabilities sum to 1 over arms)
- arm: the arm actually selected at each of the T time steps
- reward: the reward observed at each time step

set.seed(42)
T <- 100; K <- 4
# pi[j, t, ] is the probability distribution over the K arms assigned by
# the policy at update j to the context observed at time t
pi <- array(runif(T * T * K, 0.1, 1), dim = c(T, T, K))
for (t in 1:T) for (j in 1:T) pi[j, t, ] <- pi[j, t, ] / sum(pi[j, t, ])  # normalize over arms
arm <- sample(1:K, T, replace = TRUE)  # arm actually pulled at each step
reward <- rnorm(T, 1, 0.5)             # observed rewards

res <- cram_bandit(pi, arm, reward, batch = 1, alpha = 0.05)
print(res)

Summary

This vignette introduced the three core modules of the Cram package: cram_policy() for binary policy learning and evaluation, cram_ml() for learning and evaluating general machine learning models, and cram_bandit() for on-policy statistical evaluation of contextual bandit algorithms.



