BatchContextualEpsilonGreedyPolicy: Batch Contextual Epsilon-Greedy Policy
In cramR: Cram Method for Efficient Simultaneous Learning and Evaluation

Description

A contextual bandit policy that combines epsilon-greedy exploration with batched parameter updates.

Details

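Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates. With probability epsilon the policy selects a random arm; otherwise it selects the arm with the highest estimated reward under the current parameters. Parameters are updated once every batch_size rounds rather than after every round.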

Super class

cramR::NA -> BatchContextualEpsilonGreedyPolicy

Public fields

epsilon: Probability of selecting a random arm (exploration rate).
batch_size: Number of rounds per batch before updating model parameters.
A_cc: List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc: List of reward-weighted context sums (one per arm), updated batch-wise.
class_name: Internal class name identifier.
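
To make the roles of A_cc and b_cc concrete, here is a minimal base-R sketch for a single arm. It does not call cramR; the data are made up, and the identity (ridge-style) term added to the Gram matrix is an assumption rather than something this page states.

```r
# Hypothetical data for one arm: 4 rounds, d = 2 features per context.
X <- matrix(c(1, 0,
              0, 1,
              1, 1,
              2, 1), ncol = 2, byrow = TRUE)
r <- c(1, 0, 1, 2)  # observed rewards, one per round

# Gram matrix X^T X; the added identity is an assumed ridge-style start value.
A <- diag(ncol(X)) + crossprod(X)

# Reward-weighted context sum X^T r.
b <- drop(crossprod(X, r))

# Coefficient estimate for this arm: the solution of A %*% theta = b.
theta <- solve(A, b)
```

Because A and b are plain sums over rounds, statistics from several batches can be added together without revisiting earlier observations.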

Methods

Public methods

BatchContextualEpsilonGreedyPolicy$new()
BatchContextualEpsilonGreedyPolicy$set_parameters()
BatchContextualEpsilonGreedyPolicy$get_action()
BatchContextualEpsilonGreedyPolicy$set_reward()
BatchContextualEpsilonGreedyPolicy$clone()

Method `new()`

Constructor for the Batch Epsilon-Greedy policy.

Usage

BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)

Arguments

epsilon: Numeric between 0 and 1. Probability of random arm selection.
batch_size: Integer. Number of observations between parameter updates.

Method `set_parameters()`

Initializes the parameter structures for each arm.

Usage

BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)

Arguments

context_params: A list with at least 'd' (number of features) and 'k' (number of arms).
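
As a rough illustration of what initializing per-arm structures from context_params could look like, the sketch below builds one matrix and one vector per arm. The helper name init_arm_params is hypothetical, and the identity and zero start values are assumptions; the package's actual initialization may differ.

```r
# Hypothetical helper, not part of cramR: builds per-arm structures from d and k.
init_arm_params <- function(context_params) {
  stopifnot(!is.null(context_params$d), !is.null(context_params$k))
  d <- context_params$d  # number of features
  k <- context_params$k  # number of arms
  list(
    # One d x d matrix per arm (identity start is an assumption).
    A_cc = replicate(k, diag(d), simplify = FALSE),
    # One length-d vector per arm (zero start is an assumption).
    b_cc = replicate(k, numeric(d), simplify = FALSE)
  )
}

params <- init_arm_params(list(d = 3, k = 2))
```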

Method `get_action()`

Chooses an arm based on epsilon-greedy logic and the current estimates.

Usage

BatchContextualEpsilonGreedyPolicy$get_action(t, context)

Arguments

t: Integer time step.
context: A list with contextual features and arm count.

Returns

A list with the selected action.
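
The epsilon-greedy rule itself can be sketched in a few lines of base R. This is an illustration of the general technique, not the package's code: the function choose_arm, the context fields X and k, and the returned element name choice are all assumptions made for the example.

```r
# Hypothetical stand-alone version of epsilon-greedy arm selection.
# context: list with X (numeric feature vector) and k (number of arms).
# theta:   list of per-arm coefficient vectors.
choose_arm <- function(context, theta, epsilon) {
  if (runif(1) < epsilon) {
    # Explore: pick any arm with equal probability.
    choice <- sample.int(context$k, 1)
  } else {
    # Exploit: pick the arm with the highest predicted reward x^T theta.
    scores <- vapply(theta, function(th) sum(context$X * th), numeric(1))
    choice <- which.max(scores)  # ties resolve to the first maximum
  }
  list(choice = choice)
}

theta <- list(c(1, 0), c(0, 1))
context <- list(k = 2, X = c(0.2, 0.9))
greedy <- choose_arm(context, theta, epsilon = 0)    # never explores
explore <- choose_arm(context, theta, epsilon = 1)   # always explores
```

With epsilon = 0 the call is purely greedy, so the result depends only on the estimates; with epsilon = 1 every call is a random draw.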

Method `set_reward()`

Updates model statistics from the observed reward. Parameter estimates are refreshed once per batch, that is, every batch_size rounds, not after every observation.

Usage

BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)

Arguments

t: Integer time step.
context: List of contextual features used for the action.
action: A list with the chosen arm.
reward: A list with the observed reward.

Returns

Updated parameter estimates.
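
One way to picture the batched update is a learner that buffers each round's statistics and folds them into the accumulated totals only at a batch boundary. The sketch below is written under stated assumptions: make_batch_learner is a hypothetical name, the boundary test t %% batch_size == 0 is a guess at how batches are delimited, and the identity start value is assumed.

```r
# Hypothetical closure-based learner illustrating once-per-batch updates.
make_batch_learner <- function(d, k, batch_size) {
  A_cc  <- replicate(k, diag(d), simplify = FALSE)          # totals across batches
  b_cc  <- replicate(k, numeric(d), simplify = FALSE)
  A_buf <- replicate(k, matrix(0, d, d), simplify = FALSE)  # current batch only
  b_buf <- replicate(k, numeric(d), simplify = FALSE)
  theta <- replicate(k, numeric(d), simplify = FALSE)       # current estimates

  update <- function(t, x, arm, reward) {
    # Every round: buffer the statistics for the chosen arm.
    A_buf[[arm]] <<- A_buf[[arm]] + tcrossprod(x)
    b_buf[[arm]] <<- b_buf[[arm]] + reward * x
    # Batch boundary (assumed rule): fold buffers into totals and re-estimate.
    if (t %% batch_size == 0) {
      for (a in seq_len(k)) {
        A_cc[[a]]  <<- A_cc[[a]] + A_buf[[a]]
        b_cc[[a]]  <<- b_cc[[a]] + b_buf[[a]]
        theta[[a]] <<- solve(A_cc[[a]], b_cc[[a]])
        A_buf[[a]] <<- matrix(0, d, d)
        b_buf[[a]] <<- numeric(d)
      }
    }
    invisible(theta)
  }
  list(update = update, get_theta = function() theta)
}

learner <- make_batch_learner(d = 2, k = 2, batch_size = 2)
learner$update(t = 1, x = c(1, 0), arm = 1, reward = 1)
theta_mid_batch <- learner$get_theta()[[1]]   # unchanged until the batch closes
learner$update(t = 2, x = c(1, 0), arm = 1, reward = 1)
theta_after_batch <- learner$get_theta()[[1]]
```

Between batch boundaries the estimates used for arm selection stay fixed, which is the practical difference from a policy that updates after every round.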

Method `clone()`

The objects of this class are cloneable with this method.

Usage

BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

cramR documentation built on Aug. 25, 2025, 1:12 a.m.
