BatchContextualEpsilonGreedyPolicy | R Documentation |
Batch Contextual Epsilon-Greedy Policy
Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates.
Super class: cramR::NA -> BatchContextualEpsilonGreedyPolicy
epsilon
Probability of selecting a random arm (exploration rate).
batch_size
Number of rounds per batch before updating model parameters.
A_cc
List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc
List of reward-weighted context sums (one per arm), updated batch-wise.
class_name
Internal class name identifier.
new()
Constructor for the Batch Epsilon-Greedy policy.
BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
epsilon
Numeric between 0 and 1. Probability of random arm selection.
batch_size
Integer. Number of observations between parameter updates.
set_parameters()
Initializes the parameter structures for each arm.
BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)
context_params
A list with at least 'd' (number of features) and 'k' (number of arms).
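The per-arm structures can be pictured with a minimal base R sketch. It assumes each arm starts from a d x d identity Gram matrix and a zero vector of length d, which is the usual ridge-style initialization; the package's actual initial values are not stated here. The helper name init_arm_stats is hypothetical and not part of cramR.

```r
# Hypothetical helper (not part of cramR): build per-arm statistics
# for k arms and d features from a context_params-style list.
init_arm_stats <- function(context_params) {
  d <- context_params$d
  k <- context_params$k
  list(
    # Assumption: identity Gram matrix per arm (ridge-style prior).
    A = lapply(seq_len(k), function(i) diag(1, d, d)),
    # Assumption: zero reward-weighted context sum per arm.
    b = lapply(seq_len(k), function(i) rep(0, d))
  )
}

stats <- init_arm_stats(list(d = 3, k = 2))
length(stats$A)    # one Gram matrix per arm
dim(stats$A[[1]])  # each is d x d
```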
get_action()
Chooses an arm based on epsilon-greedy logic and the current estimates.
BatchContextualEpsilonGreedyPolicy$get_action(t, context)
t
Integer time step.
context
A list with contextual features and arm count.
Returns a list with the selected action.
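The epsilon-greedy rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function choose_arm and its arguments are hypothetical, each arm's coefficients are estimated as theta = A^{-1} b, and ties are broken by taking the first maximum.

```r
# Hypothetical helper (not part of cramR). A and b are per-arm lists of
# Gram matrices and reward-weighted context sums; x is a feature vector.
choose_arm <- function(x, A, b, epsilon) {
  k <- length(A)
  if (runif(1) < epsilon) {
    # Explore: pick an arm uniformly at random.
    return(list(choice = sample.int(k, 1)))
  }
  # Exploit: estimate theta = A^{-1} b per arm, then predict x' theta.
  expected <- vapply(seq_len(k), function(arm) {
    theta <- solve(A[[arm]], b[[arm]])
    sum(x * theta)
  }, numeric(1))
  # Assumption: ties go to the first arm with the maximum prediction.
  list(choice = which.max(expected))
}

A <- list(diag(2), diag(2))
b <- list(c(0, 0), c(1, 0))
greedy <- choose_arm(c(1, 0), A, b, epsilon = 0)   # never explores
random <- choose_arm(c(1, 0), A, b, epsilon = 1)   # always explores
```

With epsilon = 0 the sketch always exploits, so the second arm is chosen because only its estimate predicts a positive reward for this context.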
set_reward()
Updates model statistics based on observed reward. Updates occur once per batch.
BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)
t
Integer time step.
context
List of contextual features used for the action.
action
A list with the chosen arm.
reward
A list with the observed reward.
Returns the updated parameter estimates.
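One plausible reading of the batched update is sketched below in base R: the accumulators A_cc and b_cc absorb every observation, while the estimates used for arm selection are refreshed only when a batch completes. The helpers new_state and update_state, the working copies A and b, and the test t %% batch_size == 0 are assumptions made for illustration; the package may organize this differently.

```r
# Hypothetical state (not part of cramR): accumulators A_cc / b_cc plus
# the working copies A / b that arm selection would read from.
new_state <- function(d, k) {
  make <- function(x) lapply(seq_len(k), function(i) x)
  list(A_cc = make(diag(1, d, d)), b_cc = make(rep(0, d)),
       A = make(diag(1, d, d)), b = make(rep(0, d)))
}

update_state <- function(state, t, x, arm, reward, batch_size) {
  # Always accumulate sufficient statistics for the chosen arm.
  state$A_cc[[arm]] <- state$A_cc[[arm]] + outer(x, x)
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  # Assumption: working estimates are refreshed at the end of each batch.
  if (t %% batch_size == 0) {
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

s1 <- update_state(new_state(d = 2, k = 2), t = 1, x = c(1, 0),
                   arm = 1, reward = 1, batch_size = 2)
s2 <- update_state(s1, t = 2, x = c(1, 0),
                   arm = 1, reward = 1, batch_size = 2)
```

After the first round s1$b[[1]] is still all zeros because the batch is incomplete; after the second round s2$b[[1]] holds the accumulated sum. With batch_size = 1 the sketch reduces to an update on every round.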
clone()
The objects of this class are cloneable with this method.
BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)
deep
Whether to make a deep clone.