BatchLinUCBDisjointPolicyEpsilon | R Documentation |
Batch Disjoint LinUCB Policy with Epsilon-Greedy
Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration, using batched updates.
- 'initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)': Constructor.
- 'set_parameters(context_params)': Initializes sufficient statistics for each arm.
- 'get_action(t, context)': Selects an arm using UCB scores and an epsilon-greedy rule.
- 'set_reward(t, context, action, reward)': Updates statistics and refreshes the model at batch intervals.
cramR::NA
-> BatchLinUCBDisjointPolicyEpsilon
alpha
Numeric, UCB exploration strength parameter.
epsilon
Numeric, probability of taking a random exploratory action.
batch_size
Integer, number of rounds per batch update.
A_cc
List of Gram matrices, one per arm, accumulated across the current batch.
b_cc
List of reward-weighted context vectors per arm.
class_name
Internal class name identifier.
new()
Constructor for batched LinUCB with epsilon-greedy exploration.
BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)
alpha
Numeric. UCB width parameter (exploration strength).
epsilon
Numeric. Probability of selecting a random arm.
batch_size
Integer. Number of rounds before updating parameters.
set_parameters()
Initialize arm-specific parameter containers.
BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)
context_params
List containing at least 'unique' (feature size) and 'k' (number of arms).
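As a rough illustration of what "arm-specific parameter containers" can look like, the base R sketch below builds one Gram matrix and one reward-weighted vector per arm. It assumes the usual disjoint LinUCB starting point (identity matrix and zero vector), which this page does not state; the helper init_arm_stats and its arguments k and d are hypothetical and not part of cramR.

```r
# Hypothetical helper, not part of cramR: build per-arm containers.
# Assumption: each arm starts with A = identity and b = zeros.
init_arm_stats <- function(k, d) {
  lapply(seq_len(k), function(arm) {
    list(
      A = diag(1, d, d),  # d x d Gram matrix, initialised to the identity
      b = rep(0, d)       # reward-weighted context sum, initialised to zero
    )
  })
}

stats <- init_arm_stats(k = 3, d = 2)
length(stats)   # one container per arm
stats[[1]]$A    # 2 x 2 identity matrix
```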
get_action()
Chooses an arm based on UCB and epsilon-greedy sampling.
BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)
t
Integer timestep.
context
List containing the context for the decision.
A list with the selected action.
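The selection rule can be sketched in base R as follows. This is a minimal illustration of combining an epsilon-greedy draw with disjoint LinUCB scores, not the cramR implementation: the function select_arm, the per-arm list layout (A, b), and the returned fields choice and explored are all assumptions made for the example.

```r
# Hypothetical sketch, not the cramR implementation.
# theta_list: list of per-arm lists with Gram matrix A (d x d) and vector b (length d)
# x: numeric context vector of length d
select_arm <- function(theta_list, x, alpha = 1, epsilon = 0.1) {
  k <- length(theta_list)
  # With probability epsilon, pick an arm uniformly at random.
  if (runif(1) < epsilon) {
    return(list(choice = sample.int(k, 1), explored = TRUE))
  }
  # Otherwise score each arm by its estimated reward plus a confidence width.
  scores <- vapply(theta_list, function(p) {
    A_inv <- solve(p$A)
    theta_hat <- A_inv %*% p$b
    mean_part <- as.numeric(crossprod(x, theta_hat))
    width <- alpha * sqrt(as.numeric(t(x) %*% A_inv %*% x))
    mean_part + width
  }, numeric(1))
  # which.max returns the first maximum, so ties go to the lowest arm index.
  list(choice = which.max(scores), explored = FALSE)
}

arms <- list(
  list(A = diag(2), b = c(0, 0)),
  list(A = diag(2), b = c(1, 0))
)
# epsilon = 0 disables random exploration, so the result is deterministic.
pick <- select_arm(arms, x = c(1, 0), alpha = 1, epsilon = 0)
pick$choice  # 2: arm 2 scores 1 + 1, arm 1 scores 0 + 1
```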
set_reward()
Updates arm-specific sufficient statistics based on observed reward. Parameter updates occur only at the end of a batch.
BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
t
Integer timestep.
context
Context object used for decision-making.
action
List containing the chosen action.
reward
List containing the observed reward.
Updated internal model parameters.
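One way to picture the batched update is to keep two copies of the per-arm statistics: an accumulator that absorbs every observation, and a "live" copy used for decisions that is refreshed only when a batch completes. The base R sketch below follows that idea under stated assumptions: the batch boundary is taken to be t %% batch_size == 0, and the names make_batch_state, update_batch, acc and live are hypothetical rather than taken from cramR.

```r
# Hypothetical sketch of a batched update, not the cramR implementation.
make_batch_state <- function(k, d, batch_size) {
  fresh <- function() {
    lapply(seq_len(k), function(i) list(A = diag(1, d, d), b = rep(0, d)))
  }
  list(acc = fresh(), live = fresh(), batch_size = batch_size)
}

update_batch <- function(state, t, x, arm, reward) {
  # Always add the new observation to the accumulated statistics.
  state$acc[[arm]]$A <- state$acc[[arm]]$A + tcrossprod(x)  # x %*% t(x)
  state$acc[[arm]]$b <- state$acc[[arm]]$b + reward * x
  # Assumption: a batch ends whenever t is a multiple of batch_size.
  if (t %% state$batch_size == 0) {
    state$live <- state$acc
  }
  state
}

state <- make_batch_state(k = 2, d = 2, batch_size = 2)
state <- update_batch(state, t = 1, x = c(1, 0), arm = 1, reward = 1)
state$live[[1]]$b  # still c(0, 0): the batch is not complete
state <- update_batch(state, t = 2, x = c(0, 1), arm = 1, reward = 1)
state$live[[1]]$b  # c(1, 1): both observations are now visible
```

With batch_size = 1 the live copy is refreshed on every call, so the sketch reduces to an ordinary per-round update.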
clone()
The objects of this class are cloneable with this method.
BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
deep
Whether to make a deep clone.