ContextualLinTSPolicy implements Thompson Sampling with Linear
Payoffs, following Agrawal and Goyal (2011).
Thompson Sampling with Linear Payoffs is a contextual multi-armed bandit
policy that assumes the expected reward of each arm is a linear function
of its context. See the reference below for more details.
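For orientation, the algorithm as stated by Agrawal and Goyal can be summarised as below, writing x_i(t) for the d-dimensional context of arm i at time t (one column of context$X), a(τ) for the arm chosen at time τ, r(τ) for its observed reward, and v for the variance hyper-parameter. This follows the paper's formulation; it is not a description of the package's internal code, which may differ in detail.

$$
\begin{aligned}
B(t) &= I_d + \sum_{\tau=1}^{t-1} x_{a(\tau)}(\tau)\, x_{a(\tau)}(\tau)^\top, \\
\hat{\mu}(t) &= B(t)^{-1} \sum_{\tau=1}^{t-1} r(\tau)\, x_{a(\tau)}(\tau), \\
\tilde{\mu}(t) &\sim \mathcal{N}\!\left(\hat{\mu}(t),\; v^2 B(t)^{-1}\right), \\
a(t) &= \arg\max_{i \in \{1,\dots,k\}} x_i(t)^\top \tilde{\mu}(t).
\end{aligned}
$$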
policy <- ContextualLinTSPolicy$new(v = 0.2)
v: double, a positive real value (v > 0); hyper-parameter for adjusting the variance of the posterior Gaussian distribution.
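To illustrate what v controls, the sketch below assumes the covariance form from the reference paper, v^2 B^{-1}, and draws samples from a two-dimensional posterior with base R only. The helper sample_posterior() and the values of mu and B_inv are hypothetical; the point is that the per-coordinate spread of the samples scales linearly with v, so a larger v means more exploration.

```r
# Hypothetical helper (not part of the contextual package):
# draw n samples from N(mu, v^2 * B_inv) using a Cholesky factor.
sample_posterior <- function(n, mu, B_inv, v) {
  L <- t(chol(B_inv))                          # lower triangular, L %*% t(L) == B_inv
  Z <- matrix(rnorm(length(mu) * n), nrow = length(mu))
  mu + v * (L %*% Z)                           # each column is one posterior draw
}

set.seed(1)
mu    <- c(0.5, -0.2)                          # made-up posterior mean
B_inv <- diag(c(1, 0.25))                      # made-up inverse design matrix
small <- sample_posterior(10000, mu, B_inv, v = 0.2)
large <- sample_posterior(10000, mu, B_inv, v = 1)

# Sample standard deviations should be close to v * sqrt(diag(B_inv)),
# i.e. about (0.2, 0.1) for v = 0.2 and about (1, 0.5) for v = 1.
apply(small, 1, sd)
apply(large, 1, sd)
```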
new(v): instantiates a new ContextualLinTSPolicy instance.
The argument v is described in the Arguments section above.
set_parameters(context_params): initializes the policy parameters, using context_params$k (number of arms) and
context_params$d (number of context features).
get_action(t, context): selects an arm based on self$theta and context, returning the index of the selected arm
in action$choice. The context argument is a list containing context$k (number of arms),
context$d (number of features), and the feature matrix context$X with dimensions
d x k.
set_reward(t, context, action, reward): updates the parameter list theta using the current reward$reward,
action$choice, and the feature matrix context$X with dimensions
d x k. Returns the updated theta.
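The three methods above can be sketched in base R as follows. This is a minimal illustration of linear Thompson Sampling in the style of Agrawal and Goyal, not the package's actual implementation: the function names lints_init(), lints_choose() and lints_update(), the layout of theta (B_inv and f), and the synthetic true_theta are all hypothetical. It assumes one parameter vector shared by all arms, so only d is needed at initialization, and it keeps B^{-1} up to date with a Sherman-Morrison rank-one update instead of re-inverting B each round.

```r
# Hypothetical sketch, NOT the contextual package API.

# Stand-in for set_parameters(): shared parameters for all arms.
# B starts as the identity, so B_inv = I_d; f (sum of reward * x) starts at zero.
lints_init <- function(context_params) {
  d <- context_params$d
  list(B_inv = diag(d), f = rep(0, d))
}

# Stand-in for get_action(): sample mu_tilde ~ N(B_inv f, v^2 B_inv)
# and choose the arm whose column of X (d x k) gives the largest score.
lints_choose <- function(theta, X, v = 0.2) {
  mu_hat   <- theta$B_inv %*% theta$f              # posterior mean (d x 1)
  L        <- t(chol(theta$B_inv))                 # L %*% t(L) equals B_inv
  mu_tilde <- mu_hat + v * (L %*% rnorm(nrow(X)))  # one posterior sample
  scores   <- as.vector(crossprod(X, mu_tilde))    # x_i' mu_tilde for each arm
  list(choice = which.max(scores))                 # first maximum wins ties
}

# Stand-in for set_reward(): add x x' to B (via Sherman-Morrison on B_inv)
# and add reward * x to f.
lints_update <- function(theta, x, reward) {
  Bx    <- theta$B_inv %*% x                       # B^{-1} x (d x 1)
  denom <- 1 + as.numeric(crossprod(x, Bx))        # 1 + x' B^{-1} x
  theta$B_inv <- theta$B_inv - (Bx %*% t(Bx)) / denom
  theta$f     <- theta$f + reward * x
  theta
}

# Usage on a synthetic linear problem (true_theta is made up).
set.seed(1)
d <- 3; k <- 4; horizon <- 500
true_theta <- c(0.6, -0.3, 0.2)
theta <- lints_init(list(k = k, d = d))
for (step in seq_len(horizon)) {
  X      <- matrix(runif(d * k), nrow = d)          # fresh d x k contexts
  action <- lints_choose(theta, X, v = 0.2)
  x      <- X[, action$choice]                      # context of the chosen arm
  reward <- sum(x * true_theta) + rnorm(1, sd = 0.1) # noisy linear payoff
  theta  <- lints_update(theta, x, reward)
}
est <- as.vector(theta$B_inv %*% theta$f)           # posterior mean estimate
print(round(est, 2))
```

The Sherman-Morrison update costs O(d^2) per round instead of the O(d^3) of a full inversion, which is a common choice for this kind of policy; whether the package does the same is not stated here.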
Shipra Agrawal and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Advances in Neural Information Processing Systems 24. 2011.
Core contextual classes: Bandit, Policy, Simulator,
Agent, History, Plot
Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit,
OfflineReplayEvaluatorBandit
Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy
## Not run:
library(contextual)

horizon     <- 100L
simulations <- 100L
# Linear contextual bandit with 4 arms and 3 context features
bandit <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)
# EpsilonGreedyPolicy ignores the context and serves as a baseline
agents <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
               Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))
simulation <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)
history <- simulation$run()
plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")
## End(Not run)