ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal (2011).
Thompson Sampling with Linear Payoffs is a contextual multi-armed bandit policy that assumes the underlying relationship between rewards and contexts is linear. See the reference for more details.
policy <- ContextualLinTSPolicy$new(v = 0.2)
v
double, a positive real value; hyperparameter for adjusting the variance of the posterior Gaussian distribution.
new(v)
instantiates a new ContextualLinTSPolicy instance. Arguments are defined in the Arguments section above.
set_parameters(context_params)
initializes the policy parameters, using context_params$k (number of arms) and context_params$d (number of context features).
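As a rough illustration of what such an initialization could look like, the sketch below builds one inverse design matrix and one reward-weighted feature sum per arm. The helper name init_lints_params and the fields A_inv and b are assumptions made for this example; they are not taken from the package documentation.

```r
# Hypothetical helper (not part of the package): per-arm parameters for
# linear Thompson Sampling, given k arms and d context features.
init_lints_params <- function(k, d) {
  list(
    A_inv = replicate(k, diag(d), simplify = FALSE),    # d x d identity per arm
    b     = replicate(k, rep(0, d), simplify = FALSE)   # length-d zero vector per arm
  )
}

theta <- init_lints_params(k = 4, d = 3)
length(theta$A_inv)     # one matrix per arm
dim(theta$A_inv[[1]])   # each matrix is d x d
```

Starting from the identity matrix corresponds to a standard ridge-style prior, so every arm begins with the same uninformative estimate.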
get_action(t, context)
selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument is a list containing context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions d x k.
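The selection step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function select_arm, the per-arm fields A_inv and b, and the use of v^2 as the posterior variance scale are all choices made for this example. For each arm, a parameter vector is drawn from a Gaussian centred on the current estimate, and the arm with the highest sampled payoff is chosen.

```r
# Hypothetical helper: sample a parameter vector per arm from its Gaussian
# posterior and return the index of the arm with the highest sampled payoff.
select_arm <- function(theta, X, v) {
  k       <- ncol(X)                 # X is d x k: one feature column per arm
  sampled <- numeric(k)
  for (arm in seq_len(k)) {
    A_inv    <- theta$A_inv[[arm]]
    mu_hat   <- as.vector(A_inv %*% theta$b[[arm]])   # posterior mean
    R        <- chol(v^2 * A_inv)                     # upper factor, t(R) %*% R = covariance
    mu_tilde <- mu_hat + as.vector(t(R) %*% rnorm(length(mu_hat)))
    sampled[arm] <- sum(X[, arm] * mu_tilde)          # sampled linear payoff
  }
  which.max(sampled)                 # ties go to the lowest index in this sketch
}

set.seed(42)
k <- 4; d <- 3
theta <- list(A_inv = replicate(k, diag(d), simplify = FALSE),
              b     = replicate(k, rep(0, d), simplify = FALSE))
X      <- matrix(rnorm(d * k), nrow = d, ncol = k)
choice <- select_arm(theta, X, v = 0.2)
```

A larger v widens the sampling distribution and so encourages more exploration; a smaller v keeps the samples close to the current estimate.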
set_reward(t, context, action, reward)
updates the parameter list theta in accordance with the current reward$reward, action$choice, and the feature matrix context$X with dimensions d x k. Returns the updated theta.
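One way to picture the update is the sketch below, which assumes the per-arm fields A_inv and b used for illustration here (they are not documented package fields) and applies a Sherman-Morrison rank-one update so that no matrix inversion is needed at each step. The function name update_arm is likewise hypothetical.

```r
# Hypothetical helper: update the chosen arm's parameters with one observation.
update_arm <- function(theta, X, arm, reward) {
  x     <- X[, arm]                  # feature vector of the chosen arm
  A_inv <- theta$A_inv[[arm]]
  Ax    <- A_inv %*% x               # d x 1
  denom <- 1 + sum(x * Ax)
  # Sherman-Morrison: inverse of (A + x x^T), valid because A_inv is symmetric
  theta$A_inv[[arm]] <- A_inv - (Ax %*% t(Ax)) / denom
  theta$b[[arm]]     <- theta$b[[arm]] + reward * x
  theta
}

k <- 4; d <- 3
theta <- list(A_inv = replicate(k, diag(d), simplify = FALSE),
              b     = replicate(k, rep(0, d), simplify = FALSE))
X <- matrix(c(1, 0, 0,
              0, 1, 0,
              0, 0, 1,
              1, 1, 1), nrow = d)    # columns are arms
theta <- update_arm(theta, X, arm = 4, reward = 1)
theta$b[[4]]   # 1 1 1
```

Only the chosen arm's entries change; the parameters of the other arms are left as they were.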
Shipra Agrawal and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Advances in Neural Information Processing Systems 24. 2011.
Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot
Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit
Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy
## Not run:
library(contextual)

horizon     <- 100L
simulations <- 100L

# Linear contextual bandit with k = 4 arms and d = 3 context features
bandit <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

# Compare epsilon-greedy against linear Thompson Sampling
agents <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
               Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))

simulation <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)
history    <- simulation$run()

plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")
## End(Not run)