ContextualLinTSPolicy: Policy: Linear Thompson Sampling with unique linear models
In contextual: Simulation and Analysis of Contextual Multi-Armed Bandit Policies


ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal (2011). Thompson Sampling with Linear Payoffs is a contextual multi-armed bandit policy that assumes the underlying relationship between rewards and contexts is linear. See the reference listed below for details.

policy <- ContextualLinTSPolicy$new(v = 0.2)

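v: double, a positive real value (R+); hyperparameter that scales the variance of the Gaussian posterior distribution from which parameter estimates are sampled. Larger values of v produce wider samples and therefore more exploration.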
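To illustrate how v might influence sampling, here is a minimal base R sketch. It assumes the posterior covariance is scaled as v^2 * B_inv, as in the cited paper; whether the package applies v in exactly this way is an assumption. The helper sample_theta() and the names mu and B_inv are hypothetical and not part of the contextual package.

```r
# Hypothetical helper (not part of the contextual package):
# draw n samples from N(mu, v^2 * B_inv) using a Cholesky factor.
sample_theta <- function(n, mu, B_inv, v) {
  d <- length(mu)
  R <- chol(v^2 * B_inv)                # upper triangular, t(R) %*% R equals the covariance
  z <- matrix(rnorm(n * d), nrow = n)   # n x d matrix of standard normal draws
  sweep(z %*% R, 2, mu, "+")            # add the mean to every row; each row is one draw
}

set.seed(1)
mu    <- c(0.5, -0.2)   # assumed posterior mean
B_inv <- diag(2)        # assumed inverse precision matrix

narrow <- sample_theta(5000, mu, B_inv, v = 0.1)
wide   <- sample_theta(5000, mu, B_inv, v = 1.0)

apply(narrow, 2, sd)    # spread is roughly v = 0.1
apply(wide, 2, sd)      # spread is roughly v = 1.0
```

With a small v the sampled parameters stay close to the posterior mean, so the policy mostly exploits; a larger v spreads the samples and makes exploratory choices more likely.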

new(v): instantiates a new ContextualLinTSPolicy instance. Arguments defined in the Arguments section above.

set_parameters(context_params): initialization of policy parameters, utilising context_params$k (number of arms) and context_params$d (number of context features).

get_action(t, context): selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument consists of a list with context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions d x k.

set_reward(t, context, action, reward): updates parameter list theta in accordance with the current reward$reward, action$choice and the feature matrix context$X with dimensions d x k. Returns the updated theta.
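The select-and-update cycle described by get_action and set_reward can be sketched in base R as follows. This is an illustrative approximation of linear Thompson Sampling with one linear model per arm, not the package's actual implementation: the functions lints_init(), lints_choose() and lints_update(), the prior A = I, b = 0, and the simulated true_beta are all assumptions made for the example.

```r
# Hypothetical helpers illustrating per-arm linear Thompson Sampling.
lints_init <- function(k, d) {
  list(A = replicate(k, diag(d), simplify = FALSE),    # per-arm precision matrices
       b = replicate(k, rep(0, d), simplify = FALSE))  # per-arm reward-weighted feature sums
}

lints_choose <- function(theta, X, v) {
  k <- ncol(X)
  d <- nrow(X)
  scores <- numeric(k)
  for (arm in seq_len(k)) {
    A_inv  <- solve(theta$A[[arm]])
    mu_hat <- as.vector(A_inv %*% theta$b[[arm]])      # posterior mean
    R      <- chol(v^2 * A_inv)                        # t(R) %*% R = v^2 * A_inv
    theta_tilde <- mu_hat + as.vector(t(R) %*% rnorm(d))  # one posterior draw
    scores[arm] <- sum(X[, arm] * theta_tilde)         # sampled expected reward
  }
  which.max(scores)
}

lints_update <- function(theta, X, arm, reward) {
  x <- X[, arm]
  theta$A[[arm]] <- theta$A[[arm]] + tcrossprod(x)     # A <- A + x x^T
  theta$b[[arm]] <- theta$b[[arm]] + reward * x        # b <- b + r x
  theta
}

# Small simulated run; true_beta is made up for the example.
set.seed(42)
k <- 3
d <- 2
true_beta <- matrix(c(1, 0, 0, 1, -1, -1), nrow = d)   # one column per arm
theta <- lints_init(k, d)
for (t in 1:200) {
  X      <- matrix(rnorm(d * k), nrow = d)             # d x k feature matrix
  arm    <- lints_choose(theta, X, v = 0.2)
  reward <- sum(X[, arm] * true_beta[, arm]) + rnorm(1, sd = 0.1)
  theta  <- lints_update(theta, X, arm, reward)
}
```

Only the chosen arm's A and b are updated at each step, which is what keeps the linear models separate per arm. Recomputing solve() on every call keeps the sketch short; an implementation concerned with speed would more likely maintain the inverse incrementally.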

Shipra Agrawal and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Advances in Neural Information Processing Systems 24. 2011.

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy

## Not run: 

library(contextual)

horizon     <- 100L   # time steps per simulation
simulations <- 100L   # number of independent simulation runs

# Linear bandit with 4 arms and 3 context features
bandit      <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

# Compare epsilon-greedy against linear Thompson Sampling
agents      <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
                    Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))

simulation  <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)
history     <- simulation$run()

plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")

## End(Not run)