Bootstrap Thompson Sampling
Bootstrap Thompson Sampling (BTS) is a heuristic method for solving bandit problems. It modifies Thompson Sampling (see ThompsonSamplingPolicy) by replacing the posterior distribution with a bootstrap distribution.
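To make the substitution concrete, here is a minimal Python sketch for a single Bernoulli arm. It is not the package's code: the function name bootstrap_replicates and the example rewards are hypothetical, and it assumes that J is the number of bootstrap replicates, that a and b act as Beta-style pseudo-counts, and that each replicate includes each observation with probability 1/2 (a double-or-nothing style online bootstrap, in the spirit of Eckles & Kaptein, 2014).

```python
import random

def bootstrap_replicates(rewards, J=100, a=1.0, b=1.0, rng=None):
    """Return J bootstrap estimates of one Bernoulli arm's mean reward.

    Hypothetical helper for illustration; J, a, b mirror the constructor
    arguments, but their exact role in the package is an assumption.
    """
    rng = rng or random.Random()
    # Every replicate starts from a pseudo-successes and b pseudo-failures.
    alpha = [a] * J
    beta = [b] * J
    for r in rewards:
        for j in range(J):
            # Assumed resampling scheme: each replicate includes the
            # observation with probability 1/2.
            if rng.random() < 0.5:
                alpha[j] += r
                beta[j] += 1 - r
    return [alpha[j] / (alpha[j] + beta[j]) for j in range(J)]

rng = random.Random(0)
rewards = [1, 0, 1, 1, 0, 1, 1, 1]  # made-up observations for one arm
estimates = bootstrap_replicates(rewards, J=100, rng=rng)

# Thompson Sampling would draw from the Beta posterior of the arm ...
successes = sum(rewards)
failures = len(rewards) - successes
posterior_draw = rng.betavariate(1 + successes, 1 + failures)

# ... whereas BTS draws one bootstrap replicate uniformly at random.
bootstrap_draw = rng.choice(estimates)
```

Both draws play the same role in arm selection: a random, plausible value for the arm's mean reward. The bootstrap version needs only resampled counts, not a closed-form posterior.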
new(J = 100, a = 1, b = 1)
Generates a new Bootstrap Thompson Sampling policy object.
Arguments are defined in the Argument section above.
Each policy needs to assign the parameters it wants to keep track of to self$theta_to_arms, which has to be defined in the policy's parameter-setup method.
The parameters defined there can later be accessed by arm index.
Here, a policy decides which arm to choose based on the current values of its parameters and, potentially, the current context.
In set_reward(reward, context), a policy updates its parameter values based on the reward received and, potentially, the current context.
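The choose-then-update cycle described above can be sketched as follows. This is a hedged, context-free illustration in Python, not the package's R implementation: the class BootstrapTSSketch, its methods choose_arm and update, and the helper simulate are all hypothetical names. The update rule (each replicate absorbs the reward with probability 1/2) is an assumption based on the online bootstrap described by Eckles & Kaptein (2014).

```python
import random

class BootstrapTSSketch:
    """Hypothetical BTS policy for Bernoulli rewards (illustration only)."""

    def __init__(self, k, J=100, a=1.0, b=1.0, seed=None):
        self.k, self.J = k, J
        self.rng = random.Random(seed)
        # Per-arm parameters: J replicates of (alpha, beta) counters.
        self.alpha = [[a] * J for _ in range(k)]
        self.beta = [[b] * J for _ in range(k)]

    def choose_arm(self):
        # For each arm, pick one replicate at random and use its mean.
        estimates = []
        for arm in range(self.k):
            j = self.rng.randrange(self.J)
            a_j, b_j = self.alpha[arm][j], self.beta[arm][j]
            estimates.append(a_j / (a_j + b_j))
        best = max(estimates)
        # Break ties uniformly at random.
        return self.rng.choice([i for i, e in enumerate(estimates) if e == best])

    def update(self, arm, reward):
        # Assumed scheme: each replicate of the played arm includes the
        # new observation with probability 1/2.
        for j in range(self.J):
            if self.rng.random() < 0.5:
                self.alpha[arm][j] += reward
                self.beta[arm][j] += 1 - reward

def simulate(true_means, horizon=2000, seed=0):
    """Run the sketch on a made-up Bernoulli bandit; return pulls per arm."""
    policy = BootstrapTSSketch(k=len(true_means), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        arm = policy.choose_arm()
        reward = 1 if env.random() < true_means[arm] else 0
        policy.update(arm, reward)
        pulls[arm] += 1
    return pulls

pulls = simulate([0.2, 0.8])
```

In a run like this, most pulls should go to the arm with the higher true mean. Early disagreement among replicates drives exploration, and it shrinks as the counters accumulate data.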
Eckles, D., & Kaptein, M. (2014). Thompson sampling with the online bootstrap. arXiv preprint arXiv:1410.4009.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285-294.