Description Usage Arguments Details Value Examples
This function simulates data for running Multi-Armed Bandit strategies over a sequence of periods. A weight plot and a regret plot are provided if requested. In each period there can be multiple pulls, and each method is applied only once per period. The default setting is a single pull per period, corresponding to continuous updating.
SimulateMultiplePeriods(method = "Thompson-Sampling",
  method.par = list(ndraws.TS = 1000), nburnin, nperiod, reward.family,
  mean.reward, sd.reward = NULL, npulls.per.period = 1,
  weight.plot = FALSE, regret.plot = FALSE)
method
A character string, one of "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS", "EXP3-Bayes-Poisson-TS" and "HyperTS". For details of these methods, see below. Default is "Thompson-Sampling".
method.par
A list of parameters needed by the chosen method.
nburnin
A positive integer specifying the number of periods during which each arm is allocated equal traffic before any strategy is applied.
nperiod
A positive integer specifying the number of periods over which the strategy is applied.
reward.family
A character string specifying the distribution family of the reward. Available distributions include "Bernoulli", "Poisson" and "Gaussian". If "Gaussian" is chosen as the reward distribution, a vector of standard deviations should be provided in sd.reward.
mean.reward
A vector or a matrix of real numbers specifying the mean reward of each arm.
sd.reward
A vector of non-negative numbers specifying the standard deviation of each arm's reward distribution when "Gaussian" is chosen as the reward family. Defaults to NULL.
npulls.per.period
A positive integer or a vector of positive integers specifying the number of pulls in each period. Default is 1.
weight.plot
A logical value, FALSE by default. If TRUE, a weight plot object for each arm is returned.
regret.plot
A logical value, FALSE by default. If TRUE, a relative regret plot object is returned.
Various methods have been implemented. "Epsilon-Greedy" and "Epsilon-Decreasing" allocate 1 - epsilon of the traffic to the arm with the largest average reward and distribute the remaining traffic equally among the other arms. For "Epsilon-Greedy", epsilon in method.par serves as a constant exploration rate. For "Epsilon-Decreasing", epsilon in method.par serves as the exploration rate at period 1, while in period t the exploration rate is epsilon / t. See https://en.wikipedia.org/wiki/Multi-armed_bandit#Approximate_solutions for more details about these strategies.
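The epsilon allocation rule described above can be sketched in R as follows; EpsilonWeights and avg.reward are illustrative names, not part of the package:

```r
# Sketch of Epsilon-Greedy / Epsilon-Decreasing traffic weights.
# avg.reward is a hypothetical vector of observed average rewards per arm.
EpsilonWeights <- function(avg.reward, epsilon, t = 1, decreasing = FALSE) {
  if (decreasing) epsilon <- epsilon / t   # exploration decays as epsilon / t
  k <- length(avg.reward)
  w <- rep(epsilon / (k - 1), k)           # split epsilon among the other arms
  w[which.max(avg.reward)] <- 1 - epsilon  # best arm gets 1 - epsilon
  w
}
EpsilonWeights(c(0.02, 0.05, 0.03), epsilon = 0.1)
# arm 2 (largest average reward) receives 0.9; arms 1 and 3 receive 0.05 each
```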
"Thompson-Sampling" refers to Beta-Binomial Thompson Sampling using Beta(1, 1) as a prior. "Bayes-Poisson-TS" refers to Poisson-Gamma Thompson Sampling using a Bayesian Generalized Linear Mixed Effects Model to compute weights. "Bayes-Poisson-TS", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS" depends on the package "emre" to compute posterior distribution. For algorithm details, see the paper https://arxiv.org/abs/1602.00047.
UCB (Upper Confidence Bound) is a classical method for the Multi-Armed Bandit problem. For algorithm details, see the paper http://personal.unileoben.ac.at/rortner/Pubs/UCBRev.pdf.
EXP3 is a method that requires specifying an exploration rate gamma and an exploitation rate eta. For algorithm details, see the paper https://cseweb.ucsd.edu/~yfreund/papers/bandits.pdf.
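One EXP3 step under these two rates might look as follows; this is a generic sketch of the EXP3 update, not the package's internal code, and the variable names are assumptions:

```r
# Sketch of a single EXP3 update: gamma mixes in uniform exploration,
# eta scales the exponential weight update.
EXP3Step <- function(w, gamma, eta, pulled, reward) {
  k <- length(w)
  p <- (1 - gamma) * w / sum(w) + gamma / k  # sampling probabilities
  xhat <- rep(0, k)
  xhat[pulled] <- reward / p[pulled]         # importance-weighted reward
  w * exp(eta * xhat)                        # exponential weight update
}
w <- EXP3Step(rep(1, 3), gamma = 0.1, eta = 0.1, pulled = 2, reward = 1)
round(w / sum(w), 3)
# arm 2's share rises above 1/3 after its observed reward
```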
Ensemble methods are also implemented. "Greedy-Thompson-Sampling" and "Greedy-Bayes-Poisson-TS" allocate 1 - epsilon of the traffic to the arm with the largest Thompson sampling weight and distribute the remaining epsilon of the traffic according to the Thompson sampling weights. Instead of using the average reward of each period to update weights as in "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS" use Thompson sampling weights in the "EXP3" updating formula. "HyperTS" is an ensemble method that applies Thompson sampling to select the best method in each period based on previous performance. For algorithm details, see the paper http://yxjiang.github.io/paper/RecSys2014-ensemble-bandit.pdf.
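The Greedy-Thompson-Sampling mixing rule described above can be sketched as follows; w.TS is a hypothetical vector of Thompson sampling weights, and the function name is illustrative:

```r
# Sketch: put 1 - epsilon of the traffic on the arm with the largest
# Thompson sampling weight, and spread epsilon according to those weights.
GreedyTSWeights <- function(w.TS, epsilon) {
  w <- epsilon * w.TS                                 # epsilon follows TS weights
  best <- which.max(w.TS)
  w[best] <- w[best] + (1 - epsilon)                  # greedy mass on the leader
  w
}
GreedyTSWeights(c(0.2, 0.7, 0.1), epsilon = 0.1)
# arm 2 receives 0.9 + 0.1 * 0.7 = 0.97; the weights still sum to 1
```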
To measure performance, regret is computed by summing, over arms and periods, the product of the number of pulls on an arm in a period and the difference between that arm's mean reward and the largest mean reward. Relative regret is computed by dividing the regret of a given method by the regret of the benchmark method that allocates equal weight to each arm throughout all periods.
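These regret definitions can be illustrated with a small sketch; the pull-count matrices below are made-up inputs, and ComputeRegret is an illustrative helper, not part of the package:

```r
# Cumulative regret: sum over arms of (pulls) * (gap to the best mean reward),
# accumulated across periods.
ComputeRegret <- function(pulls, mean.reward) {
  gap <- max(mean.reward) - mean.reward  # per-arm mean-reward gap
  cumsum(colSums(pulls * gap))           # cumulative regret by period
}
mean.reward <- c(0.02, 0.05, 0.03)
method.pulls    <- matrix(c(1, 8, 1,  0, 9, 1), nrow = 3)  # arms x periods
benchmark.pulls <- matrix(10 / 3, nrow = 3, ncol = 2)      # equal allocation
relative.regret <- ComputeRegret(method.pulls, mean.reward) /
                   ComputeRegret(benchmark.pulls, mean.reward)
relative.regret
# relative regret per period: 0.30, 0.21 (well below 1, so the method
# beats the equal-weight benchmark)
```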
A list consisting of:
weight
A weight matrix in which each element is the weight allocated to an arm in a given period. Each row represents one arm and each column represents one period.
regret
A vector in which each element is the relative regret at one period. For the definition of relative regret, see Details above.
weight.plot.object
If weight.plot = TRUE, a ggplot object is returned.
regret.plot.object
If regret.plot = TRUE, a ggplot object is returned.
### Simulate Thompson-Sampling
set.seed(100)
res <- SimulateMultiplePeriods(method = "Thompson-Sampling",
method.par = list(ndraws.TS = 1000),
nburnin = 30,
nperiod = 180,
npulls.per.period = 5,
reward.family = "Bernoulli",
mean.reward = runif(3, 0, 0.1),
weight.plot = TRUE)
res$weight.plot.object
### Simulate EXP3-Thompson-Sampling
set.seed(100)
res <- SimulateMultiplePeriods(
method = "EXP3-Thompson-Sampling",
method.par = list(ndraws.TS = 1000,
EXP3 = list(gamma = 0, eta = 0.1)),
nburnin = 30,
nperiod = 180,
npulls.per.period = 5,
reward.family = "Bernoulli",
mean.reward = runif(3, 0, 0.1),
weight.plot = TRUE)
res$weight.plot.object
### Simulate ensemble method HyperTS given "Thompson-Sampling", "Epsilon-Greedy" and "Epsilon-Decreasing"
set.seed(100)
res <- SimulateMultiplePeriods(
method = "HyperTS",
method.par = list(
ndraws.TS = 1000,
epsilon = 0.1,
HyperTS = list(method.list = c("Thompson-Sampling",
"Epsilon-Greedy",
"Epsilon-Decreasing"))),
nburnin = 30,
nperiod = 180,
npulls.per.period = 5,
reward.family = "Poisson",
mean.reward = runif(3, 0, 0.1),
weight.plot = TRUE)
res$weight.plot.object