SimulateMultipleMethods: Compare various strategies for Multi-Armed Bandit in...
In google/MAB: Multi-Armed Bandit Strategies Implementation and Simulation


This function simulates data under different scenarios to compare strategies for the Multi-Armed Bandit problem. Users can specify the distribution of the number of arms, the distribution of the mean reward, the distribution of the number of pulls per period, and whether the bandit is stationary. The function returns the relative regret and, if requested, a plot of the average relative regret. See SimulateMultiplePeriods for more details.

SimulateMultipleMethods(method = "Thompson-Sampling",
  method.par = list(ndraws.TS = 1000), iter, nburnin, nperiod,
  reward.mean.family, reward.family, narms.family, npulls.family,
  stationary = TRUE, nonstationary.type = NULL, data.par,
  regret.plot = FALSE)

`method`	A vector of character strings chosen from "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS", "EXP3-Bayes-Poisson-TS" and "HyperTS". See `SimulateMultiplePeriods` for more details. Default is "Thompson-Sampling".
`method.par`	A list of parameters needed by the different methods:
    `epsilon`: A real number between 0 and 1; needed for "Epsilon-Greedy", "Epsilon-Decreasing", "Greedy-Thompson-Sampling" and "Greedy-Bayes-Poisson-TS".
    `ndraws.TS`: A positive integer specifying the number of random draws from the posterior; needed for "Thompson-Sampling", "Greedy-Thompson-Sampling" and "EXP3-Thompson-Sampling". Default is 1000.
    `EXP3`: A list consisting of two real numbers `eta` and `gamma`, with eta > 0 and 0 <= gamma < 1; needed for "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS".
    `BP`: A list consisting of three positive integers `iter.BP`, `ndraws.BP` and `interval.BP`; needed for "Bayes-Poisson-TS", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". `iter.BP` specifies the number of iterations used to compute the posterior; `ndraws.BP` specifies the number of samples drawn from the posterior distribution; each posterior sample is drawn from a sample sequence of length `interval.BP`.
    `HyperTS`: A list consisting of a vector `method.list`; needed for "HyperTS". `method.list` is a vector of character strings chosen from "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". "HyperTS" constructs an ensemble consisting of all the methods in `method.list`.
`iter`	A positive integer specifying the number of iterations.
`nburnin`	A positive integer specifying the number of periods to allocate each arm equal traffic before applying any strategy.
`nperiod`	A positive integer specifying the number of periods to apply various strategies.
`reward.mean.family`	A character string specifying the distribution family used to generate the mean reward of each arm. Available distributions are "Uniform", "Beta" and "Gaussian".
`reward.family`	A character string specifying the distribution family of the reward. Available distributions are "Bernoulli", "Poisson" and "Gaussian". If "Gaussian" is chosen as the reward distribution, a vector of standard deviations should be provided in `sd.reward` in `data.par`.
`narms.family`	A character string specifying the distribution family of the number of arms. Available distributions are "Poisson" and "Binomial".
`npulls.family`	A character string specifying the distribution family of the number of pulls per period. For a continuous distribution, the number of pulls is rounded up. Available distributions are "Log-Normal" and "Poisson".
`stationary`	A logical value indicating whether a stationary Multi-Armed Bandit is considered (that is, the mean reward does not change over time). Default is TRUE.
`nonstationary.type`	A character string indicating how the mean reward varies. Available types are "Random Walk" and "Geometric Random Walk" (the mean reward follows a random walk on the log scale). Default is NULL.
`data.par`	A list of data generating parameters:
    `reward.mean`: A list of parameters of `reward.mean.family`: `min` and `max` are two real numbers specifying the bounds when reward.mean.family = "Uniform"; `shape1` and `shape2` are two shape parameters when reward.mean.family = "Beta"; `mean` and `sd` specify the mean and standard deviation when reward.mean.family = "Gaussian".
    `reward.family`: A list of parameters of `reward.family`: `sd` is a vector of non-negative numbers specifying the standard deviation of each arm's reward distribution if "Gaussian" is chosen as the reward distribution.
    `narms.family`: A list of parameters of `narms.family`: `lambda` is a positive parameter specifying the mean when narms.family = "Poisson"; `size` and `prob` are the two parameters needed when narms.family = "Binomial".
    `npulls.family`: A list of parameters of `npulls.family`: `meanlog` and `sdlog` are two positive parameters specifying the mean and standard deviation on the log scale when npulls.family = "Log-Normal"; `lambda` is a positive parameter specifying the mean when npulls.family = "Poisson".
    `nonstationary.family`: A list of parameters of `nonstationary.type`: `sd` is a positive parameter specifying the standard deviation of the white noise when nonstationary.type = "Random Walk"; `sdlog` is a positive parameter specifying the log standard deviation of the white noise when nonstationary.type = "Geometric Random Walk".
`regret.plot`	A logical value indicating whether an average regret plot is returned. Default is FALSE.
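
The data-generating arguments above can be pictured with a small base-R sketch. This is an illustration of the ideas only (drawing the number of arms, rounding up continuous pull counts, and the two kinds of random walk), not the package's internal code. The helper names `RandomWalk` and `GeometricRandomWalk`, the parameter values, and the lower bound of two arms are all assumptions made for the example.

```r
set.seed(1)

# Hypothetical parameter values that mirror the structure of `data.par`.
nperiod <- 10
narms   <- max(2, rpois(1, lambda = 5))       # number of arms; the floor of 2 is an assumption
mu0     <- runif(narms, min = 0, max = 0.1)   # initial mean reward of each arm

# Pulls per period from a continuous family are rounded up to whole numbers.
npulls <- ceiling(rlnorm(nperiod, meanlog = 3, sdlog = 1))

# Random walk: Gaussian white noise accumulates on the mean over periods.
# Returns a matrix with one row per period and one column per arm.
RandomWalk <- function(mu0, nperiod, sd) {
  steps <- matrix(rnorm(nperiod * length(mu0), sd = sd), nrow = nperiod)
  sweep(apply(steps, 2, cumsum), 2, mu0, "+")
}

# Geometric random walk: the same walk applied to log(mean),
# so the mean stays positive.
GeometricRandomWalk <- function(mu0, nperiod, sdlog) {
  exp(RandomWalk(log(mu0), nperiod, sd = sdlog))
}

mu.path <- GeometricRandomWalk(mu0, nperiod, sdlog = 0.05)
dim(mu.path)  # nperiod rows, narms columns
```

Note that this sketch does not cap the mean at 1, which a Bernoulli reward would require if the walk drifted that far.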

A list consisting of:

`regret.matrix`	A three-dimensional array with each dimension corresponding to the period, iteration and method.
`regret.plot.object`	If regret.plot = TRUE, a ggplot object is returned.
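
As a rough sketch of how an array laid out as period x iteration x method might be summarized, the snippet below averages over the iteration dimension. The array here is filled with placeholder random numbers rather than real simulation output, and the dimension names are an assumption added for readability; the documentation does not state whether `regret.matrix` carries dimnames.

```r
set.seed(1)

methods <- c("Epsilon-Greedy", "Thompson-Sampling")
nperiod <- 5
iter    <- 4

# Placeholder values only; real values would come from SimulateMultipleMethods().
regret <- array(runif(nperiod * iter * length(methods)),
                dim = c(nperiod, iter, length(methods)),
                dimnames = list(NULL, NULL, methods))

# Average over iterations (dimension 2), keeping period (1) and method (3).
avg.regret <- apply(regret, c(1, 3), mean)
avg.regret  # one row per period, one column per method
```

With a real result, the same call would be applied to `res$regret.matrix`.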

### Compare Epsilon-Greedy and Thompson Sampling in the stationary case.
set.seed(100)
res <- SimulateMultipleMethods(
           method = c("Epsilon-Greedy", "Thompson-Sampling"),
           method.par = list(epsilon = 0.1, ndraws.TS = 1000),
           iter = 100,
           nburnin = 30,
           nperiod = 180,
           reward.mean.family = "Uniform",
           reward.family = "Bernoulli",
           narms.family = "Poisson",
           npulls.family = "Log-Normal",
           data.par = list(reward.mean = list(min = 0, max = 0.1),
                           npulls.family = list(meanlog = 3, sdlog = 1.5),
                           narms.family = list(lambda = 5)),
           regret.plot = TRUE)
res$regret.plot.object
### Compare Epsilon-Greedy, Thompson Sampling and EXP3 in the non-stationary case.
set.seed(100)
res <- SimulateMultipleMethods(
           method = c("Epsilon-Greedy", "Thompson-Sampling", "EXP3"),
           method.par = list(epsilon = 0.1,
                             ndraws.TS = 1000,
                             EXP3 = list(gamma = 0, eta = 0.1)),
           iter = 100,
           nburnin = 30,
           nperiod = 90,
           reward.mean.family = "Beta",
           reward.family = "Bernoulli",
           narms.family = "Binomial",
           npulls.family = "Log-Normal",
           stationary = FALSE,
           nonstationary.type = "Geometric Random Walk",
           data.par = list(reward.mean = list(shape1 = 2, shape2 = 5),
                           npulls.family = list(meanlog = 3, sdlog = 1),
                           narms.family = list(size = 10, prob = 0.5),
                           nonstationary.family = list(sdlog = 0.05)),
           regret.plot = TRUE)
res$regret.plot.object
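
The examples above do not use "HyperTS". The sketch below only builds a `method.par` list for such an ensemble, following the structure described for the `HyperTS` entry; it does not call the package. It assumes that the parameters of the component methods are supplied alongside `HyperTS` in the same list, which the documentation does not state explicitly.

```r
# Method names documented as valid entries of `method.list`.
valid.methods <- c("Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling",
                   "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling",
                   "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS",
                   "EXP3-Bayes-Poisson-TS")

# Hypothetical ensemble for illustration.
ensemble <- c("Epsilon-Greedy", "Thompson-Sampling", "EXP3")

method.par <- list(
  epsilon   = 0.1,                            # used by "Epsilon-Greedy"
  ndraws.TS = 1000,                           # used by "Thompson-Sampling"
  EXP3      = list(gamma = 0, eta = 0.1),     # used by "EXP3"
  HyperTS   = list(method.list = ensemble))   # methods combined by "HyperTS"

# Guard against typos in method names before running a long simulation.
stopifnot(all(ensemble %in% valid.methods))
```

The list could then be passed as `method.par` together with `method = "HyperTS"`.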