SimulateMultipleMethods: Compare various strategies for Multi-Armed Bandit in...


Description

This function simulates data under different scenarios in order to compare various Multi-Armed Bandit strategies. Users can specify the distribution of the number of arms, the distribution of the mean reward, the distribution of the number of pulls per period, and whether the bandit is stationary. The relative regret is returned, together with a plot of the average relative regret if requested. See SimulateMultiplePeriods for more details.

Usage

SimulateMultipleMethods(method = "Thompson-Sampling",
  method.par = list(ndraws.TS = 1000), iter, nburnin, nperiod,
  reward.mean.family, reward.family, narms.family, npulls.family,
  stationary = TRUE, nonstationary.type = NULL, data.par,
  regret.plot = FALSE)

Arguments

method

A vector of character strings chosen from "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS", "EXP3-Bayes-Poisson-TS" and "HyperTS". See SimulateMultiplePeriods for more details. Default is "Thompson-Sampling".

method.par

A list of parameters needed for different methods:

epsilon: A real number between 0 and 1; needed for "Epsilon-Greedy", "Epsilon-Decreasing", "Greedy-Thompson-Sampling" and "Greedy-Bayes-Poisson-TS".

ndraws.TS: A positive integer specifying the number of random draws from the posterior; needed for "Thompson-Sampling", "Greedy-Thompson-Sampling" and "EXP3-Thompson-Sampling". Default is 1000.

EXP3: A list consisting of two real numbers eta and gamma; eta > 0 and 0 <= gamma < 1; needed for "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS".

BP: A list consisting of three positive integers iter.BP, ndraws.BP and interval.BP; needed for "Bayes-Poisson-TS", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". iter.BP specifies the number of iterations used to compute the posterior; ndraws.BP specifies the number of samples drawn from the posterior distribution; interval.BP specifies the length of the sample sequence from which each posterior sample is drawn.

HyperTS: A list consisting of a vector method.list; needed for "HyperTS". method.list is a vector of character strings chosen from "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". "HyperTS" constructs an ensemble consisting of all the methods in method.list.
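
As a rough sketch of how these pieces might fit together, the list below collects parameters for several methods at once. The element names are taken from the descriptions above; every numeric value is an arbitrary placeholder chosen for illustration, not a recommended setting.

```r
# Illustrative method.par; all numeric values are arbitrary placeholders.
method.par <- list(
  epsilon   = 0.1,                         # for the epsilon-based methods
  ndraws.TS = 1000,                        # posterior draws for Thompson Sampling
  EXP3      = list(eta = 0.1, gamma = 0),  # eta > 0 and 0 <= gamma < 1
  BP        = list(iter.BP = 10, ndraws.BP = 100, interval.BP = 2),
  HyperTS   = list(method.list = c("Thompson-Sampling", "Epsilon-Greedy"))
)

# Light sanity checks mirroring the constraints stated above.
stopifnot(
  method.par$epsilon >= 0, method.par$epsilon <= 1,
  method.par$EXP3$eta > 0,
  method.par$EXP3$gamma >= 0, method.par$EXP3$gamma < 1
)
```

Only the elements needed by the methods listed in method have to be supplied; the examples at the end of this page pass just epsilon, ndraws.TS and EXP3.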

iter

A positive integer specifying the number of iterations.

nburnin

A positive integer specifying the number of initial periods during which each arm receives equal traffic before any strategy is applied.

nperiod

A positive integer specifying the number of periods to apply various strategies.

reward.mean.family

A character string specifying the distribution family used to generate the mean reward of each arm. Available distributions are "Uniform", "Beta" and "Gaussian".

reward.family

A character string specifying the distribution family of the reward. Available distributions are "Bernoulli", "Poisson" and "Gaussian". If "Gaussian" is chosen as the reward distribution, a vector of standard deviations should be provided in sd.reward in data.par.

narms.family

A character string specifying the distribution family of the number of arms. Available distributions are "Poisson" and "Binomial".

npulls.family

A character string specifying the distribution family of the number of pulls per period. For a continuous distribution, the number of pulls is rounded up. Available distributions are "Log-Normal" and "Poisson".
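
The rounding-up step can be pictured with base R alone. This is a minimal sketch under the assumption that rounding up means taking the ceiling of each continuous draw; it does not reproduce the package's internal code, and the variable names are hypothetical.

```r
set.seed(1)

# Draw continuous Log-Normal values (parameters as in the first example below).
raw.pulls <- rlnorm(5, meanlog = 3, sdlog = 1.5)

# Round up so that every period gets a whole number of pulls.
npulls <- ceiling(raw.pulls)
```

Because Log-Normal draws are strictly positive, rounding up in this way always yields at least one pull per period.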

stationary

A logical value indicating whether a stationary Multi-Armed Bandit is considered (that is, whether the mean reward stays unchanged over time). Default is TRUE.

nonstationary.type

A character string indicating how the mean reward varies over time. Available types are "Random Walk" and "Geometric Random Walk" (the mean reward follows a random walk on the log scale). Default is NULL.
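
To make the difference between the two types concrete, the following base R sketch evolves one hypothetical arm's mean reward under each of them. It assumes Gaussian white noise and a starting mean of 0.05, both chosen purely for illustration; how the package itself generates or bounds the path is not shown here.

```r
set.seed(1)
nperiod <- 50
mean0 <- 0.05  # hypothetical starting mean reward

# Random walk: add white noise to the mean itself each period.
rw <- mean0 + cumsum(rnorm(nperiod, mean = 0, sd = 0.005))

# Geometric random walk: add white noise to log(mean) each period,
# then transform back, so the mean stays strictly positive.
grw <- exp(log(mean0) + cumsum(rnorm(nperiod, mean = 0, sd = 0.05)))
```

An additive walk can drift below zero, whereas the geometric version cannot, which may matter when the mean reward is a probability or a rate.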

data.par

A list of data generating parameters:

reward.mean: A list of parameters of reward.mean.family: min and max are two real numbers specifying the bounds when reward.mean.family = "Uniform"; shape1 and shape2 are two shape parameters when reward.mean.family = "Beta"; mean and sd specify mean and standard deviation when reward.mean.family = "Gaussian".

reward.family: A list of parameters of reward.family: sd is a vector of non-negative numbers specifying standard deviation of each arm's reward distribution if "Gaussian" is chosen to be the reward distribution.

narms.family: A list of parameters of narms.family: lambda is a positive parameter specifying the mean when narms.family = "Poisson"; size and prob are 2 parameters needed when narms.family = "Binomial".

npulls.family: A list of parameters of npulls.family: meanlog and sdlog are 2 positive parameters specifying the mean and standard deviation in the log scale when npulls.family = "Log-Normal"; lambda is a positive parameter specifying the mean when npulls.family = "Poisson".

nonstationary.family: A list of parameters of nonstationary.type: sd is a positive parameter specifying the standard deviation of white noise when nonstationary.type = "Random Walk"; sdlog is a positive parameter specifying the log standard deviation of white noise when nonstationary.type = "Geometric Random Walk".

regret.plot

A logical value indicating whether an average regret plot is returned. Default is FALSE.

Value

A list consisting of:

regret.matrix

A three-dimensional array whose dimensions correspond to the period, the iteration and the method.

regret.plot.object

If regret.plot = TRUE, a ggplot object is returned.
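
One way to summarise regret.matrix by hand is to average over the iteration dimension. The sketch below uses a stand-in array filled with random numbers rather than real simulation output. It assumes the dimension order is period, iteration, method, and the dimension names are added here only for readability; the returned array may not carry them.

```r
set.seed(1)

# Stand-in for res$regret.matrix: 180 periods x 100 iterations x 2 methods.
# The values are random numbers, not simulation results.
regret <- array(
  runif(180 * 100 * 2),
  dim = c(180, 100, 2),
  dimnames = list(NULL, NULL, c("Epsilon-Greedy", "Thompson-Sampling"))
)

# Average over iterations (dimension 2), keeping period and method.
avg.regret <- apply(regret, c(1, 3), mean)
dim(avg.regret)  # 180 2
```

Each column of avg.regret then holds one method's average regret per period, which can be plotted with matplot or passed to other tools when regret.plot = FALSE.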

Examples

### Compare Epsilon-Greedy and Thompson Sampling in the stationary case.
set.seed(100)
res <- SimulateMultipleMethods(
           method = c("Epsilon-Greedy", "Thompson-Sampling"),
           method.par = list(epsilon = 0.1, ndraws.TS = 1000),
           iter = 100,
           nburnin = 30,
           nperiod = 180,
           reward.mean.family = "Uniform",
           reward.family = "Bernoulli",
           narms.family = "Poisson",
           npulls.family = "Log-Normal",
           data.par = list(reward.mean = list(min = 0, max = 0.1),
                           npulls.family = list(meanlog = 3, sdlog = 1.5),
                           narms.family = list(lambda = 5)),
           regret.plot = TRUE)
res$regret.plot.object
### Compare Epsilon-Greedy, Thompson Sampling and EXP3 in the non-stationary case.
set.seed(100)
res <- SimulateMultipleMethods(
           method = c("Epsilon-Greedy", "Thompson-Sampling", "EXP3"),
           method.par = list(epsilon = 0.1,
                             ndraws.TS = 1000,
                             EXP3 = list(gamma = 0, eta = 0.1)),
           iter = 100,
           nburnin = 30,
           nperiod = 90,
           reward.mean.family = "Beta",
           reward.family = "Bernoulli",
           narms.family = "Binomial",
           npulls.family = "Log-Normal",
           stationary = FALSE,
           nonstationary.type = "Geometric Random Walk",
           data.par = list(reward.mean = list(shape1 = 2, shape2 = 5),
                           npulls.family = list(meanlog = 3, sdlog = 1),
                           narms.family = list(size = 10, prob = 0.5),
                           nonstationary.family = list(sdlog = 0.05)),
           regret.plot = TRUE)
res$regret.plot.object

google/MAB documentation built on May 23, 2019, 9:01 p.m.