CalculateWeight: Calculate the probability of pulling each arm in the next...


Description

This function computes the probability of pulling each arm under various multi-armed bandit methods, given the total reward and the number of trials for each arm.

Usage

CalculateWeight(method = "Thompson-Sampling", method.par = list(ndraws.TS =
  1000), all.event, reward.family, sd.reward = NULL, period = 1,
  EXP3Info = NULL)

Arguments

method

A character string, one of "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". See SimulateMultiplePeriods for more details. Default is "Thompson-Sampling".

method.par

A list of parameters needed for different methods:

epsilon: A real number between 0 and 1; needed for "Epsilon-Greedy", "Epsilon-Decreasing", "Greedy-Thompson-Sampling" and "Greedy-Bayes-Poisson-TS".

ndraws.TS: A positive integer specifying the number of random draws from the posterior; needed for "Thompson-Sampling", "Greedy-Thompson-Sampling" and "EXP3-Thompson-Sampling". Default is 1000.

EXP3: A list consisting of two real numbers eta and gamma; eta > 0 and 0 <= gamma < 1; needed for "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS".

BP: A list consisting of three positive integers iter.BP, ndraws.BP and interval.BP; needed for "Bayes-Poisson-TS", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". iter.BP specifies the number of iterations used to compute the posterior; ndraws.BP specifies the number of samples drawn from the posterior distribution; interval.BP specifies that each posterior sample is drawn from a sample sequence of length interval.BP.
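
As a rough illustration, method.par lists for a few methods might be assembled as follows. The object names (par.greedy, par.ts and so on) and all numeric values are placeholders chosen for this sketch, not recommended settings.

```r
# Parameter lists for a few methods; all numeric values are illustrative only.
par.greedy <- list(epsilon = 0.1)
par.ts <- list(ndraws.TS = 1000)
par.exp3 <- list(EXP3 = list(eta = 0.1, gamma = 0.01))
par.bp <- list(BP = list(iter.BP = 10, ndraws.BP = 100, interval.BP = 5))

# Combined methods need the parameters of both components,
# e.g. "EXP3-Thompson-Sampling" needs both ndraws.TS and EXP3.
par.exp3.ts <- c(par.ts, par.exp3)
str(par.exp3.ts)
```

The combined list keeps EXP3 as a nested list, which matches the shape described for the EXP3 entry above.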

all.event

A data frame containing two columns trial and reward with the number of rows equal to the number of arms. Each element of trial and reward represents the number of trials and the total reward for each arm respectively.
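
A minimal sketch of such a data frame for three arms is shown below. The counts are made up for illustration.

```r
# Three arms: number of trials and total reward observed so far (made-up numbers).
all.event <- data.frame(trial = c(10, 10, 10), reward = c(1, 2, 3))

nrow(all.event)                      # one row per arm
all.event$reward / all.event$trial   # empirical mean reward per arm
```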

reward.family

A character string specifying the distribution family of the reward. Available distributions are "Bernoulli", "Poisson" and "Gaussian". If "Gaussian" is chosen, a vector of standard deviations must be provided in sd.reward.

sd.reward

A vector of non-negative numbers specifying the standard deviation of each arm's reward distribution when reward.family is "Gaussian". Defaults to NULL. See reward.family.

period

A positive integer specifying the period index. Defaults to 1.

EXP3Info

A list of three vectors prevWeight, EXP3Trial and EXP3Reward, each with length equal to the number of arms; needed for "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS":

prevWeight: the weight vector in the previous EXP3 iteration.

EXP3Trial and EXP3Reward: vectors representing the number of trials and the total reward for each arm in the previous period respectively.

See SimulateMultiplePeriods for more details.

Value

A normalized weight vector for future randomized allocation.
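
To make the idea of a normalized weight vector concrete, the following base-R sketch estimates Thompson-sampling allocation weights for Bernoulli rewards by Monte Carlo. It assumes a uniform Beta(1, 1) prior on each arm, and the helper name ts_weights_sketch is hypothetical; this is not necessarily how CalculateWeight computes its result.

```r
# Hypothetical helper, not part of the MAB package.
ts_weights_sketch <- function(trial, reward, ndraws = 1000) {
  n.arms <- length(trial)
  # Posterior of each arm's success rate under an assumed Beta(1, 1) prior:
  # Beta(1 + successes, 1 + failures). The result has one column per arm.
  draws <- sapply(seq_len(n.arms), function(i) {
    rbeta(ndraws, 1 + reward[i], 1 + trial[i] - reward[i])
  })
  # For each draw (row), find the arm with the largest sampled success rate.
  best <- max.col(draws, ties.method = "first")
  # Weight of an arm = share of draws in which it was the best arm.
  tabulate(best, nbins = n.arms) / ndraws
}

set.seed(100)
w <- ts_weights_sketch(trial = rep(10, 3), reward = 1:3)
w
sum(w)  # the weights sum to 1, up to floating-point error
```

Each weight is the estimated posterior probability that the corresponding arm is the best one, so arms with stronger evidence receive a larger share of future trials.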

Examples

### Calculate weights using Thompson Sampling if reward follows Poisson distribution.
set.seed(100)
CalculateWeight(method = "Thompson-Sampling",
                method.par = list(ndraws.TS = 1000),
                all.event = data.frame(reward = 1:3, trial = rep(10, 3)),
                reward.family = "Poisson")
### Calculate weights using EXP3
CalculateWeight(method = "EXP3",
                method.par = list(EXP3 = list(gamma = 0.01, eta = 0.1)),
                all.event = data.frame(reward = 1:3, trial = rep(10, 3)),
                reward.family = "Bernoulli",
                EXP3Info = list(prevWeight = rep(1, 3),
                                EXP3Trial = rep(5, 3),
                                EXP3Reward = 0:2))

google/MAB documentation built on May 23, 2019, 9:01 p.m.