CalculateWeight: Calculate the probability of pulling each arm in the next...
In google/MAB: Multi-Armed Bandit Strategies Implementation and Simulation


Computes the probability of pulling each arm under a chosen Multi-Armed Bandit method, given the total reward and the number of trials observed for each arm.


CalculateWeight(method = "Thompson-Sampling", method.par = list(ndraws.TS =
  1000), all.event, reward.family, sd.reward = NULL, period = 1,
  EXP3Info = NULL)

`method`	A character string, one of "Epsilon-Greedy", "Epsilon-Decreasing", "Thompson-Sampling", "EXP3", "UCB", "Bayes-Poisson-TS", "Greedy-Thompson-Sampling", "EXP3-Thompson-Sampling", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". See `SimulateMultiplePeriods` for more details. Default is "Thompson-Sampling".
`method.par`	A list of parameters required by the chosen method. `epsilon`: a real number between 0 and 1; needed for "Epsilon-Greedy", "Epsilon-Decreasing", "Greedy-Thompson-Sampling" and "Greedy-Bayes-Poisson-TS". `ndraws.TS`: a positive integer giving the number of random draws from the posterior; needed for "Thompson-Sampling", "Greedy-Thompson-Sampling" and "EXP3-Thompson-Sampling"; default is 1000. `EXP3`: a list of two real numbers `eta` and `gamma`, with eta > 0 and 0 <= gamma < 1; needed for "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS". `BP`: a list of three positive integers `iter.BP`, `ndraws.BP` and `interval.BP`; needed for "Bayes-Poisson-TS", "Greedy-Bayes-Poisson-TS" and "EXP3-Bayes-Poisson-TS". `iter.BP` is the number of iterations used to compute the posterior; `ndraws.BP` is the number of samples drawn from the posterior distribution; `interval.BP` is the length of the sample sequence from which each posterior sample is drawn.
`all.event`	A data frame containing two columns `trial` and `reward` with the number of rows equal to the number of arms. Each element of `trial` and `reward` represents the number of trials and the total reward for each arm respectively.
`reward.family`	A character string specifying the distribution family of the reward. Available families are "Bernoulli", "Poisson" and "Gaussian". If "Gaussian" is chosen, a vector of standard deviations must be provided in `sd.reward`.
`sd.reward`	A vector of non-negative numbers specifying the standard deviation of each arm's reward distribution; needed only when `reward.family` is "Gaussian". Default is NULL. See `reward.family`.
`period`	A positive integer specifying the period index. Default is 1.
`EXP3Info`	A list of three vectors `prevWeight`, `EXP3Trial` and `EXP3Reward` with dimension equal to the number of arms, needed for "EXP3", "EXP3-Thompson-Sampling" and "EXP3-Bayes-Poisson-TS": `prevWeight`: the weight vector in the previous EXP3 iteration. `EXP3Trial` and `EXP3Reward`: vectors representing the number of trials and the total reward for each arm in the previous period respectively. See `SimulateMultiplePeriods` for more details.
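To make the role of `all.event` and `ndraws.TS` concrete, the following is a minimal base-R sketch of how Thompson Sampling weights can be estimated for Bernoulli rewards. It is not the package's implementation: the helper name `ThompsonWeightSketch` is hypothetical, and the uniform Beta(1, 1) prior is an assumption made for illustration.

```r
# Hypothetical helper, NOT part of the MAB package.
# Assumes Bernoulli rewards and a Beta(1, 1) prior on each arm's success rate.
ThompsonWeightSketch <- function(all.event, ndraws = 1000) {
  n.arms <- nrow(all.event)
  # One column of posterior draws per arm: Beta(1 + successes, 1 + failures).
  draws <- sapply(seq_len(n.arms), function(i) {
    rbeta(ndraws,
          shape1 = 1 + all.event$reward[i],
          shape2 = 1 + all.event$trial[i] - all.event$reward[i])
  })
  # For each draw (row), find the arm with the largest sampled success rate.
  best <- max.col(draws, ties.method = "first")
  # Weight of an arm = share of draws in which it was the best arm.
  tabulate(best, nbins = n.arms) / ndraws
}

set.seed(100)
w.ts <- ThompsonWeightSketch(data.frame(trial = rep(10, 3), reward = 1:3),
                             ndraws = 1000)
```

A larger `ndraws` reduces the Monte Carlo noise in the estimated weights at the cost of more computation.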

A normalized weight vector for future randomized allocation.
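As an illustration of how a normalized weight vector might drive randomized allocation, the sketch below splits the next batch of trials across arms with base R's `rmultinom`. The vector `w` is a made-up stand-in for a returned weight vector and `n.next` is a hypothetical batch size; neither comes from the package.

```r
# `w` is a hypothetical stand-in for a weight vector returned by CalculateWeight.
w <- c(0.2, 0.3, 0.5)
# Hypothetical number of trials to allocate in the next period.
n.next <- 1000

set.seed(1)
# Draw the number of trials per arm; each trial goes to arm i with probability w[i].
alloc <- as.vector(rmultinom(1, size = n.next, prob = w))
```

Each element of `alloc` is the number of trials assigned to the corresponding arm, and the elements sum to `n.next`.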

### Calculate weights using Thompson Sampling if reward follows Poisson distribution.
set.seed(100)
CalculateWeight(method = "Thompson-Sampling",
                method.par = list(ndraws.TS = 1000),
                all.event = data.frame(reward = 1:3, trial = rep(10, 3)),
                reward.family = "Poisson")
### Calculate weights using EXP3
CalculateWeight(method = "EXP3",
                method.par = list(EXP3 = list(gamma = 0.01, eta = 0.1)),
                all.event = data.frame(reward = 1:3, trial = rep(10, 3)),
                reward.family = "Bernoulli",
                EXP3Info = list(prevWeight = rep(1, 3),
                                EXP3Trial = rep(5, 3),
                                EXP3Reward = 0:2))
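
For comparison with the package calls above, the role of `epsilon` can be sketched in base R. This is only one common epsilon-greedy convention (spread `epsilon` uniformly over all arms and give the remaining mass to the arm with the best empirical mean); whether the package uses exactly this rule is an assumption, and `EpsilonGreedyWeightSketch` is a hypothetical name.

```r
# Hypothetical helper, NOT part of the MAB package.
# Assumed convention: every arm gets epsilon / K, the best arm also gets 1 - epsilon.
EpsilonGreedyWeightSketch <- function(all.event, epsilon = 0.1) {
  stopifnot(epsilon >= 0, epsilon <= 1)
  n.arms <- nrow(all.event)
  # Empirical mean reward per arm; arms without trials are treated as 0 (assumption).
  mean.reward <- ifelse(all.event$trial > 0,
                        all.event$reward / all.event$trial,
                        0)
  weight <- rep(epsilon / n.arms, n.arms)
  best <- which.max(mean.reward)
  weight[best] <- weight[best] + (1 - epsilon)
  weight
}

w.eg <- EpsilonGreedyWeightSketch(data.frame(trial = rep(10, 3), reward = 1:3),
                                  epsilon = 0.1)
```

With `epsilon = 0.1` and three arms, the two weaker arms each receive 0.1 / 3 and the arm with the highest empirical mean receives the rest.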