UCB_rejection_sampling: UCB_rejection_sampling
In manuclaeys/TimeSeriesBandits:


View source: R/UCB_rejection_sampling.R

UCB algorithm with the rejection sampling method. Excludes any choices that do not correspond to real experiments in the dataset, and stops if something is wrong. Generates a matrix to save the results (S).
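The results matrix S can be pictured as follows. This is a minimal sketch under a stated assumption: the two-row layout (running average reward and number of trials per arm) and the helper names make_results_matrix and update_results_matrix are hypothetical, and GenerateMatrixS may organise S differently.

```r
# Hypothetical layout of the results matrix S (an assumption, not
# necessarily what GenerateMatrixS returns): one column per arm,
# row 1 holds the running average reward, row 2 the number of trials.
make_results_matrix <- function(K) {
  S <- matrix(0, nrow = 2, ncol = K)
  rownames(S) <- c("average_reward", "trials")
  colnames(S) <- paste0("arm_", seq_len(K))
  S
}

# Hypothetical update after an accepted (non-rejected) reward:
# an incremental mean, so past rewards do not need to be stored.
update_results_matrix <- function(S, arm, reward) {
  n <- S["trials", arm] + 1
  S["average_reward", arm] <- S["average_reward", arm] +
    (reward - S["average_reward", arm]) / n
  S["trials", arm] <- n
  S
}

S <- make_results_matrix(2)
S <- update_results_matrix(S, arm = 1, reward = 1)
S <- update_results_matrix(S, arm = 1, reward = 0)
S["average_reward", ]  # arm_1 0.5, arm_2 0
```

The incremental mean keeps S at a fixed size of 2 x K whatever the number of iterations.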

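At each iteration, the algorithm:

Calculates the arm probabilities;
Chooses the arm with the maximum upper bound (controlled by the alpha parameter);
Receives the reward in visitorReward for that arm and the associated iteration, excluding the choice if no real experiment exists for it in the dataset;
Updates the results matrix S.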
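The iteration above, including the rejection step, can be sketched in base R as an offline replay loop. Assumptions: the columns of the reward data frame are the arms, NA marks an arm that was not actually played on that row, and the upper bound takes a UCB1-style form mean + sqrt(alpha * log(t) / n). The function name ucb_rejection_sketch is hypothetical, and the package's ProbaMaxForUCB may compute the bound differently.

```r
# Sketch of UCB with rejection sampling (hypothetical, not the package code).
# Assumed upper bound: mean + sqrt(alpha * log(t) / n), UCB1-style.
ucb_rejection_sketch <- function(visitor_reward, K = ncol(visitor_reward), alpha = 1) {
  means  <- numeric(K)   # estimated mean reward of each arm
  trials <- numeric(K)   # number of accepted choices of each arm
  choice <- integer(0)   # arms kept after rejection
  for (row in seq_len(nrow(visitor_reward))) {
    t <- sum(trials) + 1              # time counts accepted iterations only
    ub <- rep(Inf, K)                 # untried arms get an infinite bound
    tried <- trials > 0
    ub[tried] <- means[tried] + sqrt(alpha * log(t) / trials[tried])
    arm <- which.max(ub)              # ties go to the lowest arm index
    reward <- visitor_reward[row, arm]
    if (is.na(reward)) next           # reject: no real experiment for this choice
    trials[arm] <- trials[arm] + 1
    means[arm] <- means[arm] + (reward - means[arm]) / trials[arm]
    choice <- c(choice, arm)
  }
  list(choice = choice, theta_hat = means, trials = trials)
}

toy <- data.frame(K1 = c(1, NA, 0, 1), K2 = c(NA, 1, 1, NA))
res <- ucb_rejection_sketch(toy, alpha = 1)
res$choice     # 1 2 1
res$theta_hat  # 0.5 1.0
```

On this toy data the loop accepts rows 1 to 3 and rejects row 4, where it picks K2 although only K1 was observed.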

Returns the calculation time, along with the estimated and actual averages and the number of choices for each arm for review. See also ConditionForUCB, GenerateMatrixS, ProbaMaxForUCB and PlayArm. Requires tic and toc from the tictoc library.

UCB_rejection_sampling(visitorReward, K = ncol(visitorReward), alpha = 1)

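`K`	Integer value (optional): number of arms; defaults to ncol(visitorReward).
`alpha`	Numeric value (optional): parameter of the upper bound used to choose an arm; defaults to 1.
`visitorReward`	Dataframe of integer or numeric values, one column per arm; NA marks a choice with no real experiment in the dataset.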

A list with the following elements:

S: numerical matrix of results,
choice: choices of UCB,
proba: probability of the chosen arms,
time: computation time,
theta_hat: estimated mean of each arm,
theta: real mean of each arm.
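A sketch of how the returned elements might be reviewed side by side. The helper review_arms is hypothetical and not part of the package; it assumes theta is the mean of the observed (non-NA) rewards of each arm and that choice holds arm indices from 1 to K.

```r
# Hypothetical review helper (not part of the package).
# theta is assumed to be the mean of the observed, non-NA rewards per arm.
review_arms <- function(visitor_reward, choice, theta_hat) {
  K <- ncol(visitor_reward)
  data.frame(
    arm       = seq_len(K),
    theta     = unname(colMeans(visitor_reward, na.rm = TRUE)),
    theta_hat = theta_hat,
    n_choices = tabulate(choice, nbins = K)  # arms never chosen count as 0
  )
}

toy <- data.frame(K1 = c(1, NA, 0, 1), K2 = c(NA, 1, 1, NA))
review <- review_arms(toy, choice = c(1, 2, 1), theta_hat = c(0.5, 1))
review
```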

## Generates 1000 numbers from each of 2 binomial distributions
set.seed(4434)
K1 <- rbinom(1000, 1, 0.6)
K2 <- rbinom(1000, 1, 0.7)
## Define a dataframe of rewards
visitor_reward <- as.data.frame(cbind(K1, K2))
## Remove data: K1 is missing on 500 random rows, K2 on the remaining 500
temp_list <- sample(1:nrow(visitor_reward), 500, replace = FALSE, prob = NULL)
visitor_reward$K1[temp_list] <- NA
visitor_reward$K2[-temp_list] <- NA
## Run UCB on the data with missing rewards
ucb_alloc <- UCB_rejection_sampling(visitor_reward, alpha = 10)
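To see what the rejection step has to deal with in this example, the data can be checked with base R alone, without the package. After masking, every row keeps exactly one observed reward, so a choice of the other arm on that row has no real experiment behind it and is excluded.

```r
## Rebuild the example data (same steps as above)
set.seed(4434)
K1 <- rbinom(1000, 1, 0.6)
K2 <- rbinom(1000, 1, 0.7)
visitor_reward <- as.data.frame(cbind(K1, K2))
temp_list <- sample(1:nrow(visitor_reward), 500, replace = FALSE, prob = NULL)
visitor_reward$K1[temp_list] <- NA
visitor_reward$K2[-temp_list] <- NA

## Count observed (non-NA) rewards per row and per arm
observed_per_row <- rowSums(!is.na(visitor_reward))
table(observed_per_row)           # every one of the 1000 rows has exactly 1
colSums(!is.na(visitor_reward))   # 500 observed rewards for each arm
```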