demo/demo_offline_cmab_alpha_linucb_direct_method.R
In contextual: Simulation and Analysis of Contextual Multi-Armed Bandit Policies

library(contextual)
library(data.table)
library(Formula)

# Import personalization data-set

data         <- fread("http://d1ie9wlkzugsxr.cloudfront.net/data_cmab_basic/data.txt")
                                         # z: logged arm (10 arms, numbered from 1)
                                         # y: 0/1 reward; x1 .. x100: context features

#      z y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15  .. x100
#   1: 2 0  5  0  0 37  6  0  0  0  0  25   0   0   7   1   0  ..    0
#   2: 8 0  1  3 36  0  0  0  0  0  0   0   0   1   0   0   0  ..   10
#   3: . .  .  .  .  .  .  .  .  .  .   .   .   .   .   .   .  ..    .

simulations <- 1
horizon     <- nrow(data)

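# Fit one logistic regression per arm, predict every row's reward probability
# under each arm, and store the predictions as one column per arm

x                <- reformulate(names(data)[3:102], response = "y")   # y ~ x1 + .. + x100
f                <- Formula::as.Formula(x)                            # same formula, as a Formula object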

# y = FALSE, model = FALSE: don't store the response and model frame in each fit
model_f          <- function(arm) glm(f, data = data[z == arm], family = binomial(link = "logit"), y = FALSE, model = FALSE)
arms             <- sort(unique(data$z))
model_arms       <- lapply(arms, FUN = model_f)                       # one fitted model per arm

predict_arm      <- function(model) predict(model, data, type = "response")
r_data           <- lapply(model_arms, FUN = predict_arm)            # predictions for ALL rows
r_data           <- do.call(cbind, r_data)                            # rows x arms matrix
colnames(r_data) <- paste0("r", arms)                                 # r1 .. r10
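
# A minimal base-R sketch of the "fit per arm, predict for every row" step
# above, on a tiny hypothetical data set (the toy_* names are illustrative
# and do not touch the demo's objects): 2 arms, 1 feature. Each glm sees only
# the rows where its arm was logged, yet predicts for all rows, so every row
# ends up with a predicted reward for every arm.
toy_data <- data.frame(
  z  = rep(1:2, each = 8),                  # logged arm
  x1 = rep(c(0, 1, 2, 3), times = 4),       # single context feature
  y  = c(0, 0, 1, 1, 0, 1, 0, 1,            # rewards logged for arm 1
         1, 1, 0, 0, 0, 1, 1, 0)            # rewards logged for arm 2
)
toy_arms   <- sort(unique(toy_data$z))
toy_models <- lapply(toy_arms, function(arm)
  glm(y ~ x1, data = toy_data[toy_data$z == arm, ], family = binomial(link = "logit")))
toy_r      <- sapply(toy_models, function(m) predict(m, newdata = toy_data, type = "response"))
colnames(toy_r) <- paste0("r", toy_arms)    # 16 x 2 matrix: columns r1, r2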

# Bind data and model predictions

data             <- cbind(data,r_data)

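# Run a direct-method offline bandit. The formula names the logged arm (z),
# the context features (x1 .. x100) and the per-arm predicted rewards (r1 .. r10)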

x                <- reformulate(names(data)[3:102], response="y")
z                <- ~ z
r                <- ~ r1 + r2 + r3 + r4 + r5 + r6 + r7 + r8 + r9 + r10

f                <- as.Formula(z,x,r)    # Resulting in: y ~ z | x1 + x2 .. | r1 + r2 + ..
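
# The formula parts above can also be generated rather than typed out. A
# base-R sketch with illustrative toy_* names and 3 features/arms instead of
# the demo's 100/10: with a response, reformulate() gives a two-sided formula;
# without one, it gives a one-sided formula like the r part above. In the demo
# itself, reformulate(paste0("r", arms)) would be one way to avoid hard-coding
# r1 .. r10.
toy_x     <- reformulate(paste0("x", 1:3), response = "y")
toy_r_rhs <- reformulate(paste0("r", 1:3))
deparse(toy_x)      # "y ~ x1 + x2 + x3"
deparse(toy_r_rhs)  # "~r1 + r2 + r3"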

bandit           <- OfflineDirectMethodBandit$new(formula = f, data = data)
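
# A hedged sketch of the idea behind a direct-method offline bandit. This is
# not the contextual package's OfflineDirectMethodBandit source. Because every
# row carries a model-predicted reward for every arm (r1 .. r10 above), the arm
# a policy picks at step t can always be scored, even when it differs from the
# logged arm z. dm_reward() and the toy values are hypothetical.
toy_predictions <- matrix(c(0.2, 0.7, 0.4,
                            0.6, 0.1, 0.5),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(NULL, c("r1", "r2", "r3")))
dm_reward <- function(predictions, t, chosen_arm) {
  predictions[t, chosen_arm]                # predicted reward of the chosen arm
}
dm_reward(toy_predictions, t = 1, chosen_arm = 2)   # 0.7
dm_reward(toy_predictions, t = 2, chosen_arm = 3)   # 0.5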

# Define four agents that differ only in LinUCB's exploration parameter alpha.
agents      <- list(Agent$new(LinUCBDisjointOptimizedPolicy$new(0.01), bandit, "alpha = 0.01"),
                    Agent$new(LinUCBDisjointOptimizedPolicy$new(0.05), bandit, "alpha = 0.05"),
                    Agent$new(LinUCBDisjointOptimizedPolicy$new(0.1),  bandit, "alpha = 0.1"),
                    Agent$new(LinUCBDisjointOptimizedPolicy$new(1.0),  bandit, "alpha = 1.0"))
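
# A minimal sketch of the textbook disjoint LinUCB score (Li et al., 2010)
# to show what alpha controls. It is not the source of contextual's
# LinUCBDisjointOptimizedPolicy, and all linucb_* names are hypothetical.
# For each arm a, with context x:
#   A_a = I + sum(x x'),  b_a = sum(reward * x),  theta_a = A_a^-1 b_a
#   score_a = x' theta_a + alpha * sqrt(x' A_a^-1 x)
linucb_score <- function(state, x, alpha) {
  A_inv <- solve(state$A)
  theta <- A_inv %*% state$b                                # ridge estimate
  as.numeric(crossprod(x, theta) +                          # expected reward
             alpha * sqrt(crossprod(x, A_inv %*% x)))       # exploration bonus
}
linucb_update <- function(state, x, reward) {
  state$A <- state$A + tcrossprod(x)                        # A += x x'
  state$b <- state$b + reward * x                           # b += reward * x
  state
}
linucb_ctx  <- matrix(c(1, 0), ncol = 1)                    # one 2-feature context
linucb_arms <- list(list(A = diag(2), b = matrix(0, 2, 1)), # arm 1
                    list(A = diag(2), b = matrix(0, 2, 1))) # arm 2 (never pulled)
for (linucb_reward in rep(c(1, 0), times = 5)) {            # arm 1: 10 pulls, mean 0.5
  linucb_arms[[1]] <- linucb_update(linucb_arms[[1]], linucb_ctx, linucb_reward)
}
linucb_choose <- function(alpha) {
  which.max(vapply(linucb_arms, linucb_score, numeric(1), x = linucb_ctx, alpha = alpha))
}
linucb_choose(0.1)   # 1: small alpha exploits the arm with the known mean
linucb_choose(1.0)   # 2: large alpha favours the uncertain, unexplored arm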

# Initialize the simulation.

simulation  <- Simulator$new(agents = agents, simulations = simulations, horizon = horizon)

# Run the simulation.
sim  <- simulation$run()

# Plot each agent's cumulative reward rate
plot(sim, type = "cumulative", regret = FALSE, rate = TRUE, legend_position = "bottomright", ylim = c(0,1))
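
# What the "cumulative reward rate" on the y-axis means, as a base-R sketch
# under the assumption that rate = TRUE divides the cumulative reward at step
# t by t (toy_rewards is a made-up 0/1 reward sequence for one agent). With
# 0/1 rewards the rate stays within [0, 1], which matches ylim = c(0, 1).
toy_rewards <- c(1, 0, 1, 1)
toy_rate    <- cumsum(toy_rewards) / seq_along(toy_rewards)
toy_rate    # 1.0000000 0.5000000 0.6666667 0.7500000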

contextual documentation built on July 26, 2020, 1:06 a.m.