yahoo_policy_ucb1_alpha_seg.R
In contextual: Simulation and Analysis of Contextual Multi-Armed Bandit Policies

#' @export
YahooUCB1AlphaSegPolicy <- R6::R6Class(
  portable = FALSE,
  class = FALSE,
  inherit = Policy,
  public = list(
    alpha = NULL,
    cluster = NULL,
    class_name = "YahooUCB1AlphaSegPolicy",
    initialize = function(alpha) {
      super$initialize()
      self$alpha                  <- alpha
    },
    set_parameters = function(context_params) {
      # per-arm statistics: one play count and one running mean reward for each of the 5 user clusters
      self$theta_to_arms          <- list('n' = rep(0,5), 'mean' = rep(0,5))
    },
    get_action = function(t, context) {
      # the user's cluster is the index of the user feature with the highest score
      # (head(..., -1) drops the last entry of the feature column before comparing)
      self$cluster                <- which.max(head(context$X[context$unique,1],-1))
      local_arms                  <- context$arms
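      # first play any arm that has not yet been tried for this cluster; this also
      # keeps the sqrt(n) denominator further down from being zero
      for (arm in seq_along(local_arms)) {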
        if(self$theta$n[[local_arms[arm]]][self$cluster] == 0) {
          action$choice             <- local_arms[arm]
          return(action)
        }
      }
      # UCB-style score per arm: running mean reward plus an exploration bonus of alpha / sqrt(n)
      expected_rewards            <- rep(0.0, length(local_arms))
      for (arm in seq_along(local_arms)) {
        exploration_bonus         <- self$alpha / sqrt( self$theta$n[[local_arms[arm]]][self$cluster] )
        expected_rewards[arm]     <- self$theta$mean[[local_arms[arm]]][self$cluster] + exploration_bonus
      }
      action$choice               <- local_arms[which_max_tied(expected_rewards)]
      action
    },
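    set_reward = function(t, context, action, reward) {
      # update the play count and the incremental mean of the chosen arm, for the
      # cluster stored by the preceding get_action() call
      arm                                       <- action$choice
      reward                                    <- reward$reward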
      self$theta$n[[arm]][self$cluster]         <- self$theta$n[[arm]][self$cluster] + 1
      self$theta$mean[[arm]][self$cluster]      <- self$theta$mean[[arm]][self$cluster] +
                                                    (reward - self$theta$mean[[arm]][self$cluster]) /
                                                    self$theta$n[[arm]][self$cluster]
      self$theta
    }
  )
)
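
The per-cluster bookkeeping in the class above can be tried in isolation. The following is a minimal sketch in plain R that does not depend on the contextual package; the helper names `new_arm_stats`, `ucb_score` and `update_arm_stats` are hypothetical and exist only for this illustration. It assumes the same layout as `theta_to_arms` (a count vector and a mean vector with one slot per cluster) and mirrors the score and the incremental mean update used in `get_action` and `set_reward`.

```r
# Hypothetical helpers for illustration; they are not part of the contextual package.

# Statistics for a single arm: one play count and one running mean per cluster.
new_arm_stats <- function(n_clusters = 5) {
  list(n = rep(0, n_clusters), mean = rep(0, n_clusters))
}

# UCB-style score: running mean plus an exploration bonus of alpha / sqrt(n).
# Only meaningful once the arm has been played in this cluster (n > 0).
ucb_score <- function(stats, cluster, alpha) {
  stats$mean[cluster] + alpha / sqrt(stats$n[cluster])
}

# Incremental mean update after observing one reward in the given cluster.
update_arm_stats <- function(stats, cluster, reward) {
  stats$n[cluster]    <- stats$n[cluster] + 1
  stats$mean[cluster] <- stats$mean[cluster] +
    (reward - stats$mean[cluster]) / stats$n[cluster]
  stats
}

# Two observations for cluster 2: a reward of 1, then a reward of 0.
stats <- new_arm_stats()
stats <- update_arm_stats(stats, cluster = 2, reward = 1)
stats <- update_arm_stats(stats, cluster = 2, reward = 0)
score <- ucb_score(stats, cluster = 2, alpha = 0.5)
```

After the two updates the running mean for cluster 2 is 0.5 and the other clusters are untouched. A larger `alpha` widens the bonus and so favours arms with few plays in the current cluster.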

contextual documentation built on July 26, 2020, 1:06 a.m.

demo/replication_li_2010/demo_yahoo_classes/yahoo_policy_ucb1_alpha_seg.R