GittinsBrezziLaiPolicy: Policy: Gittins Approximation algorithm for choosing arms in...


Description

GittinsBrezziLaiPolicy implements a Gittins index approximation algorithm based on Brezzi and Lai (2002), "Optimal learning and experimentation in bandit problems."

Details

The algorithm approximates the Gittins index with a closed-form expression that is a function of the discount factor and of the number of successes and failures recorded for each arm.
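
For illustration, the sketch below (not the package's internal code) shows such a closed-form computation for a single arm with a Beta(alpha, beta) posterior: the index is the posterior mean plus an exploration bonus scaled by a piecewise boundary function of the effective sample size and the discount rate. The helper names and the piecewise constants are assumptions based on the form commonly quoted for Brezzi and Lai (2002); verify them against the paper before reuse.

  # Sketch only -- not the package's internal implementation.
  # For an arm with Beta(alpha, beta) posterior and discount factor gamma,
  # the approximate index is mu + sqrt(mu * (1 - mu) / n) * psi(1 / (n * c)),
  # with n = alpha + beta, mu = alpha / n and c = -log(gamma).
  psi <- function(s) {
    # Piecewise boundary function; constants as commonly quoted for
    # Brezzi and Lai (2002) -- treat them as assumptions to be checked.
    if (s <= 0.2) {
      sqrt(s / 2)
    } else if (s <= 1) {
      0.49 - 0.11 / sqrt(s)
    } else if (s <= 5) {
      0.63 - 0.26 / sqrt(s)
    } else if (s <= 15) {
      0.77 - 0.58 / sqrt(s)
    } else {
      sqrt(2 * log(s) - log(log(s)) - log(16 * pi))
    }
  }

  gittins_brezzi_lai <- function(alpha, beta, discount = 0.95) {
    n  <- alpha + beta        # prior pseudo-counts plus observations
    mu <- alpha / n           # posterior mean of the Bernoulli parameter
    c  <- -log(discount)      # discount rate
    mu + sqrt(mu * (1 - mu) / n) * psi(1 / (n * c))
  }

  gittins_brezzi_lai(alpha = 3, beta = 2, discount = 0.95)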

Usage

  policy <- GittinsBrezziLaiPolicy$new(discount=0.95, prior=NULL)

Arguments

discount

numeric; discount factor

prior

numeric matrix; prior beliefs over the Bernoulli parameters governing each arm. Beliefs are specified by a Beta distribution with two parameters (alpha, beta), where alpha is the number of successes and beta the number of failures. The matrix has one row per arm and two columns (alpha, beta).
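
For example, a uniform Beta(1, 1) prior over three arms could be specified as follows (the number of arms and the counts are illustrative):

  prior  <- matrix(1, nrow = 3, ncol = 2)   # 3 arms x (alpha, beta), all Beta(1, 1)
  policy <- GittinsBrezziLaiPolicy$new(discount = 0.95, prior = prior)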

Methods

new(discount=0.95, prior=NULL)

Generates and initializes a new Policy object.

get_action(t, context)

arguments:

  • t: integer, time step t.

  • context: list, containing the current context$X (d x k context matrix), context$k (number of arms) and context$d (number of context features)

computes which arm to play based on the current values in named list theta and the current context. Returns a named list containing action$choice, which holds the index of the arm to play.
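
Illustrative only (not the package's internal code): selecting the arm with the highest approximate index, reusing the hypothetical gittins_brezzi_lai() helper sketched under Details:

  theta   <- list(list(alpha = 3,  beta = 1),   # per-arm success/failure counts
                  list(alpha = 10, beta = 9),
                  list(alpha = 1,  beta = 1))
  indices <- sapply(theta, function(arm) gittins_brezzi_lai(arm$alpha, arm$beta, 0.95))
  action  <- list(choice = which.max(indices))  # index of the arm to play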

set_reward(t, context, action, reward)

arguments:

  • t: integer, time step t.

  • context: list, containing the current context$X (d x k context matrix), context$k (number of arms) and context$d (number of context features) (as set by bandit).

  • action: list, containing action$choice (as set by policy).

  • reward: list, containing reward$reward and, if available, reward$optimal (as set by bandit).

utilizes the above arguments to update and return the set of parameters in list theta.
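
Illustrative only (not the package's internal code): the Bernoulli/Beta update implied above increments the chosen arm's alpha count on a reward of 1 and its beta count on a reward of 0:

  theta  <- list(list(alpha = 1, beta = 1), list(alpha = 1, beta = 1))
  action <- list(choice = 2)            # as returned by get_action()
  reward <- list(reward = 1)            # as returned by the bandit
  arm    <- action$choice
  theta[[arm]]$alpha <- theta[[arm]]$alpha + reward$reward
  theta[[arm]]$beta  <- theta[[arm]]$beta  + (1 - reward$reward)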

set_parameters()

Helper function, called during a Policy's initialisation, that assigns the values in list self$theta_to_arms to each of the Policy's k arms. The parameters defined here can then be accessed by arm index in the following way: theta[[index_of_arm]]$parameter_name.
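
A hypothetical plain-R illustration of this expansion (the actual per-arm assignment is performed by set_parameters() itself):

  theta_to_arms <- list(alpha = 1, beta = 1)   # per-arm parameter template
  k     <- 3                                   # number of arms
  theta <- rep(list(theta_to_arms), k)         # one copy of the template per arm
  theta[[2]]$alpha                             # alpha value of arm 2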

References

Brezzi, M., & Lai, T. L. (2002). Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control, 27(1), 87-108.

Implementation follows https://github.com/elarry/bandit-algorithms-simulated

See Also

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy
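
Examples

A minimal end-to-end simulation sketch following the standard contextual workflow of Bandit, Policy, Agent and Simulator; the arm weights, horizon, number of simulations and plot options below are illustrative:

  library(contextual)

  bandit    <- BasicBernoulliBandit$new(weights = c(0.9, 0.1, 0.1))
  policy    <- GittinsBrezziLaiPolicy$new(discount = 0.95)
  agent     <- Agent$new(policy, bandit)

  simulator <- Simulator$new(list(agent), horizon = 100, simulations = 100)
  history   <- simulator$run()

  plot(history, type = "cumulative")
  summary(history)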
