Bandit: Superclass



Parent or superclass of all {contextual} Bandit subclasses.


In {contextual}, Bandits are responsible for the generation of (either synthetic or offline) contexts and rewards.

On initialisation, a Bandit subclass has to define the number of arms self$k and the number of contextual feature dimensions self$d.

For each t = 1, ..., T, a Bandit then generates a list containing the current context as a d x k matrix context$X, the number of arms context$k, and the number of features context$d.

Note: in context-free scenarios, context$X can be omitted.
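As a minimal sketch of the list described above, the following base R snippet builds one contextual and one context-free context by hand. The sizes k = 3 and d = 2 and the random feature values are assumptions made purely for illustration; they are not taken from the package.

```r
# Illustrative sizes; k and d are arbitrary values chosen for this sketch.
k <- 3L  # number of arms
d <- 2L  # number of contextual features

set.seed(42)

# d x k matrix: one column of d features per arm.
X <- matrix(rnorm(d * k), nrow = d, ncol = k)

# Contextual case: the list a Bandit would hand over at time step t.
context <- list(X = X, k = k, d = d)

# Context-free case: X is simply left out.
context_free <- list(k = k)

dim(context$X)           # 2 3
is.null(context_free$X)  # TRUE
```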

contextual diagram: get context

On receiving the index of a Policy-chosen arm through action$choice, Bandit is expected to return a named list containing at least reward$reward and, where computable, reward$optimal.
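The exchange described above might look roughly like this in base R. The weights vector and the Bernoulli reward model are hypothetical choices for this sketch; only the list fields action$choice, reward$reward and reward$optimal come from the text.

```r
# Hypothetical per-arm reward probabilities for a three-armed Bernoulli bandit.
weights <- c(0.2, 0.5, 0.8)

set.seed(1)

# The Policy's decision arrives as a list with the index of the chosen arm.
action <- list(choice = 2L)

# Draw one Bernoulli reward per arm, then report the chosen arm's reward and,
# because the weights are known here, the reward of the best arm.
rewards <- as.double(runif(length(weights)) < weights)
reward <- list(
  reward  = rewards[action$choice],
  optimal = rewards[which.max(weights)]
)
```

When the best arm cannot be determined, for instance with offline data, the optimal entry would simply be left out of the list.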

contextual diagram: get context





new()

generates and instantiates a new Bandit instance.



get_context(t)


  • t: integer, time step t.

returns a named list containing the current d x k dimensional matrix context$X, the number of arms context$k and the number of features context$d.

get_reward(t, context, action)


  • t: integer, time step t.

  • context: list, containing the current context$X (d x k context matrix), context$k (number of arms) and context$d (number of context features) (as set by bandit).

  • action: list, containing action$choice (as set by policy).

returns a named list containing reward$reward and, where computable, reward$optimal (used by "oracle" policies and to calculate regret).
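To show how the context and reward steps fit together, here is a hedged sketch of a context-free Bernoulli bandit written as a stand-alone R6 class. The class name SketchBernoulliBandit, its weights field, and the random arm choice that stands in for a Policy are all assumptions for this example. The class deliberately does not inherit from Bandit, so that the snippet only needs the R6 package to run; a real subclass would set inherit = Bandit.

```r
library(R6)

# Stand-alone stand-in for a Bandit subclass; in {contextual} the class would
# inherit from Bandit rather than define k and d itself.
SketchBernoulliBandit <- R6Class(
  "SketchBernoulliBandit",
  public = list(
    k = NULL,        # number of arms
    d = NULL,        # number of contextual features (unused: context-free)
    weights = NULL,  # hypothetical per-arm success probabilities
    initialize = function(weights) {
      self$weights <- weights
      self$k <- length(weights)
    },
    get_context = function(t) {
      # Context-free bandit, so context$X is omitted.
      list(k = self$k)
    },
    get_reward = function(t, context, action) {
      # One Bernoulli draw per arm at every time step.
      rewards <- as.double(runif(self$k) < self$weights)
      list(
        reward  = rewards[action$choice],
        optimal = rewards[which.max(self$weights)]
      )
    }
  )
)

set.seed(123)
bandit <- SketchBernoulliBandit$new(weights = c(0.1, 0.9))

total <- 0
for (t in 1:100) {
  context <- bandit$get_context(t)
  # Uniformly random arm choice stands in for a Policy.
  action <- list(choice = sample.int(context$k, 1))
  total <- total + bandit$get_reward(t, context, action)$reward
}
total  # cumulative reward over 100 steps
```

In the package itself this loop is handled by the Simulator and Agent classes; it is written out here only to make the order of calls visible.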


post_initialization()

Called after class and seed initialisation, but before the start of the simulation. Set random values that remain available throughout the life of a Bandit here.


generate_bandit_data(n)

Called after class and seed initialisation, but before the start of a simulation. Pregenerate contexts and rewards here.
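One way to picture pregeneration is sketched below in base R: all contexts and rewards for n time steps are drawn once, and each step then only looks values up. The helper name pregenerate, the sizes, and the fair-coin reward model are assumptions for illustration, not part of the package.

```r
# Hypothetical helper: builds every context and reward for n time steps once.
pregenerate <- function(n, k, d) {
  list(
    # n matrices of size d x k, one per time step.
    contexts = lapply(seq_len(n), function(t) matrix(rnorm(d * k), d, k)),
    # n x k matrix of Bernoulli rewards: row t, column arm.
    rewards = matrix(rbinom(n * k, size = 1, prob = 0.5), nrow = n, ncol = k)
  )
}

set.seed(7)
n <- 5L; k <- 3L; d <- 2L
data <- pregenerate(n, k, d)

# During the simulation, step t only looks values up.
t <- 4L
context <- list(X = data$contexts[[t]], k = k, d = d)
action  <- list(choice = 1L)
reward  <- list(reward = data$rewards[t, action$choice])
```

Drawing everything up front keeps the per-step work to a lookup, at the cost of holding n contexts in memory.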

See Also

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy

robinvanemden/contextual documentation built on Aug. 12, 2019, 9:30 p.m.