Agent: Agent
In contextual: Simulation and Analysis of Contextual Multi-Armed Bandit Policies


Keeps track of one Bandit and Policy pair.

Runs one Bandit and Policy pair over the time steps t = 1, ..., T. At each time step t, the Agent calls bandit$get_context(), policy$get_action(), bandit$get_reward() and policy$set_reward(), in that order.
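The per-step call order described above can be sketched in base R. This is only an illustration, not the package's implementation: `make_toy_bandit` and `make_toy_policy` are hypothetical stand-ins built from plain lists of closures, and the argument lists shown for the four methods are assumptions.

```r
set.seed(42)

# Hypothetical three-armed Bernoulli bandit (not the package's Bandit class).
make_toy_bandit <- function(weights) {
  list(
    get_context = function(t) list(k = length(weights)),
    get_reward  = function(t, context, action) {
      list(reward = rbinom(1, 1, weights[action$choice]))
    }
  )
}

# Hypothetical epsilon-greedy policy (not the package's Policy class).
make_toy_policy <- function(k, epsilon = 0.1) {
  n  <- rep(0, k)   # number of pulls per arm
  mu <- rep(0, k)   # running mean reward per arm
  list(
    get_action = function(t, context) {
      explore <- runif(1) < epsilon
      choice  <- if (explore) sample.int(k, 1) else which.max(mu)
      list(choice = choice)
    },
    set_reward = function(t, context, action, reward) {
      arm <- action$choice
      n[arm]  <<- n[arm] + 1
      mu[arm] <<- mu[arm] + (reward$reward - mu[arm]) / n[arm]
      list(n = n, mu = mu)
    }
  )
}

bandit  <- make_toy_bandit(c(0.6, 0.1, 0.1))
policy  <- make_toy_policy(k = 3, epsilon = 0.1)
horizon <- 100
rewards <- numeric(horizon)

# The four calls, in the order listed above, for t = 1, ..., T.
for (t in seq_len(horizon)) {
  context    <- bandit$get_context(t)
  action     <- policy$get_action(t, context)
  reward     <- bandit$get_reward(t, context, action)
  theta      <- policy$set_reward(t, context, action, reward)
  rewards[t] <- reward$reward
}

mean(rewards)   # average reward over the run
```

Keeping the bandit and the policy behind these four calls means either side can be swapped without touching the loop.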

contextual diagram: simulator

agent <- Agent$new(policy, bandit, name = NULL, sparse = 0.0)

policy: Policy instance.
bandit: Bandit instance.
name: character; sets the name of the Agent. If NULL (default), Agent generates a name based on its Policy instance's name.
sparse: numeric; artificially reduces the data size by setting a sparsity level for the current Bandit and Policy pair. When set to a value between 0.0 (default) and 1.0 only a fraction sparse of the Bandit's data is randomly chosen to be available to improve the Agent's Policy through policy$set_reward.
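How a sparsity level might thin out the data that reaches the policy can be sketched as follows. The sketch assumes that `sparse` acts as the probability that a single observation is withheld from policy$set_reward, so that the default of 0.0 leaves all data available. That reading is an assumption, and the package's actual rule may differ.

```r
set.seed(1)

sparse  <- 0.5      # assumed: probability that an observation is withheld
horizon <- 10000

# TRUE where the reward would be passed on to the policy (assumed rule).
learned <- sparse == 0 | runif(horizon) > sparse

mean(learned)       # share of time steps the policy gets to learn from
```

Under this assumed rule the policy still acts at every time step and only the learning updates are thinned.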

new(): generates and initializes a new Agent instance.
do_step(): advances a simulation by one time step by consecutively calling bandit$get_context(), policy$get_action(), bandit$get_reward() and policy$set_reward(). Returns a list of lists containing context, action, reward and theta.
set_t(t): integer; sets the current time step to t.
get_t(): returns current time step t.
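The interplay of do_step(), set_t() and get_t() can be illustrated with a small closure-based stand-in. `make_toy_agent`, the toy bandit and the toy policy are all hypothetical, and the choice to increment the time step at the start of each step is an assumption rather than documented behaviour.

```r
set.seed(7)

# Hypothetical two-armed bandit and random policy, only to make the sketch runnable.
bandit <- list(
  get_context = function(t) list(k = 2),
  get_reward  = function(t, context, action) {
    list(reward = rbinom(1, 1, c(0.7, 0.3)[action$choice]))
  }
)
policy <- list(
  get_action = function(t, context) list(choice = sample.int(context$k, 1)),
  set_reward = function(t, context, action, reward) list(last_reward = reward$reward)
)

# Hypothetical agent: keeps a time step counter and runs one step on request.
make_toy_agent <- function(bandit, policy) {
  current_t <- 0L
  list(
    set_t = function(t) {
      current_t <<- as.integer(t)
      invisible(current_t)
    },
    get_t = function() current_t,
    do_step = function() {
      current_t <<- current_t + 1L   # assumed: counter advances before the calls
      context <- bandit$get_context(current_t)
      action  <- policy$get_action(current_t, context)
      reward  <- bandit$get_reward(current_t, context, action)
      theta   <- policy$set_reward(current_t, context, action, reward)
      # A list of lists, mirroring the return value described above.
      list(context = context, action = action, reward = reward, theta = theta)
    }
  )
}

agent <- make_toy_agent(bandit, policy)
agent$set_t(0)
step <- agent$do_step()
names(step)
agent$get_t()
```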

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy

## Not run: 

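  library(contextual)

  # Epsilon-greedy policy on a three-armed Bernoulli bandit
  policy    <- EpsilonGreedyPolicy$new(epsilon = 0.1)
  bandit    <- BasicBernoulliBandit$new(weights = c(0.6, 0.1, 0.1))

  # Pair the two in an Agent with a sparsity level of 0.5
  agent     <- Agent$new(policy, bandit, name = "E.G.", sparse = 0.5)

  # Run 10 simulations of 10 time steps each
  history   <- Simulator$new(agents = agent,
                             horizon = 10,
                             simulations = 10)$run()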

## End(Not run)