mdp_learning: MDP learning


Description

Simulate learning under the MDP policy
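Learning here amounts to Bayesian model averaging: after each observed transition, the belief over candidate models is reweighted by the likelihood each model assigns to that transition. The helper below is a minimal sketch of this update, not the package's internal code; it assumes each model's transition is an array indexed as P[x, x', a].

```r
## Sketch of the Bayesian belief update that drives learning.
## Hypothetical helper (not the package's internal API); assumes each
## model's transition is an array P[x, x', a].
update_belief <- function(belief, transition, x_t, x_t1, a_t) {
  ## likelihood each model assigns to the observed transition
  lik <- vapply(transition, function(P) P[x_t, x_t1, a_t], numeric(1))
  posterior <- belief * lik        # reweight prior by likelihood
  posterior / sum(posterior)       # renormalize
}

## Two toy 2-state, 1-action models: model 1 is "sticky", model 2 random
P1 <- array(c(0.9, 0.1, 0.1, 0.9), dim = c(2, 2, 1))
P2 <- array(c(0.5, 0.5, 0.5, 0.5), dim = c(2, 2, 1))

## Observing the state stay at 1 shifts belief toward model 1
update_belief(c(0.5, 0.5), list(P1, P2), x_t = 1, x_t1 = 1, a_t = 1)
#> [1] 0.6428571 0.3571429
```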

Usage

mdp_learning(transition, reward, discount, model_prior = NULL, x0,
  Tmax = 20, true_transition, observation = NULL, a0 = 1,
  model_names = NA, ...)

Arguments

transition

list of transition matrices, one per model

reward

the utility matrix U(x,a) of being at state x and taking action a

discount

the discount factor (1 is no discounting)

model_prior

the prior belief over models; a numeric vector of length(transition). Uniform by default
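The uniform default corresponds to giving each candidate model equal weight; an explicit prior can be constructed the same way (the placeholder models below are illustrative only):

```r
## Placeholder list of candidate models (illustrative)
transition <- list(model_1 = diag(2), model_2 = diag(2))

## Uniform prior over the candidate models, matching the default
model_prior <- rep(1, length(transition)) / length(transition)
model_prior
#> [1] 0.5 0.5
```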

x0

initial state

Tmax

termination time for the finite-time calculation; ignored otherwise

true_transition

the actual transition matrix used to drive the simulation.

observation

NULL by default, which simulates perfect observations

a0

the action taken before the simulation starts; irrelevant unless actions influence observations and observation is not NULL

model_names

optional vector of names for the columns of the model posterior distribution. If not provided here, names are taken from the names of the transition list.

...

additional arguments to mdp_compute_policy

Value

a list containing: a data frame "df" with the state, action, and value at each time step of the simulation, and a data frame "posterior" whose t'th row gives the belief state (the posterior probability of each model) at time t.
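The shape of the return value can be sketched as follows (the numbers are hypothetical, not actual output of the function):

```r
## Hypothetical illustration of the return structure (values invented)
out <- list(
  df = data.frame(time = 1:3, state = c(10, 9, 9),
                  action = c(1, 2, 2), value = c(1.2, 1.1, 1.1)),
  posterior = data.frame(model_1 = c(0.50, 0.64, 0.81),
                         model_2 = c(0.50, 0.36, 0.19))
)
out$posterior[3, ]   # belief over models at time 3
#>   model_1 model_2
#> 3    0.81    0.19
```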

Examples

source(system.file("examples/K_models.R", package="mdplearning"))
transition <- lapply(models, `[[`, "transition")
reward <- models[[1]]$reward

## example where true model is model 1
out <- mdp_learning(transition, reward, discount, x0 = 10,
                    Tmax = 20, true_transition = transition[[1]])
## Did we learn which one was the true model?
out$posterior[20,]

## Simulate MDP strategy under observation uncertainty
out <- mdp_learning(transition = transition, reward, discount, x0 = 10,
               true_transition = transition[[1]],
               Tmax = 20, observation = models[[1]]$observation)

boettiger-lab/mdplearning documentation built on May 13, 2019, 8:23 a.m.