mdp_planning: MDP planning


Description

Simulate a Markov decision process (MDP) under a given policy.

Usage

mdp_planning(transition, reward, discount, model_prior = NULL, x0,
  Tmax = 20, observation = NULL, a0 = 1, policy, ...)

Arguments

transition

list of transition matrices, one per model

reward

the utility matrix U(x,a) of being at state x and taking action a

discount

the discount factor (1 is no discounting)

model_prior

the prior belief over models, a numeric vector with one entry per transition matrix. Uniform by default

x0

initial state

Tmax

length of time to simulate

observation

NULL by default, in which case observations are simulated as perfect (the true state is observed exactly); otherwise, an observation model used to simulate imperfect observations of the state

a0

the action taken before the simulation starts; irrelevant unless actions influence observations and observation is not NULL

policy

a vector of length n_obs, whose i-th entry is the index of the optimal action to take when the system is in (observed) state i (see the sketch after this list).

...

additional arguments to mdp_compute_policy
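
As a minimal sketch of how the policy argument is interpreted (toy values, not taken from the package): the vector is indexed by the observed state, and the stored entry is the index of the action to take.

## Toy illustration: a policy over 3 observable states and 2 possible actions
policy <- c(2, 1, 1)      # hypothetical values, for illustration only
observed_state <- 1
policy[observed_state]    # action index taken when state 1 is observed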

Value

a data frame with the state, action, and value at each time step of the simulation

Examples

source(system.file("examples/K_models.R", package = "mdplearning"))  # provides `models` (and, presumably, `discount`)
transition <- lapply(models, `[[`, "transition")  # one transition matrix per model
reward <- models[[1]]$reward                      # reward matrix U(x, a) of the first model

## Compute a policy under a uniform prior over the two models,
## then simulate it under the first model with perfect observations
df <- mdp_compute_policy(transition, reward, discount, model_prior = c(0.5, 0.5))
out <- mdp_planning(transition[[1]], reward, discount, x0 = 10,
               Tmax = 20, policy = df$policy)

## Simulate MDP strategy under observation uncertainty
out <- mdp_planning(transition[[1]], reward, discount, x0 = 10,
               Tmax = 20, policy = df$policy,
               observation = models[[1]]$observation)
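
As a follow-up, the simulated trajectory can be inspected directly. A minimal sketch, assuming the returned data frame uses the column names state, action, and value described under Value (names may differ in the installed version):

head(out)                       # first few rows of the simulated trajectory
plot(out$state, type = "l",     # state over the Tmax simulated time steps
     xlab = "time step", ylab = "state")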
