Description

Simulate an MDP under a given policy.
Usage

mdp_planning(transition, reward, discount, model_prior = NULL, x0,
  Tmax = 20, observation = NULL, a0 = 1, policy, ...)
Arguments

| Argument | Description |
| --- | --- |
| transition | list of transition matrices, one per model |
| reward | the utility matrix U(x, a) of being at state x and taking action a |
| discount | the discount factor (1 is no discounting) |
| model_prior | the prior belief over models, a numeric vector of length(transition). Uniform by default |
| x0 | initial state |
| Tmax | length of time to simulate |
| observation | NULL by default, which simulates perfect observations |
| a0 | the action taken before starting; irrelevant unless actions influence observations and true_observation is not NULL |
| policy | a vector of length n_obs, whose ith entry is the index of the optimal action when the system is in (observed) state i |
| ... | additional arguments to |
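A minimal sketch of how these arguments fit together, using a hypothetical 2-state, 2-action MDP. All numbers are illustrative, and the 3-dimensional transition array convention (state x state x action) is an assumption based on the package's example models, not a confirmed part of the API; a single model's transition is passed directly, as in the Examples below.

library(mdplearning)

## transition probabilities, assumed layout: P[i, j, a] = probability of
## moving from state i to state j under action a
P <- array(0, dim = c(2, 2, 2))
P[, , 1] <- matrix(c(0.8, 0.2,
                     0.3, 0.7), nrow = 2, byrow = TRUE)
P[, , 2] <- matrix(c(0.5, 0.5,
                     0.1, 0.9), nrow = 2, byrow = TRUE)

## utility matrix U(x, a): rows are states, columns are actions
U <- matrix(c(1, 0,
              0, 2), nrow = 2, byrow = TRUE)

## fixed policy: take action 1 in state 1 and action 2 in state 2
out <- mdp_planning(P, U, discount = 0.95, x0 = 1,
                    Tmax = 10, policy = c(1, 2))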
Value

a data frame "df" with the state, action, and value at each time step of the simulation
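Once a simulation has been run (as in the Examples below), the returned data frame can be inspected directly. A brief sketch, assuming the columns are named state and value as suggested by the description above (the exact column names are not confirmed here):

## trace the simulated state trajectory over time
plot(out$state, type = "s", xlab = "time step", ylab = "state")

## total value accumulated along this run (column name assumed)
sum(out$value)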
Examples

## load the example models; `discount` is assumed to be supplied by this
## script as well, since it is used below without being defined here
source(system.file("examples/K_models.R", package = "mdplearning"))
transition <- lapply(models, `[[`, "transition")
reward <- models[[1]]$reward
df <- mdp_compute_policy(transition, reward, discount, model_prior = c(0.5, 0.5))

## Simulate the MDP under the computed policy
out <- mdp_planning(transition[[1]], reward, discount, x0 = 10,
                    Tmax = 20, policy = df$policy)

## Simulate the MDP strategy under observation uncertainty
out <- mdp_planning(transition[[1]], reward, discount, x0 = 10,
                    Tmax = 20, policy = df$policy,
                    observation = models[[1]]$observation)
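Because each simulation is stochastic, a single run can be misleading. A hedged sketch of averaging total value over replicate runs, reusing the objects from the Examples above; replicate() and vapply() are base R, and the value column name is assumed from the Value section rather than confirmed by the package:

reps <- replicate(50,
  mdp_planning(transition[[1]], reward, discount, x0 = 10,
               Tmax = 20, policy = df$policy)$value,
  simplify = FALSE)
mean(vapply(reps, sum, numeric(1)))  ## mean total value across 50 runs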