Description

Simulate an MDP under a given policy.
Usage

mdp_planning(transition, reward, discount, model_prior = NULL, x0,
             Tmax = 20, observation = NULL, a0 = 1, policy, ...)
Arguments

transition    a transition matrix (or, for multiple models, a list of transition matrices, one per model)
reward        the utility matrix U(x, a) of being in state x and taking action a
discount      the discount factor (1 means no discounting)
model_prior   the prior belief over models, a numeric vector of length(transition); uniform by default
x0            the initial state
Tmax          the number of time steps to simulate
observation   the observation matrix; NULL (the default) simulates perfect observations
a0            the action taken before the simulation starts; irrelevant unless actions influence observations and observation is not NULL
policy        a vector of length n_obs whose i'th entry is the index of the optimal action when the system is in (observed) state i
...           additional arguments
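To make the expected shapes concrete, here is a minimal hypothetical sketch of hand-built inputs for a 2-state, 2-action problem with perfect observations. The toy numbers and the state x state x action array layout are assumptions for illustration only (see the packaged K_models.R example for the canonical input format):

library(mdplearning)

n_s <- 2; n_a <- 2

## Transition array, assumed layout: P[x, x2, a] is the probability of
## moving from state x to state x2 under action a
P <- array(0, dim = c(n_s, n_s, n_a))
P[, , 1] <- matrix(c(0.9, 0.1,
                     0.8, 0.2), n_s, n_s, byrow = TRUE)  # action 1 keeps the state low
P[, , 2] <- matrix(c(0.2, 0.8,
                     0.1, 0.9), n_s, n_s, byrow = TRUE)  # action 2 pushes the state high

## Reward matrix U(x, a): state 2 is the rewarding state under either action
U <- matrix(c(0, 0,
              1, 1), n_s, n_a, byrow = TRUE)

## A fixed policy: always take action 2, which favors the rewarding state
policy <- rep(2, n_s)

out <- mdp_planning(P, U, discount = 0.95, x0 = 1, Tmax = 10, policy = policy)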
Value

a data frame with the state, action, and value at each time step of the simulation
Examples

library(mdplearning)
source(system.file("examples/K_models.R", package = "mdplearning"))
transition <- lapply(models, `[[`, "transition")
reward <- models[[1]]$reward
discount <- 0.95  # assumed value for illustration; may already be set by K_models.R
df <- mdp_compute_policy(transition, reward, discount, model_prior = c(0.5, 0.5))

## Simulate the MDP under the computed policy, with perfect observations
out <- mdp_planning(transition[[1]], reward, discount, x0 = 10,
                    Tmax = 20, policy = df$policy)

## Simulate the MDP strategy under observation uncertainty
out <- mdp_planning(transition[[1]], reward, discount, x0 = 10,
                    Tmax = 20, policy = df$policy,
                    observation = models[[1]]$observation)
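The simulated trajectory in out can then be inspected directly. A minimal sketch, assuming only the state column described under Value:

## Plot the sequence of visited states over the Tmax simulated time steps
plot(out$state, type = "l", xlab = "time step", ylab = "state")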