Description

Simulate learning under the MDP policy.
Usage

mdp_learning(transition, reward, discount, model_prior = NULL, x0,
  Tmax = 20, true_transition, observation = NULL, a0 = 1,
  model_names = NA, ...)
Arguments

transition
    list of transition matrices, one per model (see the input sketch after this list)

reward
    the utility matrix U(x, a) of being in state x and taking action a

discount
    the discount factor (1 means no discounting)

model_prior
    the prior belief over models, a numeric of length(transition); uniform by default

x0
    initial state

Tmax
    termination time for the finite-time calculation; ignored otherwise

true_transition
    the actual transition matrix used to drive the simulation

observation
    NULL by default, which simulates perfect observations

a0
    the action taken before the simulation starts; irrelevant unless actions influence observations and observation is not NULL

model_names
    optional vector of names for the columns of the model posterior distribution; taken from the names of the transition list if none are provided here

...
    additional arguments passed on to the underlying policy computation
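To make the expected shapes concrete, the following is a minimal illustrative sketch of inputs for two candidate models of a 2-state, 2-action problem. The n_s x n_s x n_a array layout (rows = current state, columns = next state, slices = action) is an assumption for illustration, not a statement of the package's required format; see the Examples below for the package's own input construction.

## Illustrative input construction (assumed layout: each model's
## transition is an n_s x n_s x n_a array; rows = current state,
## columns = next state, slices = action)
n_s <- 2  # number of states
n_a <- 2  # number of actions

## Model 1: action 1 tends to keep the current state, action 2 randomizes
m1 <- array(0, dim = c(n_s, n_s, n_a))
m1[, , 1] <- matrix(c(0.9, 0.1,
                      0.2, 0.8), n_s, n_s, byrow = TRUE)
m1[, , 2] <- matrix(c(0.5, 0.5,
                      0.5, 0.5), n_s, n_s, byrow = TRUE)

## Model 2: same actions, but faster mixing under action 1
m2 <- m1
m2[, , 1] <- matrix(c(0.7, 0.3,
                      0.4, 0.6), n_s, n_s, byrow = TRUE)

transition <- list(model_1 = m1, model_2 = m2)

## Reward U(x, a): rows = states, columns = actions
reward <- matrix(c(1, 0,
                   0, 2), n_s, n_a, byrow = TRUE)

## Sanity check: every row of every action slice is a probability vector
stopifnot(all(vapply(transition, function(P)
  all(abs(apply(P, 3, rowSums) - 1) < 1e-12), logical(1))))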
Value

a list containing: a data frame "df" with the state, action, and value at each time step of the simulation, and a data frame "posterior", in which the t'th row gives the belief over models at time t (one column per model; see the sketch after the examples)
Examples

source(system.file("examples/K_models.R", package = "mdplearning"))
transition <- lapply(models, `[[`, "transition")
reward <- models[[1]]$reward

## example where true model is model 1
out <- mdp_learning(transition, reward, discount, x0 = 10,
                    Tmax = 20, true_transition = transition[[1]])

## Did we learn which one was the true model?
out$posterior[20, ]

## Simulate the MDP strategy under observation uncertainty
out <- mdp_learning(transition = transition, reward, discount, x0 = 10,
                    true_transition = transition[[1]],
                    Tmax = 20, observation = models[[1]]$observation)
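Continuing from the examples above, the returned pieces can be inspected directly. The following sketch assumes the "posterior" data frame has one column per model, as described under Value:

## Inspect the simulated states, actions, and values
head(out$df)

## Plot the belief in each model over time
matplot(as.matrix(out$posterior), type = "l", lty = 1,
        xlab = "time step", ylab = "posterior probability")
legend("topright", legend = colnames(out$posterior),
       lty = 1, col = seq_len(ncol(out$posterior)))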