Description

Compute the optimal policy for a Markov decision process (MDP).

Usage
mdp_compute_policy(transition, reward, discount, model_prior = NULL,
  max_iter = 500, epsilon = 1e-05, Tmax = max_iter,
  type = c("value iteration", "policy iteration", "finite time"))
Arguments

transition    list of transition matrices, one per model
reward        the utility matrix U(x, a) of being in state x and taking action a
discount      the discount factor (1 means no discounting)
model_prior   the prior belief over models, a numeric of length(transition);
              uniform by default
max_iter      maximum number of iterations to perform
epsilon       convergence tolerance
Tmax          termination time for the finite-time calculation; ignored otherwise
type          convergence criterion: stop when the value converges ("value
              iteration"), when the policy converges ("policy iteration"), or
              after Tmax steps ("finite time")
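Since transition is a list of matrices (one per model) and model_prior weights them, a model-averaged transition matrix can be formed as a prior-weighted sum. A minimal sketch in base R (the two 2-state matrices and the uniform prior are invented for illustration; this is not the package's internal code):

```r
# Two hypothetical 2-state transition models (each row sums to 1)
transition <- list(
  matrix(c(0.9, 0.1,
           0.2, 0.8), nrow = 2, byrow = TRUE),
  matrix(c(0.7, 0.3,
           0.4, 0.6), nrow = 2, byrow = TRUE)
)

# Uniform prior over models, as used when model_prior = NULL
model_prior <- rep(1, length(transition)) / length(transition)

# Model-averaged transition matrix: sum_i prior_i * P_i
P_bar <- Reduce(`+`, Map(`*`, model_prior, transition))

# The weighted average of row-stochastic matrices is still row-stochastic
stopifnot(all(abs(rowSums(P_bar) - 1) < 1e-12))
```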
Value

a data.frame with the optimal policy and the (discounted) value associated with each state
Examples
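The example code on this page did not survive extraction. As a substitute, here is a self-contained value-iteration sketch in base R for a single model (2 states, 2 actions; all numbers invented), producing a data.frame of the same shape as the documented return value. It does not call mdp_compute_policy and makes no claim about the package's internals:

```r
# Single-model sketch: P[[a]][x, y] = Pr(next state y | state x, action a)
P <- list(
  matrix(c(0.9, 0.1,
           0.4, 0.6), nrow = 2, byrow = TRUE),  # action 1
  matrix(c(0.5, 0.5,
           0.1, 0.9), nrow = 2, byrow = TRUE)   # action 2
)
U <- matrix(c(1, 0,   # U[x, a]: utility of taking action a in state x
              0, 2), nrow = 2, byrow = TRUE)
discount <- 0.5       # small discount keeps the example short
epsilon <- 1e-5; max_iter <- 500

V <- rep(0, nrow(U))
for (i in seq_len(max_iter)) {
  # Q[x, a] = U[x, a] + discount * sum_y P[[a]][x, y] * V[y]
  Q <- sapply(seq_along(P), function(a) U[, a] + discount * P[[a]] %*% V)
  V_new <- apply(Q, 1, max)                     # Bellman update
  if (max(abs(V_new - V)) < epsilon) { V <- V_new; break }
  V <- V_new
}
policy <- apply(Q, 1, which.max)                # greedy policy

# Same shape as the documented return value
result <- data.frame(state = seq_len(nrow(U)), policy = policy, value = V)
result
```

With the package itself installed, the analogous call would presumably be mdp_compute_policy(transition, reward, discount) with transition a list of model matrices as documented above; the sketch here only illustrates the value-iteration case.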