Description
Compute the expected net present (i.e. discounted) value of a (not necessarily optimal) policy in a perfectly observed Markov decision process (MDP).
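In standard MDP notation, the value being computed satisfies the Bellman expectation equation for a fixed policy pi,

  V(x) = U(x, pi(x)) + gamma * sum over x' of P(x' | x, pi(x)) * V(x'),

where, when several candidate models are supplied, the transition probabilities are presumably averaged under model_prior; a minimal R sketch of this recursion follows the Value section below.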
Usage

mdp_value_of_policy(policy, transition, reward, discount, model_prior = NULL,
  max_iter = 500, epsilon = 1e-05)
Arguments

policy        the policy for which we want to determine the expected value
transition    a list of transition matrices, one per model
reward        the utility matrix U(x, a) of being in state x and taking action a
discount      the discount factor (1 means no discounting)
model_prior   the prior belief over models, a numeric vector of length(transition); uniform by default
max_iter      the maximum number of iterations to perform
epsilon       the convergence tolerance
Details

transition is a list of transition matrices, one for each candidate model.
Value

The expected net present value of the given policy, for each state.
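For intuition, the sketch below re-implements fixed-policy evaluation by iterating the Bellman update until the value vector changes by less than epsilon. It is an illustrative assumption about how such a routine works, not the package's source: in particular, it assumes policy is a vector of action indices (one per state) and each transition[[k]] is an array P[x, x', a].

value_of_policy_sketch <- function(policy, transition, reward, discount,
                                   model_prior = NULL,
                                   max_iter = 500, epsilon = 1e-05) {
  n_states <- dim(transition[[1]])[1]
  # Uniform prior over models by default, as in the documented interface
  if (is.null(model_prior))
    model_prior <- rep(1, length(transition)) / length(transition)
  # Average the policy-induced transition matrix over candidate models
  P_pi <- matrix(0, n_states, n_states)
  for (k in seq_along(transition))
    for (x in seq_len(n_states))
      P_pi[x, ] <- P_pi[x, ] + model_prior[k] * transition[[k]][x, , policy[x]]
  U_pi <- reward[cbind(seq_len(n_states), policy)]  # reward of the chosen action
  V <- numeric(n_states)
  for (i in seq_len(max_iter)) {
    V_new <- U_pi + discount * as.numeric(P_pi %*% V)
    if (max(abs(V_new - V)) < epsilon) break  # converged within tolerance
    V <- V_new
  }
  V
}

Because the update is a linear fixed point, the same vector could also be obtained directly as solve(diag(n_states) - discount * P_pi, U_pi) whenever discount < 1.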
Examples

source(system.file("examples/K_models.R", package = "mdplearning"))
transition <- lapply(models, `[[`, "transition")
reward <- models[[1]][["reward"]]
df <- mdp_compute_policy(transition, reward, discount)
v <- mdp_value_of_policy(df$policy, transition, reward, discount)
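If more than one model is supplied, a non-uniform model_prior can be passed explicitly; the weights below are purely illustrative and are normalized to sum to one.

w <- seq_along(transition)   # illustrative, unnormalized model weights
v2 <- mdp_value_of_policy(df$policy, transition, reward, discount,
                          model_prior = w / sum(w))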