Description

Given the previous state, the previous action, and the current state, mdp_online updates the prior belief over candidate models and proposes the best next action.
Usage

mdp_online(transition, reward, discount, model_prior, prev_state, prev_action,
  state, ...)
Arguments

transition    list of transition matrices, one per model

reward        the utility matrix U(x, a) of being in state x and taking action a

discount      the discount factor (1 means no discounting)

model_prior   the prior belief over models, a numeric vector of length(transition); uniform by default

prev_state    the previous state of the system

prev_action   the action taken after observing the previous state

state         the most recently observed state

...           additional arguments to
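The argument shapes above can be illustrated with a minimal two-model, two-state setup. This is a hypothetical sketch in Python/NumPy for illustration only (the package itself is R); all variable names here are assumptions, not package objects:

```python
import numpy as np

# Hypothetical two-state, two-action setup; shapes mirror the arguments above.
n_states, n_actions = 2, 2

# One transition array per candidate model, indexed T[x, a, x'].
# Model 0: action 0 stays put, action 1 switches state.
T0 = np.zeros((n_states, n_actions, n_states))
T0[0, 0, 0] = T0[0, 1, 1] = T0[1, 0, 1] = T0[1, 1, 0] = 1.0
# Model 1: the reverse dynamics.
T1 = np.zeros((n_states, n_actions, n_states))
T1[0, 0, 1] = T1[0, 1, 0] = T1[1, 0, 0] = T1[1, 1, 1] = 1.0
transition = [T0, T1]

# Utility matrix U(x, a): reward 1 whenever the system is in state 1.
reward = np.array([[0.0, 0.0],
                   [1.0, 1.0]])

# Uniform prior over the two models (the documented default).
model_prior = np.full(len(transition), 1.0 / len(transition))

# Sanity check: each row of every transition matrix is a distribution.
assert all(np.allclose(T.sum(axis=2), 1.0) for T in transition)
```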
Details

mdp_online provides a real-time updating mechanism given the latest observations. To learn the best model and compare proposed actions across historical data, use mdp_historical, which loops over mdp_online.

Examples

source(system.file("examples/K_models.R", package = "mdplearning"))
transition <- lapply(models, "[[", "transition")
reward <- models[[1]]$reward
mdp_online(transition, reward, discount, c(0.5, 0.5), 10, 1, 12)
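The real-time update described above can be sketched as a Bayesian reweighting of the model prior, followed by a policy choice under the model-averaged dynamics. This is a minimal illustration in Python/NumPy, not the package's R implementation: the function name, the likelihood form, and the use of value iteration over the averaged model are all assumptions on my part.

```python
import numpy as np

def mdp_online_sketch(transition, reward, discount, model_prior,
                      prev_state, prev_action, state, n_iter=200):
    """Hedged sketch of a Bayesian model-averaged MDP update.

    transition : list of arrays T[x, a, x'], one per candidate model
    reward     : utility matrix U(x, a)
    Returns a dict with 'action' and 'posterior', mirroring the
    documented return value.
    """
    # Reweight the prior by each model's likelihood of the observed
    # transition: posterior_i ∝ prior_i * P(state | prev_state, prev_action, model_i).
    lik = np.array([T[prev_state, prev_action, state] for T in transition])
    posterior = np.asarray(model_prior) * lik
    posterior = posterior / posterior.sum()

    # Average the dynamics under the posterior (one common choice; the
    # package may instead average per-model values).
    T_avg = sum(p * T for p, T in zip(posterior, transition))

    # Plain value iteration on the averaged model.
    n_states, n_actions = reward.shape
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = reward + discount * np.einsum("sat,t->sa", T_avg, V)
        V = Q.max(axis=1)

    # Recommend the greedy action at the most recently observed state.
    return {"action": int(np.argmax(Q[state])), "posterior": posterior}
```

With two deterministic candidate models, observing a transition that only one model can produce drives the posterior entirely onto that model, and the recommended action is the greedy action under its dynamics.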
Value

A list with two components: 'action', the recommended action, and 'posterior', a numeric vector of length(transition) giving the updated probability over models.