Description

Given the previous state, the previous action, and the current state, mdp_online updates the prior belief over candidate models and proposes the best next action.
Usage

mdp_online(transition, reward, discount, model_prior, prev_state, prev_action,
  state, ...)
Arguments

transition    list of transition matrices, one per model

reward        the utility matrix U(x, a) of being in state x and taking action a

discount      the discount factor (1 means no discounting)

model_prior   the prior belief over models, a numeric vector of length(transition); uniform by default

prev_state    the previous state of the system

prev_action   the action taken after observing the previous state

state         the most recently observed state

...           additional arguments to
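The argument shapes above can be illustrated with a minimal two-model, two-state setup. This is a hypothetical sketch in Python/NumPy for illustration only (the package itself is R); all variable names here are assumptions, not package objects:

```python
import numpy as np

# Hypothetical two-state, two-action setup; shapes mirror the arguments above.
n_states, n_actions = 2, 2

# One transition array per candidate model, indexed T[x, a, x'].
# Model 0: action 0 stays put, action 1 switches state.
T0 = np.zeros((n_states, n_actions, n_states))
T0[0, 0, 0] = T0[0, 1, 1] = T0[1, 0, 1] = T0[1, 1, 0] = 1.0
# Model 1: the reverse dynamics.
T1 = np.zeros((n_states, n_actions, n_states))
T1[0, 0, 1] = T1[0, 1, 0] = T1[1, 0, 0] = T1[1, 1, 1] = 1.0
transition = [T0, T1]

# Utility matrix U(x, a): reward 1 whenever the system is in state 1.
reward = np.array([[0.0, 0.0],
                   [1.0, 1.0]])

# Uniform prior over the two models (the documented default).
model_prior = np.full(len(transition), 1.0 / len(transition))

# Sanity check: each row of every transition matrix is a distribution.
assert all(np.allclose(T.sum(axis=2), 1.0) for T in transition)
```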
Details

mdp_online provides a real-time updating mechanism given the latest observations. To learn the best model and compare proposed actions across historical data, use mdp_historical, which loops over mdp_online.

Examples

source(system.file("examples/K_models.R", package = "mdplearning"))
transition <- lapply(models, "[[", "transition")
reward <- models[[1]]$reward
mdp_online(transition, reward, discount, c(0.5, 0.5), 10, 1, 12)
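The real-time update described above can be sketched as a Bayesian reweighting of the model prior, followed by a policy choice under the model-averaged dynamics. This is a minimal illustration in Python/NumPy, not the package's R implementation: the function name, the likelihood form, and the use of value iteration over the averaged model are all assumptions on my part.

```python
import numpy as np

def mdp_online_sketch(transition, reward, discount, model_prior,
                      prev_state, prev_action, state, n_iter=200):
    """Hedged sketch of a Bayesian model-averaged MDP update.

    transition : list of arrays T[x, a, x'], one per candidate model
    reward     : utility matrix U(x, a)
    Returns a dict with 'action' and 'posterior', mirroring the
    documented return value.
    """
    # Reweight the prior by each model's likelihood of the observed
    # transition: posterior_i ∝ prior_i * P(state | prev_state, prev_action, model_i).
    lik = np.array([T[prev_state, prev_action, state] for T in transition])
    posterior = np.asarray(model_prior) * lik
    posterior = posterior / posterior.sum()

    # Average the dynamics under the posterior (one common choice; the
    # package may instead average per-model values).
    T_avg = sum(p * T for p, T in zip(posterior, transition))

    # Plain value iteration on the averaged model.
    n_states, n_actions = reward.shape
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = reward + discount * np.einsum("sat,t->sa", T_avg, V)
        V = Q.max(axis=1)

    # Recommend the greedy action at the most recently observed state.
    return {"action": int(np.argmax(Q[state])), "posterior": posterior}
```

With two deterministic candidate models, observing a transition that only one model can produce drives the posterior entirely onto that model, and the recommended action is the greedy action under its dynamics.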
Value

A list with two components: 'action', the recommended action, and 'posterior', a numeric vector of length(transition) giving the updated probability over models.