mdp_online: MDP online learning


Description

Given the previous state, previous action, and current state, update the model prior and propose the best action.

Usage

mdp_online(transition, reward, discount, model_prior, prev_state, prev_action,
  state, ...)

Arguments

transition

list of transition matrices, one per model

reward

the utility matrix U(x,a) of being at state x and taking action a

discount

the discount factor (1 is no discounting)

model_prior

the prior belief over models, a numeric vector of length(transition). Uniform by default

prev_state

the previous state of the system

prev_action

the action taken after observing the previous state

state

the most recent state observed

...

additional arguments to mdp_compute_policy

Details

mdp_online provides a real-time updating mechanism given the latest observations. To learn the best model and compare proposed actions across historical data, use mdp_historical, which loops over mdp_online.

Examples

source(system.file("examples/K_models.R", package="mdplearning"))
transition <- lapply(models, '[[', "transition")
reward <- models[[1]]$reward
mdp_online(transition, reward, discount, c(0.5, 0.5), 10, 1, 12)
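Conceptually, the model-prior update is a standard Bayesian filtering step: each candidate model is weighted by the likelihood it assigns to the observed transition, and the weights are renormalized. The sketch below illustrates that step only; it assumes each element of transition is an array indexed as transition[[i]][from, to, action], and the helper name update_model_posterior is hypothetical, not the package's own code.

# Illustrative sketch of the belief update over candidate models
update_model_posterior <- function(transition, model_prior,
                                   prev_state, prev_action, state) {
  # likelihood of the observed transition under each candidate model
  likelihood <- vapply(transition,
                       function(T_i) T_i[prev_state, state, prev_action],
                       numeric(1))
  posterior <- model_prior * likelihood
  posterior / sum(posterior)  # renormalize so the belief sums to one
}

The recommended action is then obtained by computing a policy under the posterior-weighted models (see mdp_compute_policy).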

Value

a list with two components: 'action', the recommended action, and 'posterior', a numeric vector of length(transition) giving the updated probability over models
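In an online setting, the returned posterior would typically be fed back in as the model_prior for the next call. A hypothetical continuation of the example above (the observed states 12 and 11 are made up for illustration):

out <- mdp_online(transition, reward, discount, c(0.5, 0.5), 10, 1, 12)
out <- mdp_online(transition, reward, discount, model_prior = out$posterior,
                  prev_state = 12, prev_action = out$action, state = 11)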

