sim_pomdp {sarsop} | R Documentation
Simulate a POMDP given the appropriate matrices.
Usage

sim_pomdp(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  x0,
  a0 = 1,
  Tmax = 20,
  policy = NULL,
  alpha = NULL,
  reps = 1,
  ...
)
Arguments

transition
    Transition matrix, dimension n_s x n_s x n_a

observation
    Observation matrix, dimension n_s x n_z x n_a

reward
    Reward matrix, dimension n_s x n_a

discount
    The discount factor

state_prior
    Initial belief state; optional, defaults to uniform over states

x0
    Initial state

a0
    Initial action (default is action 1; the initial action can be arbitrary if the observation process is independent of the action taken)

Tmax
    Duration of the simulation

policy
    Simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP solution

alpha
    The matrix of alpha vectors returned by sarsop

reps
    Number of replicate simulations to compute

...
    Additional arguments passed to mclapply
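To make the dimension conventions concrete, here is a minimal sketch of toy inputs for a problem with two states, two observations, and two actions. The numbers are arbitrary, and the sketch assumes transition[s, s', a] = P(s' | s, a) and observation[s, z, a] = P(z | s, a), so each row of a slice sums to one:

## Toy inputs illustrating the expected dimensions (values are arbitrary;
## assumes transition[s, s', a] = P(s' | s, a) and
## observation[s, z, a] = P(z | s, a))
n_s <- 2; n_z <- 2; n_a <- 2

transition <- array(c(0.9, 0.2, 0.1, 0.8,   # action 1: each row sums to 1
                      0.5, 0.5, 0.5, 0.5),  # action 2
                    dim = c(n_s, n_s, n_a))

observation <- array(rep(c(0.8, 0.3, 0.2, 0.7), n_a),  # same for both actions
                     dim = c(n_s, n_z, n_a))

reward <- matrix(c(1, 0,    # rewards by state under action 1
                   0, 1),   # rewards by state under action 2
                 nrow = n_s, ncol = n_a)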
Details

The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, then action[t] is chosen based on that observation and the given policy, returning the (discounted) reward[t].
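This ordering can be sketched for a single replicate in a few lines of R. The sketch below is an illustrative reconstruction of the update order, not the package's internal code: choose_action is a hypothetical stand-in for the policy lookup (e.g. maximizing over alpha vectors), and the indexing follows the argument descriptions above.

## Illustrative sketch of the update order (not the package internals);
## choose_action is a hypothetical policy function mapping a belief to an action.
simulate_once <- function(transition, observation, reward, discount,
                          state_prior, x0, a0, Tmax, choose_action) {
  n_s <- dim(observation)[[1]]
  n_z <- dim(observation)[[2]]
  state <- x0
  action <- a0
  belief <- state_prior
  out <- data.frame(time = integer(Tmax), state = integer(Tmax),
                    obs = integer(Tmax), action = integer(Tmax),
                    value = numeric(Tmax))
  for (t in seq_len(Tmax)) {
    ## 1. observe the current state (probabilities depend on the last action)
    obs <- sample(n_z, 1, prob = observation[state, , action])
    ## 2. fold the observation into the belief and renormalize
    belief <- belief * observation[, obs, action]
    belief <- belief / sum(belief)
    ## 3. pick the next action from the updated belief
    action <- choose_action(belief)
    ## 4. record the (discounted) reward for the current state and action
    out[t, ] <- list(t, state, obs, action,
                     discount^(t - 1) * reward[state, action])
    ## 5. propagate both the belief and the true state forward
    belief <- as.vector(belief %*% transition[, , action])
    state <- sample(n_s, 1, prob = transition[state, , action])
  }
  out
}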
Value

A data frame with columns for time, state, obs, action, and (discounted) value.
Examples

library(sarsop)

m <- fisheries_matrices()
discount <- 0.95

## Takes > 5s
if (assert_has_appl()) {
  alpha <- sarsop(m$transition, m$observation, m$reward, discount,
                  precision = 10)
  sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                   x0 = 5, Tmax = 20, alpha = alpha)
}
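Since the return value is the data frame described under Value, a quick look at the simulated trajectory and its net present value might read as follows (an illustrative follow-up, not part of the documented example):

## Inspect the trajectory; the value column is already discounted,
## so its sum is the net present value of the run.
head(sim)
sum(sim$value)
plot(sim$time, sim$state, type = "l", xlab = "time", ylab = "state")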