View source: R/compute_policy.R
compute_policy: R Documentation

Derive the corresponding policy function from the alpha vectors.
Usage

compute_policy(
  alpha,
  transition,
  observation,
  reward,
  state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
  a_0 = 1
)
Arguments

alpha        the matrix of alpha vectors returned by sarsop()
transition   transition matrix, of dimension n_s x n_s x n_a
observation  observation matrix, of dimension n_s x n_z x n_a
reward       reward matrix, of dimension n_s x n_a
state_prior  initial belief over states; optional, defaults to a uniform distribution over states
a_0          previous action. The belief in the current state depends not only on the observation, but also on the prior belief over states and the action previously taken; see the sketch below.
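The belief update that state_prior and a_0 feed into is the standard POMDP Bayes filter. The sketch below is illustrative only (it is not the package's internal code): the toy arrays, the helper update_belief, and the dimension conventions transition[s, s', a] = P(s' | s, a) and observation[s', z, a] = P(z | s', a) are assumptions inferred from the argument list above.

## A minimal sketch of the belief update, assuming the dimension conventions
## listed in the Arguments section (not taken from the package source).
n_s <- 2; n_z <- 2; n_a <- 2

## toy arrays with the documented dimensions
transition  <- array(1 / n_s, dim = c(n_s, n_s, n_a))   # n_s x n_s x n_a
observation <- array(1 / n_z, dim = c(n_s, n_z, n_a))   # n_s x n_z x n_a
reward      <- matrix(0, n_s, n_a)                      # n_s x n_a

## Bayes update of the belief after taking action a_0 and observing z
update_belief <- function(b, z, a_0, transition, observation) {
  predicted <- as.numeric(t(transition[, , a_0]) %*% b)  # P(s' | b, a_0)
  posterior <- observation[, z, a_0] * predicted         # weight by P(z | s', a_0)
  posterior / sum(posterior)                             # normalise
}

b0 <- rep(1, n_s) / n_s   # uniform prior, as in the default state_prior
update_belief(b0, z = 1, a_0 = 1, transition, observation)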
Value

A data frame providing the optimal policy (choice of action) and the corresponding value of that action for each possible belief state.
Examples

m <- fisheries_matrices()

## Takes > 5s
if (assert_has_appl()) {
  alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
  compute_policy(alpha, m$transition, m$observation, m$reward)
}
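The policy derived from alpha vectors follows the usual POMDP rule: the value of a belief b is the maximum over the alpha vectors of their dot product with b, and the optimal action is the one attached to the maximizing vector. The sketch below illustrates that rule with a made-up alpha matrix and action labels; it is not the data structure returned by sarsop() or compute_policy().

## Illustration of the alpha-vector rule (hypothetical alpha matrix and
## action labels, not sarsop output).
alpha   <- matrix(c(10, 2,
                     4, 8), nrow = 2)   # one column per alpha vector, rows = states
actions <- c(1, 2)                      # action associated with each alpha vector

beliefs <- seq(0, 1, by = 0.25)         # belief in state 1 (two-state example)
policy <- data.frame(belief = beliefs, value = NA_real_, action = NA_integer_)

for (i in seq_along(beliefs)) {
  b <- c(beliefs[i], 1 - beliefs[i])
  scores <- as.numeric(t(alpha) %*% b)  # alpha_j . b for each alpha vector j
  policy$value[i]  <- max(scores)
  policy$action[i] <- actions[which.max(scores)]
}
policy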