sim_pomdp {sarsop} | R Documentation
Simulate a POMDP given the appropriate matrices.
Usage

sim_pomdp(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  x0,
  a0 = 1,
  Tmax = 20,
  policy = NULL,
  alpha = NULL,
  reps = 1,
  ...
)
Arguments

transition
    Transition matrix, dimension n_s x n_s x n_a

observation
    Observation matrix, dimension n_s x n_z x n_a

reward
    Reward matrix, dimension n_s x n_a

discount
    The discount factor

state_prior
    Initial belief state; optional, defaults to uniform over states

x0
    Initial state

a0
    Initial action (default is action 1; the initial action can be arbitrary if the observation process is independent of the action taken)

Tmax
    Duration of the simulation

policy
    Simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP solution

alpha
    The matrix of alpha vectors returned by sarsop

reps
    Number of replicate simulations to compute

...
    Additional arguments passed to mclapply
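To make the dimension conventions concrete, here is a minimal sketch of toy inputs for a problem with two states, two observations, and two actions. The numbers are arbitrary, and the sketch assumes transition[s, s', a] = P(s' | s, a) and observation[s, z, a] = P(z | s, a), so each row of a slice sums to one:

## Toy inputs illustrating the expected dimensions (values are arbitrary;
## assumes transition[s, s', a] = P(s' | s, a) and
## observation[s, z, a] = P(z | s, a))
n_s <- 2; n_z <- 2; n_a <- 2

transition <- array(c(0.9, 0.2, 0.1, 0.8,   # action 1: each row sums to 1
                      0.5, 0.5, 0.5, 0.5),  # action 2
                    dim = c(n_s, n_s, n_a))

observation <- array(rep(c(0.8, 0.3, 0.2, 0.7), n_a),  # same for both actions
                     dim = c(n_s, n_z, n_a))

reward <- matrix(c(1, 0,    # rewards by state under action 1
                   0, 1),   # rewards by state under action 2
                 nrow = n_s, ncol = n_a)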
Details

The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, then action[t] is chosen based on that observation and the given policy, returning the (discounted) reward[t].
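This ordering can be sketched for a single replicate in a few lines of R. The sketch below is an illustrative reconstruction of the update order, not the package's internal code: choose_action is a hypothetical stand-in for the policy lookup (e.g. maximizing over alpha vectors), and the indexing follows the argument descriptions above.

## Illustrative sketch of the update order (not the package internals);
## choose_action is a hypothetical policy function mapping a belief to an action.
simulate_once <- function(transition, observation, reward, discount,
                          state_prior, x0, a0, Tmax, choose_action) {
  n_s <- dim(observation)[[1]]
  n_z <- dim(observation)[[2]]
  state <- x0
  action <- a0
  belief <- state_prior
  out <- data.frame(time = integer(Tmax), state = integer(Tmax),
                    obs = integer(Tmax), action = integer(Tmax),
                    value = numeric(Tmax))
  for (t in seq_len(Tmax)) {
    ## 1. observe the current state (probabilities depend on the last action)
    obs <- sample(n_z, 1, prob = observation[state, , action])
    ## 2. fold the observation into the belief and renormalize
    belief <- belief * observation[, obs, action]
    belief <- belief / sum(belief)
    ## 3. pick the next action from the updated belief
    action <- choose_action(belief)
    ## 4. record the (discounted) reward for the current state and action
    out[t, ] <- list(t, state, obs, action,
                     discount^(t - 1) * reward[state, action])
    ## 5. propagate both the belief and the true state forward
    belief <- as.vector(belief %*% transition[, , action])
    state <- sample(n_s, 1, prob = transition[state, , action])
  }
  out
}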
Value

A data frame with columns for time, state, obs, action, and (discounted) value.
Examples

library(sarsop)

m <- fisheries_matrices()
discount <- 0.95

## Takes > 5s
if (assert_has_appl()) {
  alpha <- sarsop(m$transition, m$observation, m$reward, discount,
                  precision = 10)
  sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                   x0 = 5, Tmax = 20, alpha = alpha)
}
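Since the return value is the data frame described under Value, a quick look at the simulated trajectory and its net present value might read as follows (an illustrative follow-up, not part of the documented example):

## Inspect the trajectory; the value column is already discounted,
## so its sum is the net present value of the run.
head(sim)
sum(sim$value)
plot(sim$time, sim$state, type = "l", xlab = "time", ylab = "state")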