simulate_POMDP: Simulate Trajectories in a POMDP

View source: R/simulate_POMDP.R


Simulate Trajectories in a POMDP

Description

Simulate trajectories through a POMDP. The start state for each trajectory is randomly chosen using the specified belief. The belief is used to choose actions from the epsilon-greedy policy and is then updated using the observations.
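
A minimal sketch of a typical call (using the Tiger problem shipped with the package; see the Examples section below for complete runs):

data("Tiger")
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
# trajectories start from the model's start belief and follow the solved policy
simulate_POMDP(sol, n = 10)$avg_reward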

Usage

simulate_POMDP(
  model,
  n = 100,
  belief = NULL,
  horizon = NULL,
  return_beliefs = FALSE,
  epsilon = NULL,
  digits = 7,
  engine = "cpp",
  verbose = FALSE,
  ...
)

Arguments

model

a POMDP model.

n

number of trajectories.

belief

probability distribution over the states for choosing the starting states for the trajectories. Defaults to the start belief state specified in the model or "uniform".

horizon

number of epochs for the simulation. If NULL then the horizon for the model is used.

return_beliefs

logical; return all visited belief states? This requires memory proportional to n x horizon.

epsilon

the probability of a random action for the epsilon-greedy policy. The default is 0 for solved models and 1 for unsolved models (see the short sketch after the argument list).

digits

the number of digits used to round the probabilities of the belief points.

engine

'cpp' or 'r' to perform the simulation using the faster C++ implementation or the native R implementation, which supports sparse matrices and multi-episode problems.

verbose

logical; report the parameters used for the simulation.

...

further arguments are ignored.
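
As a short sketch of how the epsilon argument can be used, a solved model can be simulated with occasional random actions (the value 0.2 is chosen arbitrarily for illustration):

data("Tiger")
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
# follow the solved policy, but choose a random action 20% of the time
sim <- simulate_POMDP(sol, n = 100, epsilon = 0.2)
sim$avg_reward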

Details

A native R implementation (engine = 'r') and a faster C++ implementation (engine = 'cpp') are available.

Both implementations support parallel execution using the package foreach. To enable parallel execution, a parallel backend like doParallel needs to be available and registered (see doParallel::registerDoParallel()). Note that small simulations are slower when run in parallel. Therefore, C++ simulations with n * horizon less than 100,000 are always executed using a single worker.
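
A sketch of enabling parallel simulation (assuming the package doParallel is installed; the simulation size is only chosen to reach the single-worker threshold of 100,000):

data("Tiger")
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
# register a parallel backend for foreach
doParallel::registerDoParallel()
# n * horizon = 100,000, so the C++ engine can distribute the work over workers
sim <- simulate_POMDP(sol, n = 20000, horizon = 5)
sim$avg_reward
# stop the implicitly created cluster when done
doParallel::stopImplicitCluster()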

Value

A list with elements:

  • avg_reward: The average discounted reward.

  • belief_states: A matrix with the visited belief states as rows (only returned if return_beliefs = TRUE).

  • action_cnt: Action counts.

  • state_cnt: State counts.

  • reward: Reward for each trajectory.
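
The returned components can be accessed as in this short sketch (using the solved Tiger model from the Examples below):

data("Tiger")
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
sim <- simulate_POMDP(sol, n = 100, return_beliefs = TRUE)
sim$avg_reward           # average discounted reward
head(sim$belief_states)  # visited belief states (one per row)
sim$action_cnt           # how often each action was chosen
sim$state_cnt            # how often each state was visited
head(sim$reward)         # reward of each trajectory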

Author(s)

Michael Hahsler

See Also

Other POMDP: POMDP_accessors, POMDP(), plot_belief_space(), projection(), regret(), sample_belief_space(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Examples

data(Tiger)

# solve the POMDP for 5 epochs and no discounting
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
sol
policy(sol)

# uncomment the following line to register a parallel backend for simulation 
# (needs package doParallel installed)

# doParallel::registerDoParallel()

## Example 1: simulate 100 trajectories
sim <- simulate_POMDP(sol, n = 100, verbose = TRUE)
sim

# calculate the percentage that each action is used in the simulation
round_stochastic(sim$action_cnt / sum(sim$action_cnt), 2)

# reward distribution
hist(sim$reward)

## Example 2: look at the belief states visited in the trajectories, starting from a given initial belief.
sim <- simulate_POMDP(sol, n = 100, belief = c(.5, .5), return_beliefs = TRUE)
head(sim$belief_states)

# plot with added density (the x-axis is the probability of the second belief state)
plot_belief_space(sol, sample = sim$belief_states, jitter = 2, ylim = c(0, 6))
lines(density(sim$belief_states[, 2], bw = .02)); axis(2); title(ylab = "Density")


## Example 3: simulate trajectories for an unsolved POMDP which uses an epsilon of 1
#             (i.e., all actions are randomized)
sim <- simulate_POMDP(Tiger, n = 100, horizon = 5, return_beliefs = TRUE, verbose = TRUE)
sim$avg_reward

plot_belief_space(sol, sample = sim$belief_states, jitter = 2, ylim = c(0, 6))
lines(density(sim$belief_states[, 1], bw = .05)); axis(2); title(ylab = "Density")
