reward: Calculate the Reward for a POMDP Solution

View source: R/reward.R


Calculate the Reward for a POMDP Solution

Description

reward() calculates the expected total reward for a POMDP solution given a starting belief state. The value is calculated using the value function stored in the POMDP solution. reward_node_action() additionally returns the policy graph node that represents the belief state and the optimal action.
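
Conceptually, the stored value function is a set of alpha vectors, and the expected reward at a belief is the largest dot product between the belief and the alpha vectors. A minimal sketch of this computation with hypothetical alpha vectors (the actual vectors are stored in the solved POMDP object):

# hypothetical alpha vectors (one row per policy graph node, one column per state)
alpha <- rbind(c(10, -100), c(1.93, 1.93), c(-100, 10))
b <- c(0.85, 0.15)        # belief over the two states
vals <- alpha %*% b       # value of the belief under each alpha vector
max(vals)                 # expected total reward at this belief
which.max(vals)           # index of the maximizing vector (the policy graph node)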

Usage

reward(x, belief = NULL, epoch = 1)

reward_node_action(x, belief = NULL, epoch = 1)

Arguments

x

a solved POMDP object.

belief

specification of the current belief state (see argument start in POMDP for details). By default the belief state defined in the model as start is used. Multiple belief states can be specified as rows in a matrix.

epoch

the epoch for which to return the reward. Use 1 for converged policies.
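
For finite-horizon problems, the policy (and therefore the reward) can differ by epoch. A short sketch, assuming the Tiger problem is solved with a horizon of 3 (the horizon value is chosen only for illustration):

data("Tiger")
sol3 <- solve_POMDP(model = Tiger, horizon = 3)
reward(sol3, belief = c(0.5, 0.5), epoch = 1)  # reward when all 3 decisions remain
reward(sol3, belief = c(0.5, 0.5), epoch = 3)  # reward for the final decision epoch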

Value

reward() returns a vector of reward values, one for each belief if a matrix is specified.

reward_node_action() returns a list with the components

belief_state

the belief state specified in belief.

reward

the total expected reward given a belief and epoch.

pg_node

the policy node that represents the belief state.

action

the optimal action.
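
A short sketch of accessing these components by name, using the Tiger problem from the Examples below:

data("Tiger")
sol <- solve_POMDP(model = Tiger)
res <- reward_node_action(sol, belief = c(0.5, 0.5))
res$reward   # total expected reward at the uniform belief
res$pg_node  # policy graph node that represents the belief
res$action   # optimal action for that node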

Author(s)

Michael Hahsler

See Also

Other policy: estimate_belief_for_nodes(), optimal_action(), plot_belief_space(), plot_policy_graph(), policy_graph(), policy(), projection(), solve_POMDP(), solve_SARSOP(), value_function()

Examples

data("Tiger")
sol <- solve_POMDP(model = Tiger)

# if no start is specified, a uniform belief is used.
reward(sol)

# we have additional information that makes us believe that the tiger
# is more likely to the left.
reward(sol, belief = c(0.85, 0.15))

# we start with strong evidence that the tiger is to the left.
reward(sol, belief = "tiger-left")

# Note that in this case, the total discounted expected reward is greater
# than 10 since the tiger problem resets and another game starting with
# a uniform belief is played, which produces additional reward.

# return the reward, the initial node in the policy graph, and the optimal
# action for two beliefs.
reward_node_action(sol, belief = rbind(c(.5, .5), c(.9, .1)))

# manually combining reward with belief space sampling to show the value function
# (color signifies the optimal action)
samp <- sample_belief_space(sol, n = 200)
rew <- reward_node_action(sol, belief = samp)
plot(rew$belief[,"tiger-right"], rew$reward, col = rew$action, ylim = c(0, 15))
legend(x = "top", legend = levels(rew$action), title = "action", col = 1:3, pch = 1)

# this is the piecewise linear value function from the solution
plot_value_function(sol, ylim = c(0, 10))
