reward: Calculate the Reward for a POMDP Solution

View source: R/reward.R


Calculate the Reward for a POMDP Solution

Description

reward() calculates the expected total reward for a POMDP solution given a starting belief state. The value is calculated using the value function stored in the POMDP solution. reward_node_action() additionally returns the policy graph node that represents the belief state and the optimal action.
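
Conceptually, the stored value function is a set of alpha vectors, and the expected reward at a belief is the largest dot product between the belief and the alpha vectors. A minimal sketch of this computation with hypothetical alpha vectors (the actual vectors are stored in the solved POMDP object):

# hypothetical alpha vectors (one row per policy graph node, one column per state)
alpha <- rbind(c(10, -100), c(1.93, 1.93), c(-100, 10))
b <- c(0.85, 0.15)        # belief over the two states
vals <- alpha %*% b       # value of the belief under each alpha vector
max(vals)                 # expected total reward at this belief
which.max(vals)           # index of the maximizing vector (the policy graph node)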

Usage

reward(x, belief = NULL, epoch = 1)

reward_node_action(x, belief = NULL, epoch = 1)

Arguments

x

a solved POMDP object.

belief

specification of the current belief state (see argument start in POMDP for details). By default the belief state defined in the model as start is used. Multiple belief states can be specified as rows in a matrix.

epoch

the epoch for which to return the reward. Use 1 for converged policies.
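
For finite-horizon problems, the policy (and therefore the reward) can differ by epoch. A short sketch, assuming the Tiger problem is solved with a horizon of 3 (the horizon value is chosen only for illustration):

data("Tiger")
sol3 <- solve_POMDP(model = Tiger, horizon = 3)
reward(sol3, belief = c(0.5, 0.5), epoch = 1)  # reward when all 3 decisions remain
reward(sol3, belief = c(0.5, 0.5), epoch = 3)  # reward for the final decision epoch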

Value

reward() returns a vector of reward values, one for each belief if a matrix is specified.

reward_node_action() returns a list with the components

belief_state

the belief state specified in belief.

reward

the total expected reward given a belief and epoch.

pg_node

the policy node that represents the belief state.

action

the optimal action.
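
A short sketch of accessing these components by name, using the Tiger problem from the Examples below:

data("Tiger")
sol <- solve_POMDP(model = Tiger)
res <- reward_node_action(sol, belief = c(0.5, 0.5))
res$reward   # total expected reward at the uniform belief
res$pg_node  # policy graph node that represents the belief
res$action   # optimal action for that node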

Author(s)

Michael Hahsler

See Also

Other policy: estimate_belief_for_nodes(), optimal_action(), plot_belief_space(), plot_policy_graph(), policy_graph(), policy(), projection(), solve_POMDP(), solve_SARSOP(), value_function()

Examples

data("Tiger")
sol <- solve_POMDP(model = Tiger)

# if no start is specified, a uniform belief is used.
reward(sol)

# we have additional information that makes us believe that the tiger
# is more likely to the left.
reward(sol, belief = c(0.85, 0.15))

# we start with strong evidence that the tiger is to the left.
reward(sol, belief = "tiger-left")

# Note that in this case, the total discounted expected reward is greater
# than 10 since the tiger problem resets and another game starting with
# a uniform belief is played, which produces additional reward.

# return the reward, the initial node in the policy graph, and the optimal
# action for two beliefs.
reward_node_action(sol, belief = rbind(c(.5, .5), c(.9, .1)))

# manually combining reward with belief space sampling to show the value function
# (color signifies the optimal action)
samp <- sample_belief_space(sol, n = 200)
rew <- reward_node_action(sol, belief = samp)
plot(rew$belief[,"tiger-right"], rew$reward, col = rew$action, ylim = c(0, 15))
legend(x = "top", legend = levels(rew$action), title = "action", col = 1:3, pch = 1)

# this is the piecewise linear value function from the solution
plot_value_function(sol, ylim = c(0, 10))
