regret — R Documentation

Description

Calculates the regret of a policy relative to a benchmark policy.
Usage

regret(policy, benchmark, belief = NULL)
Arguments

policy      a POMDP containing the policy for which to calculate the regret.
benchmark   a solved POMDP with the (optimal) benchmark policy. Regret is calculated relative to this policy.
belief      the start belief used for evaluation. If NULL, the start belief of the benchmark is used.
Details

Calculates the regret defined as J^{\pi^*}(b_0) - J^{\pi}(b_0), where J^{\pi}(b_0) is the expected long-term reward of policy \pi given the start belief b_0. Note that regret is usually defined relative to the optimal policy \pi^* as the benchmark; since the optimal policy may not be known, regret relative to the best known policy can be used instead.
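The difference above can also be computed by hand by evaluating both policies at the same start belief. A minimal sketch, assuming the pomdp package's reward() function returns the expected long-term reward of a policy for a given belief (the exact return value of reward() is an assumption here, not taken from this page):

```r
library(pomdp)

data(Tiger)

# benchmark: (near) optimal policy; policy under evaluation: a quick
# 10-epoch finite-horizon solution
sol_optimal <- solve_POMDP(Tiger)
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)

# uniform start belief for the Tiger problem (two hidden states)
b0 <- c(0.5, 0.5)

# assumed API: reward() evaluates a solved POMDP at a belief
r_star <- reward(sol_optimal, belief = b0)   # J^{pi*}(b_0)
r_pi   <- reward(sol_quick, belief = b0)     # J^{pi}(b_0)

r_star - r_pi                                # regret of sol_quick
```

This mirrors what regret(sol_quick, sol_optimal) computes when belief = NULL and the benchmark's start belief is uniform.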
Value

The regret as a difference of expected long-term rewards.
Author(s)

Michael Hahsler
See Also

Other POMDP: POMDP_accessors, POMDP(), plot_belief_space(), projection(), sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()
Examples

data(Tiger)

# solve for the (near) optimal policy as the benchmark
sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# perform exact value iteration for 10 epochs as a quick, suboptimal policy
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick

regret(sol_quick, sol_optimal)