regret                                R Documentation
Description:

Calculates the regret of a policy relative to a benchmark policy.
Usage:

regret(policy, benchmark, start = NULL)
Arguments:

policy     a solved POMDP containing the policy to calculate the regret for.

benchmark  a solved POMDP with the (optimal) policy. Regret is calculated
           relative to this policy.

start      the start (belief) state to use. If NULL, the start (belief)
           state of the benchmark is used.
Details:

Regret is defined as

    V^{\pi^*}(s_0) - V^{\pi}(s_0),

where V^{\pi} is the expected long-term state value (given by the value
function) for policy \pi and start state s_0. For POMDPs, the start state is
the start belief b_0.
Note that regret is usually computed with the optimal policy \pi^* as the
benchmark. Since the optimal policy may not be known, regret relative to the
best known policy can be used instead.
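The definition above can be illustrated directly in R. The following is a
minimal sketch, not the function's actual implementation; it assumes that the
pomdp package's reward() accessor returns the expected long-term reward of a
solved POMDP for a given belief vector, so the regret is simply the difference
of the two values at the same start belief. In practice, regret() should be
used instead.

library(pomdp)

data(Tiger)
benchmark <- solve_POMDP(Tiger)               # used as the (near-)optimal policy
pol       <- solve_POMDP(Tiger, horizon = 3)  # a weaker finite-horizon policy

b0 <- c(0.5, 0.5)                             # start belief b_0 over the two states

# regret = V^{pi*}(b_0) - V^{pi}(b_0), computed from the two value functions
# (assumes reward() returns a single expected reward for a belief vector)
reward(benchmark, belief = b0) - reward(pol, belief = b0)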
Value:

The regret as a difference of expected long-term rewards.
Author(s):

Michael Hahsler
See Also:

Other POMDP: MDP2POMDP, POMDP(), accessors, actions(), add_policy(),
plot_belief_space(), projection(), reachable_and_absorbing,
sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(),
transition_graph(), update_belief(), value_function(), write_POMDP()
Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(),
add_policy(), gridworld, reachable_and_absorbing, simulate_MDP(),
solve_MDP(), transition_graph(), value_function()
Examples:

data(Tiger)
sol_optimal <- solve_POMDP(Tiger)
sol_optimal
# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick
regret(sol_quick, benchmark = sol_optimal)
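The start belief can also be given explicitly. The call below is a sketch
that assumes the start argument accepts a belief vector over the model's
states (here, a uniform belief over the two Tiger states).

# regret evaluated at an explicitly specified start belief
# (assumes a numeric belief vector is a valid value for start)
regret(sol_quick, benchmark = sol_optimal, start = c(0.5, 0.5))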