regret: Calculate the Regret of a Policy

View source: R/regret.R


Calculate the Regret of a Policy

Description

Calculates the regret of a policy relative to a benchmark policy.

Usage

regret(policy, benchmark, belief = NULL)

Arguments

policy

a solved POMDP containing the policy for which the regret is calculated.

benchmark

a solved POMDP with the (optimal) policy. Regret is calculated relative to this policy.

belief

the start belief used to calculate the regret. If NULL, then the start belief of the benchmark is used.

Details

Calculates the regret, defined as J^{\pi^*}(b_0) - J^{\pi}(b_0), where J^\pi is the expected long-term reward of following policy \pi starting from the belief b_0. The regret is usually defined relative to the optimal policy \pi^*. Since the optimal policy may not be known, the regret relative to the best known policy can be used instead.
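
For illustration, the regret can be computed manually as the difference of two expected long-term rewards. The sketch below assumes the package's reward() function, which evaluates the expected long-term reward of a solved POMDP for a given start belief; regret_manual is a hypothetical helper name used only for this example.

# hypothetical helper (illustration only): regret as the difference of the
# expected long-term rewards of the benchmark and the policy, assuming
# reward() returns a numeric expected reward for the given start belief
regret_manual <- function(policy, benchmark, b0)
  reward(benchmark, belief = b0) - reward(policy, belief = b0)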

Value

the regret as a difference of expected long-term rewards (benchmark minus policy).

Author(s)

Michael Hahsler

See Also

Other POMDP: POMDP_accessors, POMDP(), plot_belief_space(), projection(), sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Examples

data(Tiger)

sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick

regret(sol_quick, sol_optimal)
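
# the regret can also be evaluated for an explicitly chosen start belief
# (illustrative: the Tiger problem has two states, so a uniform start
#  belief is c(0.5, 0.5))
regret(sol_quick, sol_optimal, belief = c(0.5, 0.5))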
