regret                                R Documentation
Description:

Calculates the regret of a policy relative to a benchmark policy.
Usage:

regret(policy, benchmark, start = NULL)
Arguments:

policy     a solved POMDP containing the policy to calculate the regret for.

benchmark  a solved POMDP with the (optimal) policy. Regret is calculated
           relative to this policy.

start      the start (belief) state to use. If NULL, the start (belief)
           state of the benchmark is used.
Details:

Regret is defined as

    V^{\pi^*}(s_0) - V^{\pi}(s_0),

where V^{\pi} is the expected long-term state value (given by the value
function) for policy \pi and start state s_0. For POMDPs, the start state is
the start belief b_0.
Note that regret is usually computed with the optimal policy \pi^* as the
benchmark. Since the optimal policy may not be known, regret relative to the
best known policy can be used instead.
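The definition above can be illustrated directly in R. The following is a
minimal sketch, not the function's actual implementation; it assumes that the
pomdp package's reward() accessor returns the expected long-term reward of a
solved POMDP for a given belief vector, so the regret is simply the difference
of the two values at the same start belief. In practice, regret() should be
used instead.

library(pomdp)

data(Tiger)
benchmark <- solve_POMDP(Tiger)               # used as the (near-)optimal policy
pol       <- solve_POMDP(Tiger, horizon = 3)  # a weaker finite-horizon policy

b0 <- c(0.5, 0.5)                             # start belief b_0 over the two states

# regret = V^{pi*}(b_0) - V^{pi}(b_0), computed from the two value functions
# (assumes reward() returns a single expected reward for a belief vector)
reward(benchmark, belief = b0) - reward(pol, belief = b0)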
Value:

The regret as a difference of expected long-term rewards.
Author(s):

Michael Hahsler
See Also:

Other POMDP: MDP2POMDP, POMDP(), accessors, actions(), add_policy(),
plot_belief_space(), projection(), reachable_and_absorbing,
sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(),
transition_graph(), update_belief(), value_function(), write_POMDP()
Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(),
add_policy(), gridworld, reachable_and_absorbing, simulate_MDP(),
solve_MDP(), transition_graph(), value_function()
Examples:

data(Tiger)
sol_optimal <- solve_POMDP(Tiger)
sol_optimal
# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick
regret(sol_quick, benchmark = sol_optimal)
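The start belief can also be given explicitly. The call below is a sketch
that assumes the start argument accepts a belief vector over the model's
states (here, a uniform belief over the two Tiger states).

# regret evaluated at an explicitly specified start belief
# (assumes a numeric belief vector is a valid value for start)
regret(sol_quick, benchmark = sol_optimal, start = c(0.5, 0.5))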