nmab_gi_value: Value calculation for the one-armed bandit with Normal rewards

View source: R/nmab.R

nmab_gi_value    R Documentation

Value calculation for the one-armed bandit with Normal rewards

Description

Assumes Sigma = mu = 0.

Usage

nmab_gi_value(lambda, n, gamma, tau, N, xi, delta, extra_xi = 1)

Arguments

lambda

Reward from the known arm.

n

Numeric > 0. Value of n for the unknown arm.

gamma

Numeric in (0, 1). Reward discount factor.

tau

Numeric > 0. Observation precision.

N

Integer >= 2. Time horizon used.

xi

Numeric > 0. Value of xi (extent of the dynamic program state space).

delta

Numeric > 0. Value of delta (fineness of discretisation in the dynamic program).

extra_xi

Extend xi using a fast approximation. See Details.

Details

The extra_xi argument is a later addition to the algorithm, not included in the paper; it improves accuracy at low computational cost.

Normally, states outside the width of the state space are ignored (taken to have a value of zero). This saves computation for states that are unlikely to be visited. However, the calculation can be improved at relatively little computational cost by giving some of these states a value based on their mean reward only (that is, with no further learning). Although this is an approximation, it is always more accurate than using zero. There are therefore two blocks of states: the original states, within xi standard deviations, are calculated in detail using dynamic programming; the new states, within xi + extra_xi standard deviations, are valued by the mean-reward approximation. I have found extra_xi = 1 works well and have set this as the default. This value should be used unless doing research on its effect.
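A minimal sketch of calling the function, using the signature shown under Usage. The parameter values below are illustrative assumptions for a quick run, not recommendations from the package author; only extra_xi = 1 is the documented default.

```r
## Illustrative call: argument values are assumptions, not recommendations.
library(gittins)

## Value difference between a known arm paying lambda and an unknown
## Normal-reward arm with n observations, horizon N, state-space width xi,
## and discretisation fineness delta.
v <- nmab_gi_value(lambda = 0.5, n = 1, gamma = 0.9, tau = 1,
                   N = 30, xi = 3, delta = 0.02)

## Setting extra_xi = 0 disables the extended (mean-reward-valued) states,
## which can be used to study the effect described in Details.
v0 <- nmab_gi_value(lambda = 0.5, n = 1, gamma = 0.9, tau = 1,
                    N = 30, xi = 3, delta = 0.02, extra_xi = 0)
```

Comparing v and v0 shows the contribution of the extended state space; per the Details section, the default extra_xi = 1 should be at least as accurate as extra_xi = 0.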

Value

Difference in value between safe and unknown arms.


jedwards24/gittins documentation built on Oct. 13, 2023, 4:17 p.m.