nmab_gi_value: Value calculation for the one-armed bandit with Normal rewards

View source: R/nmab.R

nmab_gi_value    R Documentation

Value calculation for the one-armed bandit with Normal rewards

Description

Assumes Sigma = mu = 0.

Usage

nmab_gi_value(lambda, n, gamma, tau, N, xi, delta, extra_xi = 1)

Arguments

lambda

Reward from the known arm.

n

Numeric > 0. Value of n for the unknown arm.

gamma

Numeric in (0, 1). Reward discount factor.

tau

Numeric > 0. Observation precision.

N

Integer >= 2. Time horizon used.

xi

Numeric > 0. Value of xi (extent of the dynamic program state space).

delta

Numeric > 0. Value of delta (fineness of discretisation in the dynamic program).

extra_xi

Extend xi using a fast approximation. See Details.

Details

The extra_xi argument is a later addition to the algorithm, not included in the paper; it improves accuracy at low computational cost.

Normally, states outside the width of the state space are ignored (taken to have a value of zero). This saves computation for states that are unlikely to be visited. However, the calculation can be improved at relatively little computational cost by giving some of these states a value based on their mean reward only (that is, with no further learning). Although this is an approximation, it is always more accurate than using zero. There are therefore two blocks of states: the original states, within xi standard deviations, are calculated in detail using dynamic programming; the new states, within xi + extra_xi standard deviations, are valued by the mean-reward approximation. I have found extra_xi = 1 works well and have set this as the default. This value should be used unless doing research on its effect.
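A minimal sketch of calling the function, using the signature shown under Usage. The parameter values below are illustrative assumptions for a quick run, not recommendations from the package author; only extra_xi = 1 is the documented default.

```r
## Illustrative call: argument values are assumptions, not recommendations.
library(gittins)

## Value difference between a known arm paying lambda and an unknown
## Normal-reward arm with n observations, horizon N, state-space width xi,
## and discretisation fineness delta.
v <- nmab_gi_value(lambda = 0.5, n = 1, gamma = 0.9, tau = 1,
                   N = 30, xi = 3, delta = 0.02)

## Setting extra_xi = 0 disables the extended (mean-reward-valued) states,
## which can be used to study the effect described in Details.
v0 <- nmab_gi_value(lambda = 0.5, n = 1, gamma = 0.9, tau = 1,
                    N = 30, xi = 3, delta = 0.02, extra_xi = 0)
```

Comparing v and v0 shows the contribution of the extended state space; per the Details section, the default extra_xi = 1 should be at least as accurate as extra_xi = 0.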

Value

Difference in value between safe and unknown arms.


jedwards24/gittins documentation built on Oct. 13, 2023, 4:17 p.m.