nmab_gi_value    R Documentation
Description

Assumes Sigma = mu = 0.
Usage

nmab_gi_value(lambda, n, gamma, tau, N, xi, delta, extra_xi = 1)
Arguments

lambda: Reward from the known arm.

n: Numeric > 0. Value of n for the unknown arm.

gamma: Numeric in (0, 1). Reward discount factor.

tau: Numeric > 0. Observation precision.

N: Integer >= 2. Time horizon used.

xi: Numeric > 0. Value of xi (extent of the dynamic program state space).

delta: Numeric > 0. Value of delta (fineness of the discretisation in the dynamic program).

extra_xi: Extend xi using a fast approximation. See Details.
Details

The extra_xi argument was a later addition to the algorithm, not included in the paper; it improves accuracy at low computational cost. Normally, states outside the width of the state space are ignored (taken to have a value of zero). This saves computation for states that are unlikely to be visited. However, the calculation can be improved, with relatively little extra computation, by giving some of these states a value based on their mean reward only (no further learning). Although this is an approximation, it is always more accurate than using zero. There are therefore two blocks of states: the original states are within xi standard deviations and are calculated in detail using dynamic programming; the new states are within xi + extra_xi standard deviations and use the mean-reward approximation. I have found that extra_xi = 1 works well and have set it as the default. This is the value that should be used unless doing research on its effect.
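The mean-reward-only value described above can be sketched as a truncated geometric sum of the discounted mean reward over the remaining horizon. This is an illustrative assumption about the form of the approximation, not code from the package:

```r
# Sketch (assumption): value of a state whose posterior mean reward mu is
# frozen (no further learning), discounted by gamma over n_remaining steps.
mean_reward_value <- function(mu, gamma, n_remaining) {
  mu * (1 - gamma^n_remaining) / (1 - gamma)
}

mean_reward_value(0.5, 0.9, 10)  # approx 3.2566
```

Whatever its exact form in the implementation, any such nonnegative value is closer to the true state value than the zero assigned to ignored states, which is why the extension improves accuracy.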
Value

Difference in value between the known (safe) and unknown arms.
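Examples

A usage sketch, assuming the package providing nmab_gi_value is installed and loaded. The numeric values below are illustrative only, chosen to satisfy the stated argument constraints; they are not recommendations:

```r
# Default state-space extension (extra_xi = 1) versus no extension.
v_default <- nmab_gi_value(lambda = 0.5, n = 1, gamma = 0.9, tau = 1,
                           N = 30, xi = 3, delta = 0.02)
v_no_ext  <- nmab_gi_value(lambda = 0.5, n = 1, gamma = 0.9, tau = 1,
                           N = 30, xi = 3, delta = 0.02, extra_xi = 0)
# Per Details, the extra_xi = 1 result should be the more accurate of the two.
```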