LifPolicy: Policy: Continuum Bandit Policy with Lock-in Feedback
In Nth-iteration-labs/contextual: Simulation and Analysis of Contextual Multi-Armed Bandit Policies


The continuum-type Lock-in Feedback (LiF) policy is based on an approach used in physics and engineering: if a physical variable y depends on a well-controllable physical variable x, the search for argmax_x f(x) can be carried out with what is nowadays considered standard electronics. The approach relies on making x oscillate at a fixed frequency and measuring the response of the dependent variable y at that same frequency by means of a lock-in amplifier. The method is particularly suitable when y is buried in a high level of noise, where more direct methods would fail. Furthermore, should the entire curve shift (in other words, if argmax_x f(x) changes over time, also known as concept drift), the circuit automatically adjusts to the new situation and quickly reveals the new position of the maximum. This approach is widely used in both industry and research, and it is the basis for the Lock-in Feedback (LiF) method.
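The lock-in idea described above can be illustrated with a small base R sketch. It is not taken from the package: the response function `f`, the helper `lock_in_signal`, and all numeric settings are assumptions chosen for illustration. The input oscillates around a set value `x0`, the noisy response is multiplied by the same oscillation, and the product is averaged over a whole number of periods. For a smooth response, that average is roughly proportional to the slope of `f` at `x0`, so its sign indicates on which side of the maximum the set value lies.

```r
# Hypothetical response curve with its maximum at x = 0.6 (assumed for illustration).
f <- function(x) -(x - 0.6)^2 + 1

# Demodulate the noisy response at the oscillation frequency.
# Returns an estimate proportional to the slope of f at x0.
lock_in_signal <- function(x0, amplitude = 0.1, period = 100,
                           n_periods = 50, noise_sd = 0.5) {
  t       <- seq_len(period * n_periods)       # whole number of periods
  omega   <- 2 * pi / period                   # oscillation frequency
  carrier <- cos(omega * t)
  x       <- x0 + amplitude * carrier          # oscillate around the set value
  y       <- f(x) + rnorm(length(t), sd = noise_sd)  # response buried in noise
  mean(y * carrier)                            # keep only the component at omega
}

set.seed(1)
below <- lock_in_signal(0.2)   # set value left of the maximum
at    <- lock_in_signal(0.6)   # set value at the maximum
above <- lock_in_signal(0.9)   # set value right of the maximum
print(c(below = below, at = at, above = above))
```

In this sketch the noise standard deviation is several times larger than the oscillation-induced change in `f`, yet averaging over many periods still recovers the sign of the slope.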

Lock-in Feedback repeatedly goes through the following steps:

1. Oscillate a controllable independent variable X around a set value at a fixed pace.
2. Apply the lock-in amplifier algorithm to obtain the amplitude of the outcome variable Y at the pace set in step 1.
3. Is the amplitude zero? Congratulations, you have reached lock-in! That is, the current set value of X yields the optimal value of Y. Still, this optimum might shift over time, so return to step 1 and repeat the process to maintain lock-in.
4. Is the amplitude less than or greater than zero? Then move the set value around which X oscillates up or down, depending on the sign of the amplitude.

Now return to step 1 and repeat.
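The steps above can be sketched as a plain loop in base R. This is a minimal illustration, not the package's internal implementation: the function `lif_sketch`, the reward function `reward_fun`, and the chosen values are hypothetical. The argument names mirror those of the `LifPolicy$new()` constructor, but the meaning given to each one here (integration window, oscillation amplitude, step size, oscillation frequency, initial set value) is an assumption.

```r
# Hypothetical noisy reward with its maximum at x = 0.7 (assumed for illustration).
reward_fun <- function(x) -(x - 0.7)^2 + 1 + rnorm(1, sd = 0.05)

lif_sketch <- function(reward_fun, horizon, inttime, amplitude,
                       learnrate, omega, x0_start) {
  x0    <- x0_start            # current set value
  acc   <- 0                   # running sum of the demodulated reward
  trace <- numeric(horizon)    # set value after each time step
  for (t in seq_len(horizon)) {
    carrier <- cos(omega * t)
    x   <- x0 + amplitude * carrier   # step 1: oscillate around the set value
    y   <- reward_fun(x)
    acc <- acc + y * carrier          # step 2: lock-in demodulation
    if (t %% inttime == 0) {
      # steps 3 and 4: a signed amplitude near zero leaves x0 almost unchanged,
      # otherwise x0 moves up or down according to the sign
      x0  <- x0 + learnrate * acc / inttime
      acc <- 0
    }
    trace[t] <- x0
  }
  trace
}

set.seed(42)
inttime  <- 100
trace    <- lif_sketch(reward_fun, horizon = 10000, inttime = inttime,
                       amplitude = 0.2, learnrate = 2,
                       omega = 2 * pi / inttime, x0_start = 0.1)
final_x0 <- trace[length(trace)]
print(final_x0)
```

Setting `omega` to `2 * pi / inttime` makes each integration window span exactly one oscillation period, so the constant part of the reward averages out before the set value is updated.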

b <- LifPolicy$new(inttime, amplitude, learnrate, omega, x0_start)

Kaptein, M. C., Van Emden, R., & Iannuzzi, D. (2016). Tracking the decoy: maximizing the decoy effect through sequential experimentation. Palgrave Communications, 2, 16082.

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy

Nth-iteration-labs/contextual documentation built on July 28, 2020, 1:13 p.m.
