Just uses the observed reward history to fit a single learning rate representing the extent to which the agent discounts the information from the reinforcement history.
The goal of this model is to provide a benchmark
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.