```{r}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(magrittr)
library(rlsims)
library(ggplot2)
```
Multi-armed bandits, also called $K$-armed bandits, describe a type of reinforcement learning problem in which an agent selects one of $K$ possible arms on each of any number of trials. After each selection, the agent experiences a reinforcement that influences its action on the next trial. Multi-armed bandits can help formalize a number of real-world problems. Consider a developer who wants to try different application icons and find which one is most appealing to customers and maximizes the number of downloads. They might use a multi-armed bandit to learn which of the $K$ icons maximizes positive reinforcement (the number of downloads).
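To make the setup concrete, here is a minimal sketch of a bandit environment in plain R. This is not the rlsims API: `arm_probs` and `pull_arm` are hypothetical names, and the payout probabilities are made up for illustration. Each arm pays a reinforcement of 1 (a download) with a fixed probability unknown to the agent.

```{r}
set.seed(1)

# Hypothetical three-armed Bernoulli bandit: each arm pays out 1
# with a fixed probability that the agent does not know.
arm_probs <- c(0.2, 0.5, 0.8)

# Pulling arm k yields a binary reinforcement (1 = download, 0 = none).
pull_arm <- function(k) rbinom(1, size = 1, prob = arm_probs[k])

pull_arm(3)
```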
Similar ideas are explained in greater detail in Aleksandrs Slivkins's textbook, *Introduction to Multi-Armed Bandits*. In the introduction, Slivkins describes the key points of bandit algorithms, highlighting the tradeoff between "exploration and exploitation: making optimal near-term decisions based on the available information." Exploration involves selecting new arms; exploitation involves selecting the arm that previously led to the best reinforcement.
This tradeoff is often formalized with a decision-making policy, which describes how an agent makes a choice. In this package, we have implemented three widely used policies: greedy, epsilon-greedy, and softmax. The latter two each take a single parameter; both selection rules are sketched in code below:

- `epsilon`, which describes the propensity to explore the action (arm) space. The higher the epsilon, the more likely the agent is to select a random action; the lower the epsilon, the more likely the agent is to select the exploitative action (the one with the highest expected value).
- `tau`, which describes the propensity to explore the action space. The higher the tau (temperature), the more random the actions; the lower the tau (temperature), the more exploitative the actions (e.g., a lower temperature increases the probability of taking the action with the highest expected value).
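As a rough illustration in plain R (again, not the rlsims interface; `choose_epsilon_greedy` and `choose_softmax` are hypothetical helpers), the two rules could be written as follows, where `q` is a vector of current action-value estimates:

```{r}
# Epsilon-greedy: with probability epsilon pick a random arm,
# otherwise pick the arm with the highest estimated value.
choose_epsilon_greedy <- function(q, epsilon) {
  if (runif(1) < epsilon) {
    sample(seq_along(q), 1)
  } else {
    which.max(q)
  }
}

# Softmax: sample an arm with probability proportional to exp(q / tau).
# Lower tau concentrates probability on the highest-valued arm;
# higher tau makes the choice closer to uniform.
choose_softmax <- function(q, tau) {
  p <- exp(q / tau)
  sample(seq_along(q), 1, prob = p / sum(p))
}

q <- c(0.2, 0.5, 0.8)
choose_epsilon_greedy(q, epsilon = 0.1)
choose_softmax(q, tau = 0.25)
```

With these illustrative values, the softmax probabilities work out to roughly (0.07, 0.22, 0.72), so the highest-valued arm is chosen most often but the others are still explored.

Under construction!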