# Set knitr chunk options for the vignette output
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# Load packages used throughout the vignette
library(magrittr)
library(rlsims)
library(ggplot2)

Introduction to K-Armed Bandits

Multi-armed bandits, also called $K$-armed bandits, describe a type of reinforcement learning problem in which an agent selects one of $K$ possible arms on each of some number of trials. After each selection, the agent experiences a reinforcement that influences its action on the next trial. Multi-armed bandits can formalize a number of real-world problems. Consider a developer who wants to try different application icons and find which one is most appealing to customers and maximizes the number of downloads. They might use a multi-armed bandit to learn which of the $K$ icons maximizes positive reinforcement (the number of downloads).
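To make this concrete, below is a minimal base-R sketch of a three-armed Bernoulli bandit. It is illustrative only and does not use the rlsims API; the names arm_probs and pull_arm, and the payout probabilities, are hypothetical. Each arm pays 1 with a fixed probability, analogous to a download following an icon impression.

# Hypothetical three-armed Bernoulli bandit (illustrative; not the rlsims API)
set.seed(42)
arm_probs <- c(0.2, 0.5, 0.8)  # hypothetical payout probability of each arm
pull_arm <- function(k) rbinom(1, size = 1, prob = arm_probs[k])

# An agent that pulls arms uniformly at random earns, on average,
# the mean payout across arms rather than the best arm's payout.
n_trials <- 1000
random_choices <- sample(seq_along(arm_probs), n_trials, replace = TRUE)
random_rewards <- sapply(random_choices, pull_arm)
mean(random_rewards)  # roughly mean(arm_probs) = 0.5

A learning agent should do better than this random baseline by concentrating its pulls on the highest-paying arm.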

Similar ideas are explained in greater detail in Aleksandrs Slivkins' textbook, Introduction to Multi-Armed Bandits. In the introduction, Slivkins describes the key points of bandit algorithms, highlighting the tradeoff between "exploration and exploitation: making optimal near-term decisions based on the available information." Exploration involves selecting new arms; exploitation involves selecting the arm that previously led to the best reinforcement.

This tradeoff is often formalized with a decision-making policy, which describes how an agent makes a choice. In this package, we have implemented three widely used policies: greedy, which always selects the arm with the highest estimated value; epsilon-greedy, which selects a random arm with probability $\epsilon$ and otherwise selects the highest-valued arm; and softmax, which selects each arm with probability proportional to the exponential of its estimated value.
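As a sketch of how one such policy might work, the following implements epsilon-greedy selection with incremental sample-average value estimates. It builds on the hypothetical base-R setup above (arm_probs, pull_arm, n_trials) and is not the rlsims implementation.

# Epsilon-greedy sketch (illustrative; not the rlsims implementation)
epsilon <- 0.1
Q <- rep(0, length(arm_probs))  # estimated value of each arm
N <- rep(0, length(arm_probs))  # number of pulls of each arm

for (trial in seq_len(n_trials)) {
  if (runif(1) < epsilon) {
    k <- sample(seq_along(Q), 1)  # explore: choose a random arm
  } else {
    k <- which.max(Q)             # exploit: choose the best estimate so far
  }
  reward <- pull_arm(k)
  N[k] <- N[k] + 1
  Q[k] <- Q[k] + (reward - Q[k]) / N[k]  # incremental sample-average update
}
Q  # estimates should approach arm_probs, with most pulls on the best arm

A softmax policy would instead sample each arm with probability proportional to exp(Q / tau), where tau is a temperature parameter controlling how strongly higher-valued arms are favored:

# Softmax arm selection from the value estimates above (illustrative)
tau <- 0.25
softmax_probs <- exp(Q / tau) / sum(exp(Q / tau))
k <- sample(seq_along(Q), 1, prob = softmax_probs)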

Examples

Under construction!


