The Confidence Bound Target (CBT) algorithm is designed for infinite arms bandit problem. It is shown that CBT algorithm achieves the regret lower bound for general reward distributions. Reference: Hock Peng Chan and Shouri Hu (2018) <arXiv:1805.11793>.
|Author||Hock Peng Chan and Shouri Hu|
|Maintainer||Shouri Hu <e0054[email protected]>|
|Package repository||View on CRAN|
Install the latest version of this package by entering the following in R:
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.