Gibbs() is a utility function that samples actions from a Gibbs distribution.
Gibbs(values, temperature = 1)
values | a numeric vector or matrix of action values. A numeric vector holds the values of all actions; in a numeric matrix, each row is one set of action values.
temperature | a numeric value giving the temperature.
The Gibbs (or Boltzmann) distribution assigns an action with value x a probability of the following form
p(x) = C exp(x / T),
where C is the normalizing constant and the temperature T controls the trade-off between greedy selection (exploitation) and uniformly random selection (exploration): a higher temperature makes the distribution more even, while a lower temperature concentrates it on the actions with higher values.
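The effect of the temperature can be checked numerically. The following is a minimal base-R sketch, not the package's implementation: `gibbs_probs()` and `gibbs_sample()` are hypothetical helper names introduced only for illustration, and the sketch assumes a plain numeric vector of values.

```r
# Hypothetical helper (not part of the package): turns action values into
# Gibbs probabilities p_i = exp(v_i / T) / sum_j exp(v_j / T).
gibbs_probs <- function(values, temperature = 1) {
  scaled <- values / temperature
  # Subtract the maximum before exponentiating for numerical stability;
  # the shift cancels out in the normalization.
  w <- exp(scaled - max(scaled))
  w / sum(w)
}

# Hypothetical helper: draws one action index from those probabilities.
gibbs_sample <- function(values, temperature = 1) {
  sample.int(length(values), size = 1,
             prob = gibbs_probs(values, temperature))
}

values <- c(1, 2, 3)
p_high <- gibbs_probs(values, temperature = 100)  # close to uniform
p_low  <- gibbs_probs(values, temperature = 0.1)  # almost all mass on action 3

set.seed(1)
action <- gibbs_sample(values, temperature = 1)   # an index in 1..3
```

Comparing `p_high` and `p_low` shows the trade-off directly: the first is nearly flat, while the second puts almost all probability on the highest-valued action.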
An integer vector of chosen action indexes.
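For matrix input, one plausible reading is that one index is returned per row. The sketch below illustrates that reading in base R; `sample_rows()` is a hypothetical name, the per-row behavior is an assumption rather than something this page confirms, and a positive temperature is assumed.

```r
# Hypothetical helper (an assumption, not the package's code): samples one
# action index per row of a value matrix. Assumes temperature > 0.
sample_rows <- function(values, temperature = 1) {
  # Treat a plain vector as a single set of action values.
  if (!is.matrix(values)) values <- matrix(values, nrow = 1)
  apply(values, 1, function(v) {
    # Unnormalized Gibbs weights; sample.int() normalizes them itself.
    w <- exp((v - max(v)) / temperature)
    sample.int(length(v), size = 1, prob = w)
  })
}

set.seed(42)
# Each row strongly favors one action: column 3 in row 1, column 1 in row 2.
m <- rbind(c(0, 0, 50), c(50, 0, 0))
idx <- sample_rows(m, temperature = 1)
```

Here `idx` is an integer vector with one entry per row of `m`, each entry being a column index into that row.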
A negative T can be used to favor actions with lower values. In fact, the standard formulation of the Gibbs distribution has a negative sign before x / T, so that lower values have higher chances of being sampled.
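The sign flip can be illustrated with a short base-R sketch. `gibbs_probs()` is a hypothetical helper that evaluates the formula p(x) = C exp(x / T) directly; it is not the package's implementation.

```r
# Hypothetical helper (not part of the package): Gibbs probabilities
# p_i = exp(v_i / T) / sum_j exp(v_j / T), valid for positive or negative T.
gibbs_probs <- function(values, temperature) {
  scaled <- values / temperature
  # Shift by the maximum of the scaled values for numerical stability.
  w <- exp(scaled - max(scaled))
  w / sum(w)
}

values <- c(1, 2, 3)
p_pos <- gibbs_probs(values, temperature = 1)   # favors the highest value
p_neg <- gibbs_probs(values, temperature = -1)  # favors the lowest value
```

With these evenly spaced values, `p_neg` is `p_pos` in reverse order, so the action with the lowest value becomes the most likely one.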
Other value-based action functions: EpsilonGreedy(), Greedy(), Random()