gibbs: Gibbs Action
In XiaoqiLu/PhD-Thesis: Regularized Q-Learning


Description

Gibbs() is a utility function that samples actions from a Gibbs distribution over their values.

Usage

Gibbs(values, temperature = 1)

Arguments

`values`	a numeric vector or matrix of action values. A vector holds the values of all actions; in a matrix, each row is a separate set of action values.
`temperature`	a numeric value giving the temperature T (default 1); see Details.

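Details

The Gibbs (or Boltzmann) distribution over a finite set of actions has a probability mass function of the form

p(x) = C exp(x / T),

where x is the value of an action and C is the normalizing constant that makes the probabilities of all actions sum to 1, i.e. C = 1 / sum over all actions of exp(x / T). The temperature T controls the trade-off between greedy selection (exploitation) and uniformly random selection (exploration): a higher temperature makes the distribution more even, approaching uniform as T grows, while a lower positive temperature concentrates the probability on the highest values, approaching greedy selection as T approaches 0.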
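The sampling rule above can be sketched in base R as follows. This is a minimal sketch, not the package's implementation: the helper names gibbs_probs() and gibbs_sample() are hypothetical, and drawing one action per row of a matrix input is an assumption based on the argument description. Subtracting the maximum exponent before calling exp() is a common numerical-stability trick; it scales every weight by the same factor, so the normalized probabilities are unchanged.

```r
# Hypothetical sketch of Gibbs (softmax) action sampling; not the package's source.
gibbs_probs <- function(values, temperature = 1) {
  z <- values / temperature
  # Shift by the maximum to avoid overflow in exp(); cancels after normalizing.
  w <- exp(z - max(z))
  w / sum(w)
}

gibbs_sample <- function(values, temperature = 1) {
  if (is.matrix(values)) {
    # Assumption: each row is an independent set of action values,
    # so one action index is drawn per row.
    as.integer(apply(values, 1, gibbs_sample, temperature = temperature))
  } else {
    sample.int(length(values), size = 1,
               prob = gibbs_probs(values, temperature))
  }
}

# Effect of the temperature on the action probabilities for values 1, 2, 3.
round(gibbs_probs(c(1, 2, 3), temperature = 1), 3)    # [1] 0.090 0.245 0.665
round(gibbs_probs(c(1, 2, 3), temperature = 0.1), 3)  # [1] 0 0 1 (nearly greedy)

# One sampled action index per row of a matrix.
m <- matrix(c(1, 2, 3,
              3, 2, 1), nrow = 2, byrow = TRUE)
set.seed(1)
gibbs_sample(m, temperature = 1)
```

At a large temperature such as 100, the same three values receive nearly equal probabilities, which illustrates the move toward uniform exploration.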

Value

An integer vector of the chosen action indexes.

Note

A negative T can be used to favor actions with lower values. In fact, the standard formulation of the Gibbs distribution has a negative sign in the exponent, p(x) = C exp(-x / T) with T > 0, so lower values have higher chances of being sampled.
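The equivalence described in the note can be checked numerically. The sketch below reuses a hypothetical helper, gibbs_probs(), that computes the probabilities of the form p(x) = C exp(x / T) given above; it is an illustration, not the package's code. With T = -1, the lowest value receives the highest probability, and the result matches the standard formulation exp(-x / T) with T = 1.

```r
# Hypothetical helper: Gibbs probabilities p(x) = C exp(x / T).
gibbs_probs <- function(values, temperature = 1) {
  z <- values / temperature
  w <- exp(z - max(z))  # shift for numerical stability
  w / sum(w)
}

v <- c(1, 2, 3)

# Negative temperature: lower values are now more likely.
round(gibbs_probs(v, temperature = -1), 3)  # [1] 0.665 0.245 0.090

# Standard (physics) formulation with a minus sign and T = 1.
boltzmann <- exp(-v / 1) / sum(exp(-v / 1))
all.equal(gibbs_probs(v, temperature = -1), boltzmann)  # [1] TRUE
```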

See Also

Other value-based action functions: EpsilonGreedy(), Greedy(), Random()