# gibbs: Gibbs Action In XiaoqiLu/PhD-Thesis: Regularized Q-Learning

## Description

`Gibbs()` is a utility function that samples actions from a Gibbs distribution.

## Usage

```r
Gibbs(values, temperature = 1)
```

## Arguments

- `values`: a numeric vector or matrix of action values. For a numeric vector, it contains the values of all actions; for a numeric matrix, each row is a separate set of action values.
- `temperature`: a numeric value giving the temperature.
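The vector-versus-matrix convention can be illustrated with a minimal sketch. `gibbs_sketch()` below is a hypothetical re-implementation written for this page, assuming that one action is drawn per row of a matrix input and that action probabilities are proportional to exp(value / temperature); it is not the package's source code.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
gibbs_sketch <- function(values, temperature = 1) {
  # Draw one index from a single vector of action values.
  sample_one <- function(v) {
    z <- v / temperature
    w <- exp(z - max(z))  # subtract the max for numerical stability
    sample.int(length(v), size = 1L, prob = w / sum(w))
  }
  if (is.matrix(values)) {
    # One action per row: apply the sampler over rows (MARGIN = 1).
    as.integer(apply(values, 1, sample_one))
  } else {
    as.integer(sample_one(values))
  }
}

set.seed(1)
gibbs_sketch(c(2, 1, 3, 7, 5))                      # a single index in 1..5
gibbs_sketch(matrix(c(1, 2, 2, 1), 2, 2, byrow = TRUE),
             temperature = 0.01)                    # one index per row
```

Subtracting the maximum before exponentiating leaves the probabilities unchanged but keeps `exp()` from overflowing at very low temperatures, such as the `temperature = 0.01` used in the examples.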

## Details

The Gibbs (or Boltzmann) distribution over actions has a probability mass function of the form

$$p(x) = C \exp(x / T),$$

where C is the normalizing constant that makes the probabilities sum to one, and the temperature T controls the trade-off between greedy selection (exploitation) and uniformly random selection (exploration): a higher temperature makes the distribution more uniform, while a lower temperature concentrates the probability on actions with higher values.
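The effect of the temperature can be seen by computing the probabilities directly. The hypothetical helper `gibbs_probs()` below normalizes exp(x / T) so that the probabilities sum to one; it is a sketch under that assumption, not part of the package API.

```r
# Hypothetical helper: p_i = exp(x_i / T) / sum_j exp(x_j / T).
gibbs_probs <- function(x, temperature = 1) {
  z <- x / temperature
  w <- exp(z - max(z))  # shifting by max(z) cancels in the ratio but avoids overflow
  w / sum(w)
}

x <- c(2, 1, 3, 7, 5)
round(gibbs_probs(x, temperature = 100), 3)  # close to uniform (about 0.2 each)
round(gibbs_probs(x, temperature = 0.1), 3)  # nearly all mass on x[4] = 7
```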

## Value

An integer vector of the chosen action indices: a single index when `values` is a vector, or one index per row when `values` is a matrix.

## Note

A negative T can be used to favor actions with lower values. In fact, the standard formulation of the Gibbs distribution has a negative sign before x / T, so that lower values have higher probabilities of being sampled.
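A short numeric check of this note, assuming the same form in which probabilities are proportional to exp(x / T): with T = -1 the lowest value receives the highest probability, and the result matches the standard exp(-x / T) form with T = 1. The variable names are illustrative only.

```r
x <- c(2, 1, 3, 7, 5)

# Gibbs probabilities with a negative temperature, p_i proportional to exp(x_i / T).
T_neg <- -1
w <- exp(x / T_neg)
p_neg <- w / sum(w)
which.max(p_neg)  # 2: the lowest value, x[2] = 1, is the most likely

# Standard form with a positive temperature and a minus sign: exp(-x_i / T).
T_pos <- 1
p_std <- exp(-x / T_pos) / sum(exp(-x / T_pos))
isTRUE(all.equal(p_neg, p_std))  # TRUE
```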

## See Also

Other value-based action functions: `EpsilonGreedy()`, `Greedy()`, `Random()`
## Examples

```r
Gibbs(c(2, 1, 3, 7, 5))
Gibbs(matrix(c(1, 2, 2, 1), 2, 2, byrow = TRUE), temperature = 0.01)
```