gibbs: Gibbs Action


Description

Gibbs() is a utility function that samples actions from a Gibbs distribution over their values.

Usage

Gibbs(values, temperature = 1)

Arguments

values

a numeric vector or matrix of action values. A numeric vector holds the values of all actions in a single set; in a numeric matrix, each row is a separate set of action values.

temperature

a numeric value giving the temperature T of the distribution (default 1).

Details

The Gibbs (or Boltzmann) distribution assigns an action with value x a probability of the form

p(x) = C exp(x / T),

where C is the normalizing constant and the temperature T controls the trade-off between greedy selection (exploitation) and uniformly random selection (exploration): a high temperature flattens the distribution toward uniform, while a low positive temperature concentrates it on the actions with the highest values.
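The effect of the temperature can be checked numerically. The following is a minimal sketch, not the package's implementation: gibbs_probs() is a hypothetical helper that normalizes exp(x / T) over a vector of values, and subtracting the maximum before exponentiating is an assumed stability trick that leaves the probabilities unchanged because the shift is absorbed into C.

```r
# Hypothetical helper (not part of the package): Gibbs probabilities
# for a numeric vector of action values.
gibbs_probs <- function(values, temperature = 1) {
  z <- values / temperature
  z <- z - max(z)   # shift for numerical stability; absorbed into C
  w <- exp(z)
  w / sum(w)        # dividing by the sum plays the role of C
}

values <- c(2, 1, 3, 7, 5)

p_default <- gibbs_probs(values)                      # T = 1
p_hot     <- gibbs_probs(values, temperature = 100)   # close to uniform
p_cold    <- gibbs_probs(values, temperature = 0.01)  # close to greedy

# Draw one action index according to the T = 1 probabilities.
action <- sample(seq_along(values), size = 1, prob = p_default)
```

With these values, p_hot stays close to 0.2 for every action, while p_cold puts almost all of its mass on the fourth action, which has the highest value.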

Value

an integer vector of chosen action indices
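For matrix input, the argument description suggests that one action is chosen per row, although the page does not say so explicitly. The sketch below illustrates that reading under this assumption; sample_rows() is a hypothetical name and is not claimed to match how Gibbs() is written.

```r
# Hypothetical row-wise sampler, assuming one chosen index per row.
sample_rows <- function(values, temperature = 1) {
  # Treat a plain vector as a single set of action values.
  if (!is.matrix(values)) values <- matrix(values, nrow = 1)
  apply(values, 1, function(v) {
    z <- v / temperature
    w <- exp(z - max(z))   # unnormalized Gibbs weights
    sample.int(length(v), size = 1, prob = w / sum(w))
  })
}

m <- matrix(c(1, 2, 2, 1), 2, 2, byrow = TRUE)
idx <- sample_rows(m, temperature = 0.01)
```

At such a low temperature the draw is effectively greedy, so idx will almost always pick the second action in the first row and the first action in the second row.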

Note

A negative T can be used to favor actions with lower values. In fact, the standard formulation of the Gibbs distribution has a negative sign before x / T, so that lower values have higher chances of being sampled.
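As a small illustration of this note, the probabilities can be computed directly from the formula in Details. This is only a sketch using base R and does not call Gibbs() itself.

```r
values <- c(2, 1, 3, 7, 5)

# Negative temperature: weights exp(x / T) with T = -1.
w_neg <- exp(values / -1)
p_neg <- w_neg / sum(w_neg)

# Standard formulation: weights exp(-x / T) with T = 1.
w_std <- exp(-values / 1)
p_std <- w_std / sum(w_std)

lowest <- which.min(values)   # the action with the lowest value
favored <- which.max(p_neg)   # the action most likely to be sampled
```

Here favored and lowest refer to the same action, and p_neg matches p_std, so a negative T in this parameterization reproduces the standard formulation with a positive temperature.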

See Also

Other value-based action functions: EpsilonGreedy(), Greedy(), Random()

Examples

Gibbs(c(2, 1, 3, 7, 5))
Gibbs(matrix(c(1, 2, 2, 1), 2, 2, byrow = TRUE), temperature = 0.01)

XiaoqiLu/PhD-Thesis documentation built on March 1, 2021, 10:49 a.m.