oneTrial: Perform One Model-Free Trial
In jdtrat/dynaq: Tools to Simulate DynaQ Reinforcement Learning Algorithms

The oneTrial function simulates one trial of the two-stage Markov task using model-free Q-learning.
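For orientation, the standard tabular Q-learning update can be sketched as below. This is a minimal illustration of the textbook rule, assuming a Q-table stored as a state-by-action matrix; `q_update` and the toy table are hypothetical and are not taken from the dynaq source, which may organize its update differently.

```r
# Hypothetical helper, not part of dynaq: one textbook tabular Q-learning update.
q_update <- function(Q, state, action, reward, next_state,
                     alpha = 0.1, gam = 0.9) {
  # Value of the best action available from the next state
  best_next <- max(Q[next_state, ])
  # Temporal-difference error: target minus current estimate
  td_error <- reward + gam * best_next - Q[state, action]
  # Move the current estimate a fraction alpha toward the target
  Q[state, action] <- Q[state, action] + alpha * td_error
  Q
}

# Toy Q-table (illustrative only): 3 states x 2 actions, initialized to zero
Q <- matrix(0, nrow = 3, ncol = 2)
Q <- q_update(Q, state = 2, action = 1, reward = 1, next_state = 3)
Q[2, 1]  # 0.1, i.e. alpha * reward, because every other entry is still zero
```

Here `alpha` sets how far each estimate moves toward the new target, and `gam` sets how much the value of the next state contributes to that target.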

oneTrial(alpha = 0.1, gam = 0.9, epsilon = 0.1, tau = 0.08, softmax = TRUE)

`alpha`	The learning rate alpha.
`gam`	The temporal discounting factor gamma.
`epsilon`	The exploration probability epsilon, used when choices follow the epsilon-greedy policy (`softmax = FALSE`).
`tau`	The temperature tau, used when choices follow the softmax policy (`softmax = TRUE`).
`softmax`	Logical: TRUE if softmax policy decisions should be used; FALSE if epsilon-greedy policy decisions should be used. By default, softmax is used.
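The difference between the two choice policies can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`softmax_probs`, `epsilon_greedy_probs`, `choose_action`) are hypothetical and not part of dynaq, and the epsilon-greedy variant shown lets the random draw include the greedy action, which may not match the package's implementation.

```r
# Hypothetical helpers, not part of dynaq.

# Softmax: choice probabilities proportional to exp(Q / tau)
softmax_probs <- function(q_values, tau = 0.08) {
  # Subtract the maximum before exponentiating for numerical stability
  z <- (q_values - max(q_values)) / tau
  exp(z) / sum(exp(z))
}

# Epsilon-greedy: explore uniformly with probability epsilon, else act greedily
epsilon_greedy_probs <- function(q_values, epsilon = 0.1) {
  n <- length(q_values)
  # Every action receives epsilon / n; the greedy action gets the rest
  probs <- rep(epsilon / n, n)
  greedy <- which.max(q_values)  # ties resolve to the first maximum
  probs[greedy] <- probs[greedy] + (1 - epsilon)
  probs
}

# Pick one action index according to the selected policy
choose_action <- function(q_values, epsilon = 0.1, tau = 0.08, softmax = TRUE) {
  probs <- if (softmax) {
    softmax_probs(q_values, tau)
  } else {
    epsilon_greedy_probs(q_values, epsilon)
  }
  sample(seq_along(q_values), size = 1, prob = probs)
}

q_values <- c(0.5, 0.4)          # illustrative Q-values for two actions
softmax_probs(q_values)          # graded preference for the first action
epsilon_greedy_probs(q_values)   # 0.95 0.05
```

A smaller `tau` makes softmax choices closer to greedy, while a larger `tau` flattens them toward uniform. With epsilon-greedy, the preference for the best action does not depend on how large the gap between Q-values is.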

A tibble with 8 rows and 18 columns. The 8 rows are identical except for the Qtable column. Each row records the states, actions, and rewards for one trial, along with metadata: the temporal discounting factor (gamma), the learning rate (alpha), the choice-policy parameters (epsilon and tau), and the probability of receiving a reward for each image.

jdtrat/dynaq documentation built on July 24, 2020, 7:18 a.m.