simModel: Perform One Model-Based Trial
In jdtrat/dynaq: Tools to Simulate DynaQ Reinforcement Learning Algorithms

The simModel function simulates one trial of the two-stage Markov task using a model-based approach (with a transition model), whose action values are updated according to Q-learning. The model-based simulations sample randomly from previously visited states and previously taken actions. The x parameter is the number of simulations that are run, in line with the Dyna architecture.
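The Dyna-style planning loop described above can be sketched in base R as follows. This is a minimal, hypothetical illustration and not the dynaq implementation: the helper name `dynaPlan`, the Q-table layout (a states × actions matrix), and the `model` data frame of remembered (state, action, reward, nextState) tuples are all assumptions. Each of the `x` iterations samples a previously experienced state-action pair at random and applies a Q-learning update with `modelAlpha` and `gam`.

```r
# Hypothetical sketch of Dyna-Q planning (assumption: not dynaq's own code).
# Q:     numeric matrix of action values, rows = states, columns = actions.
# model: data frame of remembered (state, action, reward, nextState) tuples;
#        nextState is NA when the transition ended the trial (assumption).
dynaPlan <- function(Q, model, modelAlpha = 0.1, gam = 0.9, x = 10) {
  for (i in seq_len(x)) {
    # Sample one previously visited state-action pair at random
    row <- model[sample.int(nrow(model), 1), ]
    s <- row$state
    a <- row$action
    # Q-learning target: reward plus discounted best value of the next state
    target <- row$reward
    if (!is.na(row$nextState)) {
      target <- target + gam * max(Q[row$nextState, ])
    }
    # Move Q(s, a) toward the target at the simulated-experience learning rate
    Q[s, a] <- Q[s, a] + modelAlpha * (target - Q[s, a])
  }
  Q
}

# Usage: two remembered transitions, 50 simulated updates
set.seed(1)
model <- data.frame(state = c(1, 2), action = c(1, 2),
                    reward = c(0, 1), nextState = c(2, NA))
Q <- dynaPlan(matrix(0, nrow = 3, ncol = 2), model, x = 50)
```

Sampling uniformly from remembered experience is the simplest Dyna planning scheme; other variants (for example, prioritized sweeping) choose which pairs to replay more selectively.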

simModel(trialData, modelAlpha = 0.1, gam = 0.9, epsilon = 0.1, tau = 0.08, x)

`trialData`	The output of `oneTrial` (i.e., the most recent real experience).
`modelAlpha`	The learning rate applied to updates from simulated experience.
`gam`	The temporal discounting factor, gamma.
`epsilon`	The epsilon to be used in epsilon-greedy policy choices.
`tau`	The tau (temperature) to be used in softmax policy choices.
`x`	The number of simulations to run. This is used to track the total number performed via the `updateTransFunction`.
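The `epsilon` and `tau` arguments parameterize two standard choice policies. The base R helpers below are a hypothetical sketch of how such policies typically work; the names `epsilonGreedy`, `softmaxProbs`, and `softmaxChoice` are illustrative, and the softmax form (choice probabilities proportional to exp(Q / tau)) is an assumption rather than a confirmed detail of dynaq.

```r
# Hypothetical helpers illustrating the two choice policies (not dynaq's API).
# qValues: numeric vector of action values for the current state.

# Epsilon-greedy: explore uniformly with probability epsilon, otherwise exploit.
epsilonGreedy <- function(qValues, epsilon = 0.1) {
  if (runif(1) < epsilon) {
    sample.int(length(qValues), 1)
  } else {
    # Break ties among the best actions at random
    best <- which(qValues == max(qValues))
    best[sample.int(length(best), 1)]
  }
}

# Softmax probabilities with temperature tau (assumed form: exp(Q / tau)).
# Lower tau makes choices greedier; higher tau makes them more uniform.
softmaxProbs <- function(qValues, tau = 0.08) {
  z <- (qValues - max(qValues)) / tau  # subtract max for numerical stability
  exp(z) / sum(exp(z))
}

# Draw one action index according to the softmax probabilities
softmaxChoice <- function(qValues, tau = 0.08) {
  sample.int(length(qValues), 1, prob = softmaxProbs(qValues, tau))
}
```

With the default `tau = 0.08`, a value difference of 1 between two actions already gives the better action a choice probability above 0.99, so small temperatures behave almost greedily.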

A tibble with 8 rows and 18 columns. The 8 rows contain identical information except for the Qtable column. They record the states, actions, and rewards for one trial, as well as metadata including the temporal discounting factor (gamma), the learning rate (alpha, specific to simulated experience), the choice policy parameters (epsilon and tau), and the probability of receiving a reward for each image.

jdtrat/dynaq documentation built on July 24, 2020, 7:18 a.m.