ReinforcementLearning: Performs reinforcement learning
In nproellochs/ReinforcementLearning: Model-Free Reinforcement Learning


Performs model-free reinforcement learning. Requires input data in the form of sample sequences consisting of states, actions and rewards. The result of the learning process is a state-action table and an optimal policy that defines the best possible action in each state.
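To make the input format and the resulting state-action table concrete, here is a minimal base R sketch. It applies the standard tabular Q-learning update to a handful of made-up transition tuples. It is an illustration under assumptions, not the package's internal code: the toy data, the variable names, and the choice of a plain matrix for the table are all hypothetical.

```r
# Toy transition tuples (s, a, r, s_new); all values are made up for illustration.
transitions <- data.frame(
  s     = c("s1", "s1", "s2", "s2"),
  a     = c("left", "right", "left", "right"),
  r     = c(0, 1, 0, 5),
  s_new = c("s1", "s2", "s1", "s2"),
  stringsAsFactors = FALSE
)

states  <- sort(unique(c(transitions$s, transitions$s_new)))
actions <- sort(unique(transitions$a))

# State-action table initialised to zero (rows: states, columns: actions).
Q <- matrix(0, nrow = length(states), ncol = length(actions),
            dimnames = list(states, actions))

alpha <- 0.1  # learning rate
gamma <- 0.1  # discount factor

# One pass over the samples, applying the tabular Q-learning update
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s_new, a') - Q(s, a)).
for (i in seq_len(nrow(transitions))) {
  s  <- transitions$s[i]
  a  <- transitions$a[i]
  r  <- transitions$r[i]
  s2 <- transitions$s_new[i]
  Q[s, a] <- Q[s, a] + alpha * (r + gamma * max(Q[s2, ]) - Q[s, a])
}

print(Q)
```

Each row of `transitions` contributes exactly one update, which is why the function only needs observed `(s, a, r, s_new)` samples rather than a model of the environment.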

ReinforcementLearning(data, s = "s", a = "a", r = "r",
  s_new = "s_new", learningRule = "experienceReplay", iter = 1,
  control = list(alpha = 0.1, gamma = 0.1, epsilon = 0.1), verbose = F,
  model = NULL, ...)

`data`	A data frame containing the input sequences for reinforcement learning. Each row represents a state transition tuple `(s,a,r,s_new)`.
`s`	A string defining the column name of the current state in `data`.
`a`	A string defining the column name of the selected action for the current state in `data`.
`r`	A string defining the column name of the reward in the current state in `data`.
`s_new`	A string defining the column name of the next state in `data`.
`learningRule`	A string defining the selected reinforcement learning agent. The default value and only option in the current package version is `experienceReplay`.
`iter`	(optional) Number of learning iterations; must be an integer greater than 0. Default: `1`.
`control`	(optional) A list of control parameters defining the behavior of the agent: the learning rate `alpha`, the discount factor `gamma`, and the exploration parameter `epsilon`. Default: `alpha = 0.1`; `gamma = 0.1`; `epsilon = 0.1`.
`verbose`	If `TRUE`, a progress report is shown. Default: `FALSE`.
`model`	(optional) Existing model of class `rl`. Default: `NULL`.
`...`	Additional parameters passed to function.
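The arguments `s`, `a`, `r` and `s_new` only name columns, so the input data frame can use any column names. The sketch below shows one way to set this up; the data values and the helper vector `cols` are hypothetical. The states and actions are stored as character columns and the reward as numeric, which is an assumption about what the function expects. The actual call is guarded so that the snippet still runs when the package is not installed.

```r
# Hypothetical transition log with custom column names.
data <- data.frame(
  State     = c("s1", "s2", "s3"),
  Action    = c("right", "right", "left"),
  Reward    = c(-1, -1, 10),
  NextState = c("s2", "s3", "s2"),
  stringsAsFactors = FALSE
)

# Map the arguments s, a, r, s_new to the actual column names.
cols <- c(s = "State", a = "Action", r = "Reward", s_new = "NextState")

# Fail early if a referenced column does not exist in the data.
missing_cols <- setdiff(cols, names(data))
stopifnot(length(missing_cols) == 0)

# Control parameters, using the documented defaults.
control <- list(alpha = 0.1, gamma = 0.1, epsilon = 0.1)

# Only call the package if it is installed.
if (requireNamespace("ReinforcementLearning", quietly = TRUE)) {
  model <- ReinforcementLearning::ReinforcementLearning(
    data,
    s = cols[["s"]], a = cols[["a"]], r = cols[["r"]], s_new = cols[["s_new"]],
    iter = 2, control = control
  )
}
```

Checking the column names before the call gives a clearer error than a failed column lookup inside the learning routine.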

An object of class rl with the following components:

Q: Resulting state-action table.
Q_hash: Resulting state-action table in hash format.
Actions: Set of actions.
States: Set of states.
Policy: Resulting policy defining the best possible action in each state.
RewardSequence: Rewards collected during each learning episode in iter.
Reward: Total reward collected during the last learning iteration in iter.
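The relationship between a state-action table and a policy can be sketched in a few lines of base R. The matrix below is made up, and its layout (states as rows, actions as columns) is an assumption chosen for illustration rather than a guarantee about the structure of the returned `Q` component.

```r
# Made-up state-action values; rows are states, columns are actions (assumed layout).
Q <- matrix(
  c(0.2, 0.5,
    0.9, 0.1,
    0.0, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("left", "right"))
)

# Greedy policy: for each state, pick the action with the largest Q-value.
policy <- colnames(Q)[max.col(Q, ties.method = "first")]
names(policy) <- rownames(Q)

print(policy)
```

Using `ties.method = "first"` makes the result deterministic when two actions share the same value; the default for `max.col` breaks ties at random.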

Sutton and Barto (1998). Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA.

# Loading the package
library(ReinforcementLearning)

# Sampling data (1000 grid sequences)
data <- sampleGridSequence(1000)

# Setting reinforcement learning parameters
control <- list(alpha = 0.1, gamma = 0.1, epsilon = 0.1)

# Performing reinforcement learning
model <- ReinforcementLearning(data, s = "State", a = "Action", r = "Reward",
                               s_new = "NextState", control = control)

# Printing model
print(model)

# Plotting learning curve
plot(model)