View source: R/ReinforcementLearning.R
Description

Performs model-free reinforcement learning. Requires input data in the form of sample sequences consisting of states, actions and rewards. The result of the learning process is a state-action table and an optimal policy that defines the best possible action in each state.
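For illustration, a toy input data frame might look like the following. The column names here (State, Action, Reward, NextState) are an assumption matching the example below; any names work, since they are mapped via the s, a, r and s_new arguments:

```r
# Hypothetical input: each row is one observed transition (state, action, reward, next state)
data <- data.frame(
  State     = c("s1", "s2", "s3"),
  Action    = c("right", "right", "up"),
  Reward    = c(-1, -1, 10),
  NextState = c("s2", "s3", "s3"),
  stringsAsFactors = FALSE
)
```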
Usage

ReinforcementLearning(data, s = "s", a = "a", r = "r", s_new = "s_new",
  learningRule = "experienceReplay", iter = 1,
  control = list(alpha = 0.1, gamma = 0.1, epsilon = 0.1),
  verbose = FALSE, model = NULL, ...)
Arguments

data	A dataframe containing the input sequences for reinforcement learning. Each row represents a state transition tuple (s, a, r, s_new).

s	A string defining the column name of the current state in data.

a	A string defining the column name of the selected action for the current state in data.

r	A string defining the column name of the reward in the current state in data.

s_new	A string defining the column name of the next state in data.

learningRule	A string defining the selected reinforcement learning agent. The default value and only option in the current package version is "experienceReplay".

iter	(optional) Number of learning iterations to be performed; an integer greater than 0. Default: 1.

control	(optional) Control parameters defining the behavior of the agent. Default: alpha = 0.1, gamma = 0.1, epsilon = 0.1.

verbose	If TRUE, a progress report is shown. Default: FALSE.

model	(optional) Existing model of class rl to be updated with the input data.

...	Additional parameters passed to function.
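As a sketch of what alpha (learning rate) and gamma (discount factor) in control govern, the standard Q-learning update can be written in a few lines of base R. This is an illustrative assumption about the learning rule, not the package's internal experience-replay implementation:

```r
# Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s_new, a') - Q(s,a))
q_update <- function(Q, s, a, r, s_new, alpha = 0.1, gamma = 0.1) {
  target <- r + gamma * max(Q[s_new, ])
  Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
  Q
}

# Toy two-state, two-action table, initialized to zero
Q <- matrix(0, nrow = 2, ncol = 2,
            dimnames = list(c("s1", "s2"), c("left", "right")))
Q <- q_update(Q, "s1", "right", r = 1, s_new = "s2")
Q["s1", "right"]  # 0.1 * (1 + 0.1 * 0 - 0) = 0.1
```

A larger alpha makes each observed transition move the table further; a larger gamma weights future rewards more heavily.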
Value

An object of class rl with the following components:

Q	Resulting state-action table.

Q_hash	Resulting state-action table in hash format.

Actions	Set of actions.

States	Set of states.

Policy	Resulting policy defining the best possible action in each state.

RewardSequence	Rewards collected during each learning episode in iter.

Reward	Total reward collected during the last learning iteration in iter.
References

Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
Examples

# Sampling data (1000 grid sequences)
data <- sampleGridSequence(1000)

# Setting reinforcement learning parameters
control <- list(alpha = 0.1, gamma = 0.1, epsilon = 0.1)

# Performing reinforcement learning
model <- ReinforcementLearning(data, s = "State", a = "Action", r = "Reward",
                               s_new = "NextState", control = control)

# Printing model
print(model)

# Plotting learning curve
plot(model)
Example output:

State-Action function Q
          right        up        down      left
s1  -1.09714550 -1.095733 -1.00553794 -1.098624
s2  -0.02999958 -1.097917 -1.00380243 -1.003177
s3  -0.02495463  9.842528 -0.02009063 -1.005332
s4  -1.10760309 -1.107829 -1.10758348 -1.108662

Policy
    s1      s2      s3      s4
"down" "right"    "up"  "down"

Reward (last iteration)
[1] -450
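The Policy shown above is simply the greedy (arg-max) action in each row of the state-action table Q; it can be reproduced from the printed values in base R:

```r
# State-action values as printed in the example output
Q <- rbind(
  s1 = c(right = -1.09714550, up = -1.095733, down = -1.00553794, left = -1.098624),
  s2 = c(right = -0.02999958, up = -1.097917, down = -1.00380243, left = -1.003177),
  s3 = c(right = -0.02495463, up =  9.842528, down = -0.02009063, left = -1.005332),
  s4 = c(right = -1.10760309, up = -1.107829, down = -1.10758348, left = -1.108662)
)

# Greedy policy: pick the column (action) with the highest value per row (state)
policy <- colnames(Q)[max.col(Q)]
names(policy) <- rownames(Q)
policy
#     s1      s2      s3      s4
# "down" "right"    "up"  "down"
```

Note the tight margin in state s4: "down" (-1.10758) beats "right" (-1.10760) by less than 2e-5, which is why more iterations or a different seed can flip individual policy entries.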