ReinforcementLearning: Performs reinforcement learning

Description Usage Arguments Value References Examples

View source: R/ReinforcementLearning.R

Description

Performs model-free reinforcement learning. Requires input data in the form of sample sequences consisting of states, actions and rewards. The result of the learning process is a state-action table and an optimal policy that defines the best possible action in each state.

Usage

ReinforcementLearning(data, s = "s", a = "a", r = "r",
  s_new = "s_new", learningRule = "experienceReplay", iter = 1,
  control = list(alpha = 0.1, gamma = 0.1, epsilon = 0.1), verbose = F,
  model = NULL, ...)

Arguments

data

A dataframe containing the input sequences for reinforcement learning. Each row represents a state transition tuple (s,a,r,s_new).

s

A string defining the column name of the current state in data.

a

A string defining the column name of the selected action for the current state in data.

r

A string defining the column name of the reward in the current state in data.

s_new

A string defining the column name of the next state in data.

learningRule

A string defining the selected reinforcement learning agent. The default value and only option in the current package version is experienceReplay.

iter

(optional) Number of learning iterations to perform. iter must be an integer greater than 0. By default, iter is set to 1.

control

(optional) Control parameters defining the behavior of the agent: alpha (learning rate), gamma (discount factor) and epsilon (exploration rate). Default: alpha = 0.1; gamma = 0.1; epsilon = 0.1.
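To illustrate the role of these parameters, here is a minimal sketch of the standard Q-learning update they control (illustrative only; the package's internal implementation via experience replay may differ in detail):

```r
# Standard Q-learning update on a state-action matrix Q:
# alpha scales the update step, gamma discounts the value of the next state.
q_update <- function(Q, s, a, r, s_new, alpha = 0.1, gamma = 0.1) {
  Q[s, a] <- Q[s, a] + alpha * (r + gamma * max(Q[s_new, ]) - Q[s, a])
  Q
}
```

epsilon does not appear in the update itself; it governs epsilon-greedy exploration, i.e. the probability of selecting a random action instead of the currently best one.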

verbose

If TRUE, a progress report is shown. Default: FALSE.

model

(optional) Existing model of class rl to be updated with the new data. Default: NULL.

...

Additional parameters passed to the learning function.
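The model argument enables incremental learning: a previously trained rl object can be updated with additional observations. A minimal sketch, assuming sampleGridSequence() is available to generate sample sequences as in the Examples below:

```r
library(ReinforcementLearning)

# Train an initial model on sampled grid sequences
data <- sampleGridSequence(1000)
model <- ReinforcementLearning(data, s = "State", a = "Action", r = "Reward",
                               s_new = "NextState")

# Continue learning from the existing model with a fresh batch of data
data_new <- sampleGridSequence(1000)
model_updated <- ReinforcementLearning(data_new, s = "State", a = "Action",
                                       r = "Reward", s_new = "NextState",
                                       model = model)
```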

Value

An object of class rl with the following components:

Q

Resulting state-action table.

Q_hash

Resulting state-action table in hash format.

Actions

Set of actions.

States

Set of states.

Policy

Resulting policy defining the best possible action in each state.

RewardSequence

Rewards collected during each learning episode in iter.

Reward

Total reward collected during the last learning iteration in iter.
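The components listed above can be accessed from the returned rl object with the usual $ operator, for example:

```r
# Inspect individual components of a fitted rl object
# (assumes `model` was created as in the Examples below)
model$Q        # state-action table
model$Policy   # best possible action in each state
model$Reward   # total reward of the last learning iteration
```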

References

Sutton and Barto (1998). Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA.

Examples

# Sampling data (1000 grid sequences)
data <- sampleGridSequence(1000)

# Setting reinforcement learning parameters
control <- list(alpha = 0.1, gamma = 0.1, epsilon = 0.1)

# Performing reinforcement learning
model <- ReinforcementLearning(data, s = "State", a = "Action", r = "Reward",
                               s_new = "NextState", control = control)

# Printing model
print(model)

# Plotting learning curve
plot(model)

Example output

State-Action function Q
         right        up        down      left
s1 -1.09714550 -1.095733 -1.00553794 -1.098624
s2 -0.02999958 -1.097917 -1.00380243 -1.003177
s3 -0.02495463  9.842528 -0.02009063 -1.005332
s4 -1.10760309 -1.107829 -1.10758348 -1.108662

Policy
     s1      s2      s3      s4 
 "down" "right"    "up"  "down" 

Reward (last iteration)
[1] -450

ReinforcementLearning documentation built on March 26, 2020, 7:38 p.m.