knitr::opts_chunk$set(message = TRUE, eval = TRUE, collapse = TRUE, comment = "#>")

This vignette explains the different ways to create and use a reinforcement learning environment in reinforcelearn. Section Creation explains how to create an environment and Section Interaction describes how to use the created environment object for interaction.



Creation

The makeEnvironment function provides different ways to create an environment. It is called with the class name as a first argument. You can pass arguments of the specific environment class (e.g. the state transition array for an MDP) to the ... argument.

Create a custom environment

To create a custom environment you have to set up a step and reset function, which define the rewards the agent receives and ultimately the goal of what to learn.

Here is an example setting up the famous Mountain Car problem.


The reset function initializes the starting state of the environment and is usually called when starting a new episode. It returns the state of the environment. It takes an argument self, which is the newly created R6 object and can be used e.g. to access the current state of the environment.

reset = function(self) {
  # Start at a random position in the valley with zero velocity.
  position = runif(1, -0.6, -0.4)
  velocity = 0
  state = matrix(c(position, velocity), ncol = 2)
  state
}

The step function is used for interaction: it controls the transition to the next state and the reward given an action. It takes self and action as arguments and returns a list with the next state, the reward and whether the episode is finished (done).

step = function(self, action) {
  position = self$state[1]
  velocity = self$state[2]
  # Accumulate the velocity (standard Mountain Car dynamics), then clip it.
  velocity = velocity + (action - 1L) * 0.001 + cos(3 * position) * (-0.0025)
  velocity = min(max(velocity, -0.07), 0.07)
  position = position + velocity
  # The car stops at the left boundary.
  if (position < -1.2) {
    position = -1.2
    velocity = 0
  }
  state = matrix(c(position, velocity), ncol = 2)
  reward = -1
  # The episode ends when the car reaches the goal on the right.
  if (position >= 0.5) {
    done = TRUE
    reward = 0
  } else {
    done = FALSE
  }
  list(state, reward, done)
}

Then we can create the environment with

env = makeEnvironment(step = step, reset = reset)
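
Before passing such functions to makeEnvironment, it can help to check the dynamics on their own. The following is a minimal sketch in base R, not part of reinforcelearn: a plain list stands in for the R6 object (assuming it exposes the current state as self$state), the velocity update follows the standard Mountain Car dynamics, the returned list is named for readability, and the policy is a hypothetical heuristic that pushes in the direction of the current velocity.

```r
reset = function(self) {
  matrix(c(runif(1, -0.6, -0.4), 0), ncol = 2)
}

step = function(self, action) {
  position = self$state[1]
  velocity = self$state[2]
  # Accumulate the velocity, then clip it to [-0.07, 0.07].
  velocity = velocity + (action - 1L) * 0.001 + cos(3 * position) * (-0.0025)
  velocity = min(max(velocity, -0.07), 0.07)
  position = position + velocity
  if (position < -1.2) {
    position = -1.2
    velocity = 0
  }
  done = position >= 0.5
  reward = if (done) 0 else -1
  list(state = matrix(c(position, velocity), ncol = 2),
    reward = reward, done = done)
}

set.seed(1)
self = list(state = NULL)  # stand-in for the R6 object (assumption)
self$state = reset(self)
episode.return = 0

for (i in 1:200) {
  # Hypothetical heuristic: push right when moving right, else push left.
  action = if (self$state[2] >= 0) 2L else 0L
  res = step(self, action)
  self$state = res$state
  episode.return = episode.return + res$reward
  if (res$done) break
}
```

After the loop, self$state holds the last state and episode.return the undiscounted sum of rewards collected so far.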

OpenAI Gym

OpenAI Gym [@gym_openai] provides a set of environments, which can be used for benchmarking.

To use a gym environment you have to install Python and the gym Python package first.

Then you can create a gym environment by passing the name of the environment.

# Create a gym environment.
env = makeEnvironment("gym", gym.name = "MountainCar-v0")

Have a look at https://gym.openai.com/envs for possible environments.

Markov Decision Process

A Markov Decision Process (MDP) is a stochastic process which is commonly used to model reinforcement learning environments. When the problem can be formulated as an MDP, all you need to pass to makeEnvironment is the state transition array $P^a_{ss'}$ and the reward matrix $R_s^a$ of the MDP.

We can create a simple MDP with 2 states and 2 actions with the following code.

# State transition array
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0.1, 0.9, 0, 1), 2, 2, byrow = TRUE)

# Reward matrix
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)

env = makeEnvironment("mdp", transitions = P, rewards = R)
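
A quick sanity check of such arrays can catch errors early. The sketch below uses base R only and assumes the layout suggested by the example above: P is indexed as [state, next state, action] and R as [state, action]. The helper sampleTransition is hypothetical and not part of reinforcelearn.

```r
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0.1, 0.9, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)

# For every state and action, the probabilities over next states sum to 1.
row.sums = apply(P, c(1, 3), sum)
stopifnot(all(abs(row.sums - 1) < 1e-12))

# Hypothetical helper: sample one transition of the MDP.
# States and actions are 0-based, so add 1 for R indexing.
sampleTransition = function(state, action) {
  next.state = sample(0:1, size = 1, prob = P[state + 1, , action + 1])
  list(state = next.state, reward = R[state + 1, action + 1])
}

set.seed(42)
res = sampleTransition(state = 0, action = 1)
```

Under this layout the second state is absorbing: both actions lead back to it with probability 1.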


Gridworld

A gridworld is a simple MDP navigation task with a discrete state and action space. The agent has to move through a grid from a start state to a goal state. Possible actions are the standard moves (left, right, up, down) or could also include the diagonal moves (leftup, leftdown, rightup, rightdown).

Here is an example of a 4x4 gridworld [@sutton2017, Example 4.1] with two terminal states in the lower right and upper left of the grid. Rewards are -1 for every transition until reaching a terminal state.


The following code creates this gridworld.

# Gridworld Environment (Sutton & Barto (2017) Example 4.1)
env = makeEnvironment("gridworld", shape = c(4, 4), goal.states = c(0, 15))
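
To make the state numbering concrete, here is a minimal base R sketch of how a state number could map to a grid cell. It assumes that states are numbered row by row starting with 0 in the upper left, which matches the terminal states 0 and 15 above; the helper names are hypothetical and the package may implement this differently.

```r
shape = c(4, 4)

# Convert a 0-based state number into 0-based (row, col) coordinates.
stateToCell = function(state) {
  c(row = state %/% shape[2], col = state %% shape[2])
}

cellToState = function(row, col) {
  row * shape[2] + col
}

# Hypothetical move function: a move off the grid leaves the state unchanged.
move = function(state, action) {
  cell = stateToCell(state)
  delta = switch(action,
    left = c(0, -1), right = c(0, 1), up = c(-1, 0), down = c(1, 0))
  new.cell = cell + delta
  if (new.cell[1] < 0 || new.cell[1] >= shape[1] ||
      new.cell[2] < 0 || new.cell[2] >= shape[2]) {
    return(state)
  }
  cellToState(new.cell[[1]], new.cell[[2]])
}

move(15, "left")  # 14
move(15, "down")  # 15, the agent would leave the grid
move(5, "up")     # 1
```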


Interaction

makeEnvironment returns an R6 class object which can be used for the interaction between agent and environment.

env = makeEnvironment("gridworld", shape = c(4, 4), 
  goal.states = 0L, initial.state = 15L)

To take an action you can call the step(action) method. It is called with an action as an argument and internally computes the following state, reward and whether an episode is finished (done).

# The initial state of the environment.
env$reset()

# Actions are encoded as integers.
env$step(0L)

# But can also have character names.
env$step("left")

Note that the R6 class object changes whenever calling step or reset! Therefore calling step with the same action twice will most likely return different states and rewards!
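
The effect of this in-place modification can be illustrated without the package. The sketch below is a hypothetical stand-in that uses a base R environment, which has reference semantics like an R6 object; it is not how reinforcelearn implements its environments.

```r
makeToyEnv = function() {
  self = new.env()
  self$state = 15L
  self$step = function(action) {
    # Action 0 moves one state down until state 0 is reached.
    if (action == 0L && self$state > 0L) self$state = self$state - 1L
    self$state
  }
  self
}

toy = makeToyEnv()
first = toy$step(0L)   # 14
second = toy$step(0L)  # 13, the same call gives a different result

# Assignment does not copy the object: both names share one state.
alias = toy
alias$step(0L)
toy$state              # 12
```

If an independent copy of the state is needed, store the returned state in a separate variable before calling step again.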

Note also that all discrete states and actions are numbered starting with 0 to be consistent with OpenAI Gym!
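
Because R itself indexes from 1, the 0-based states and actions need to be shifted by one when they are used as indices. A minimal sketch with a hypothetical action value table Q for a 4x4 gridworld with four actions:

```r
n.states = 16
n.actions = 4
Q = matrix(0, nrow = n.states, ncol = n.actions)

state = 15L   # 0-based state as returned by the environment
action = 0L   # 0-based action

# Shift both indices by one when reading from or writing to the table.
Q[state + 1, action + 1] = -1

# Greedy action for this state, converted back to 0-based numbering.
greedy.action = which.max(Q[state + 1, ]) - 1L
```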

The environment object often also contains information about the number of states and actions or the bounds in case of a continuous space.

env = makeEnvironment("mountain.car")

It also contains counters for the number of interactions (i.e. the number of times step has been called), the number of steps in the current episode and the number of episodes, as well as the return in the current episode.

env = makeEnvironment("gridworld", shape = c(4, 4), 
  goal.states = 0L, initial.state = 15L, discount = 0.99)
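
Assuming that the return is the discounted sum of the rewards in the current episode, with the discount factor passed as discount, it could be computed by hand as in the following base R sketch. The reward vector is hypothetical.

```r
discount = 0.99
rewards = c(-1, -1, -1)  # hypothetical rewards of three steps

# G = r_1 + discount * r_2 + discount^2 * r_3
episode.return = sum(discount^(seq_along(rewards) - 1) * rewards)
episode.return  # -2.9701
```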



Full list of attributes and methods:

Here is a full list describing the attributes of the R6 class created by makeEnvironment.





reinforcelearn documentation built on May 2, 2019, 9:20 a.m.