# Gridworld: Gridworld In reinforcelearn: Reinforcement Learning

## Description

Creates gridworld environments.

## Arguments

• `shape` [`integer(2)`]
Shape of the gridworld (number of rows x number of columns).

• `goal.states` [`integer`]
Goal states in the gridworld.

• `cliff.states` [`integer`]
Cliff states in the gridworld.

• `reward.step` [`integer(1)`]
Reward for taking a step.

• `cliff.transition.states` [`integer`]
States to which the environment transitions if stepping into the cliff. If it is a vector, all states will have equal probability. Only used when `cliff.transition.done == FALSE`, else specify the `initial.state` argument.

• `reward.cliff` [`integer(1)`]
Reward for taking a step in the cliff state.

• `diagonal.moves` [`logical(1)`]
Should diagonal moves be allowed?

• `wind` [`integer`]
Strength of the upward wind in each cell.

• `cliff.transition.done` [`logical(1)`]
Should the episode end after stepping into the cliff?

• `stochasticity` [`numeric(1)`]
Probability of random transition to any of the neighboring states when taking any action.

• `...` [`any`]
Arguments passed on to `makeEnvironment`.

## Details

A gridworld is an episodic navigation task in which the goal is to get from the start state to a goal state.

Possible actions are going left, right, up or down. If `diagonal.moves = TRUE`, diagonal moves are also possible: leftup, leftdown, rightup and rightdown.

When stepping into a cliff state you get a reward of `reward.cliff`, usually a large negative reward, and transition to a state specified by `cliff.transition.states`.
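The cliff rule can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `cliff_step` is a hypothetical helper, and the state it returns when the episode ends in the cliff is an assumption.

```r
# Hypothetical helper (not part of reinforcelearn): resolve the outcome of
# moving to `next.state`, given the cliff-related arguments.
cliff_step = function(next.state, cliff.states, cliff.transition.states,
  reward.step = -1, reward.cliff = -100, cliff.transition.done = FALSE) {
  if (!(next.state %in% cliff.states)) {
    # Ordinary step: no cliff involved.
    return(list(state = next.state, reward = reward.step, done = FALSE))
  }
  if (cliff.transition.done) {
    # Assumption: the episode ends in the cliff state itself.
    return(list(state = next.state, reward = reward.cliff, done = TRUE))
  }
  # All transition states are equally likely.
  idx = sample.int(length(cliff.transition.states), 1)
  list(state = cliff.transition.states[idx], reward = reward.cliff, done = FALSE)
}

# Cliff Walking layout: stepping into state 37 sends the agent back to 36.
res = cliff_step(37, cliff.states = 37:46, cliff.transition.states = 36)
res
```

Indexing with `sample.int` avoids the surprise that `sample(x, 1)` treats a single number `x` as the range `1:x`.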

In each column, a deterministic wind specified via `wind` pushes you up a specific number of grid cells (for the next action).
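A rough base R sketch of the wind rule, using the wind vector from the Windy Gridworld example. `windy_move` is a hypothetical helper. Two things are assumed here rather than taken from the package: the action coding 0 = left, 1 = right, 2 = up, 3 = down, and that the wind strength is read from the column of the state being left, as in Sutton & Barto.

```r
# Hypothetical helper (not part of reinforcelearn): one move in a windy grid.
# States are 0-based and enumerated row-wise; row 0 is the top row.
windy_move = function(state, action, shape, wind) {
  n.rows = shape[1]
  n.cols = shape[2]
  row = state %/% n.cols
  col = state %% n.cols
  # Assumed action coding: 0 = left, 1 = right, 2 = up, 3 = down.
  delta = switch(action + 1, c(0, -1), c(0, 1), c(-1, 0), c(1, 0))
  push = wind[col + 1]  # wind of the column being left (assumption)
  # Apply the move and the upward push, then clip to the grid.
  row = min(max(row + delta[1] - push, 0), n.rows - 1)
  col = min(max(col + delta[2], 0), n.cols - 1)
  row * n.cols + col
}

wind = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0)
windy_move(30, 1, c(7, 10), wind)  # column 0 has no wind: 30 -> 31
windy_move(33, 1, c(7, 10), wind)  # column 3 pushes one row up: 33 -> 24
```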

A stochastic gridworld is a gridworld where, with probability `stochasticity`, the next state is chosen at random from all neighboring states, independently of the action taken.
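One way to picture the stochastic transition, as a base R sketch. `neighbors` and `stochastic_next` are hypothetical helpers, and the sketch assumes that "neighbor states" means the up to four horizontally and vertically adjacent cells inside the grid. The package may define neighbors differently, for example when diagonal moves are enabled.

```r
# Hypothetical helper: in-grid 4-neighbors of a 0-based, row-wise state.
neighbors = function(state, shape) {
  n.cols = shape[2]
  row = state %/% n.cols
  col = state %% n.cols
  cand = rbind(c(row, col - 1), c(row, col + 1), c(row - 1, col), c(row + 1, col))
  inside = cand[, 1] >= 0 & cand[, 1] < shape[1] &
    cand[, 2] >= 0 & cand[, 2] < n.cols
  cand[inside, 1] * n.cols + cand[inside, 2]
}

# Hypothetical helper: with probability `stochasticity`, ignore the intended
# next state and pick a uniformly random neighbor of the current state.
stochastic_next = function(intended, state, shape, stochasticity) {
  if (runif(1) < stochasticity) {
    nb = neighbors(state, shape)
    nb[sample.int(length(nb), 1)]
  } else {
    intended
  }
}

neighbors(15, c(4, 4))                # bottom-right corner: states 14 and 11
stochastic_next(14, 15, c(4, 4), 0)   # stochasticity 0: always the intended state
```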

If an action would take you off the grid, the new state is the nearest cell inside the grid. For each step you get a reward of `reward.step` until you reach a goal state, at which point the episode is done.
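The border rule amounts to clipping the row and column to the grid, which can be sketched in base R. `move` is a hypothetical helper, and the action coding 0 = left, 1 = right, 2 = up, 3 = down is an assumption (the example further down only shows that action `0L` moves left).

```r
# Hypothetical helper (not part of reinforcelearn): a deterministic move
# without wind, clipped at the grid border.
move = function(state, action, shape) {
  n.rows = shape[1]
  n.cols = shape[2]
  row = state %/% n.cols
  col = state %% n.cols
  # Assumed action coding: 0 = left, 1 = right, 2 = up, 3 = down.
  delta = switch(action + 1, c(0, -1), c(0, 1), c(-1, 0), c(1, 0))
  # Clipping keeps the agent in the nearest cell inside the grid.
  row = min(max(row + delta[1], 0), n.rows - 1)
  col = min(max(col + delta[2], 0), n.cols - 1)
  row * n.cols + col
}

move(15, 0, c(4, 4))  # left from the bottom-right corner: 15 -> 14
move(15, 1, c(4, 4))  # right would leave the grid, so the state stays 15
```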

States are enumerated row-wise, and numbering starts at 0. Here is an example 4x4 grid:

     0  1  2  3
     4  5  6  7
     8  9 10 11
    12 13 14 15

So a board position could look like this (G: goal state, x: current state, C: cliff state):

    G o o o
    o o o o
    o x o o
    o o o C
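Since states are numbered row-wise from 0, converting between a state index and grid coordinates is integer division and remainder by the number of columns. The helpers below are hypothetical and not part of the package. Rows and columns are also counted from 0 here.

```r
# Hypothetical helpers: convert between a 0-based, row-wise state index
# and 0-based (row, column) coordinates.
state_to_rc = function(state, n.cols) {
  c(row = state %/% n.cols, col = state %% n.cols)
}
rc_to_state = function(row, col, n.cols) {
  row * n.cols + col
}

state_to_rc(9, n.cols = 4)      # the "x" in the board: row 2, column 1
rc_to_state(3, 3, n.cols = 4)   # bottom-right corner: state 15
```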

## Usage

`makeEnvironment("gridworld", shape = NULL, goal.states = NULL, cliff.states = NULL, reward.step = -1, reward.cliff = -100, diagonal.moves = FALSE, wind = rep(0, shape[2]), cliff.transition.states = NULL, cliff.transition.done = FALSE, stochasticity = 0, ...)`

## Methods

• `$step(action)`
Take action in environment. Returns a list with `state`, `reward`, `done`.

• `$reset()`
Resets the `done` flag of the environment and returns an initial state. Useful when starting a new episode.

• `$visualize()`
Visualizes the environment (if there is a visualization function).
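The methods above suggest the usual episode loop: reset, then step until `done` is `TRUE`. The sketch below runs that loop against a hypothetical stand-in, `make_stub_env`, which only mimics the documented `$reset()` and `$step(action)` interface. It is not a reinforcelearn environment, and the same loop is assumed to work with an object returned by `makeEnvironment`.

```r
# Hypothetical stand-in: a 1x4 corridor with the start at state 3 and the
# goal at state 0. Every action moves one cell to the left.
make_stub_env = function() {
  state = 3
  list(
    reset = function() {
      state <<- 3
      state
    },
    step = function(action) {
      state <<- max(state - 1, 0)
      list(state = state, reward = -1, done = state == 0)
    }
  )
}

env = make_stub_env()
state = env$reset()
total.reward = 0
repeat {
  res = env$step(0L)                 # always act "left" in this sketch
  total.reward = total.reward + res$reward
  state = res$state
  if (res$done) break                # the episode ends at the goal state
}
total.reward                         # three steps at -1 each
```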

## Examples

```r
library(reinforcelearn)

# Gridworld Environment (Sutton & Barto Example 4.1)
env1 = makeEnvironment("gridworld", shape = c(4L, 4L), goal.states = 0L,
  initial.state = 15L)
env1$reset()
env1$visualize()
env1$step(0L)
env1$visualize()

# Windy Gridworld (Sutton & Barto Example 6.5)
env2 = makeEnvironment("gridworld", shape = c(7, 10), goal.states = 37L,
  reward.step = -1, wind = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0), initial.state = 30L)

# Cliff Walking (Sutton & Barto Example 6.6)
env3 = makeEnvironment("gridworld", shape = c(4, 12), goal.states = 47L,
  cliff.states = 37:46, reward.step = -1, reward.cliff = -100,
  cliff.transition.states = 36L, initial.state = 36L)
```

### Example output

```
[1] 15
- - - -
- - - -
- - - -
- - - o

$state
[1] 14

$reward
[1] -1

$done
[1] FALSE

- - - -
- - - -
- - - -
- - o -
```

reinforcelearn documentation built on May 2, 2019, 9:20 a.m.