gridworld: Gridworld
In markdumke/reinforcelearn: Reinforcement Learning

Description

Creates gridworld environments.

Arguments

`shape`	[`integer(2)`] Shape of the gridworld (number of rows x number of columns).
`goal.states`	[`integer`] Goal states in the gridworld.
`cliff.states`	[`integer`] Cliff states in the gridworld.
`reward.step`	[`integer(1)`] Reward for taking a step.
`cliff.transition.states`	[`integer`] States to which the environment transitions after stepping into the cliff. If a vector is given, each state is chosen with equal probability. Only used when `cliff.transition.done == FALSE`; otherwise specify the `initial.state` argument.
`reward.cliff`	[`integer(1)`] Reward for stepping into a cliff state.
`diagonal.moves`	[`logical(1)`] Should diagonal moves be allowed?
`wind`	[`integer`] Strength of the upward wind in each column (one value per column).
`cliff.transition.done`	[`logical(1)`] Should the episode end after stepping into the cliff?
`stochasticity`	[`numeric(1)`] Probability of a random transition to one of the neighboring states, regardless of the action taken.
`...`	[`any`] Arguments passed on to makeEnvironment.

Details

A gridworld is an episodic navigation task in which the goal is to get from the start state to a goal state.

Possible actions are going left, right, up or down. If diagonal.moves = TRUE, diagonal moves are also possible: leftup, leftdown, rightup and rightdown.

When stepping into a cliff state, you get a reward of reward.cliff (usually a large negative reward) and transition to a state specified by cliff.transition.states.
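
The cliff rule can be sketched in base R as follows. This is only an illustration, not the package's implementation: the function cliff_step is hypothetical, and it assumes the state passed in is the state the agent has just entered.

```r
# Hypothetical sketch of the cliff rule (not part of reinforcelearn).
# `state` is assumed to be the state the agent has just stepped into.
cliff_step = function(state, cliff.states, cliff.transition.states,
  reward.step = -1, reward.cliff = -100) {
  if (state %in% cliff.states) {
    # Index with sample.int so that a single transition state is handled
    # correctly (sample(x, 1) would sample from 1:x for a length-one x).
    idx = sample.int(length(cliff.transition.states), 1L)
    list(state = cliff.transition.states[idx], reward = reward.cliff)
  } else {
    list(state = state, reward = reward.step)
  }
}

# Stepping into cliff state 40 sends the agent back to state 36.
cliff_step(40L, cliff.states = 37:46, cliff.transition.states = 36L)
```

With several entries in cliff.transition.states, each one is drawn with equal probability, matching the description of that argument.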

A deterministic wind, specified per column via wind, pushes you up by the given number of grid cells (applied on the next action).
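
A minimal sketch of the wind shift, under two assumptions that the text above does not spell out: the wind strength is read from the column the agent occupies before it moves, and row 0 is the top row, so "up" decreases the row index. The function apply_wind is hypothetical and uses 0-based coordinates.

```r
# Hypothetical sketch of the wind shift (not part of reinforcelearn).
# Assumptions: 0-based (row, col) coordinates, row 0 is the top row, and
# the wind strength is read from the agent's current column.
apply_wind = function(row, col, wind) {
  # wind has one entry per column; R vectors are 1-based, so use col + 1.
  new_row = max(row - wind[col + 1L], 0)
  c(row = new_row, col = col)
}

wind = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0)
apply_wind(3L, 6L, wind)  # column 6 has strength 2: row 3 becomes row 1
apply_wind(1L, 6L, wind)  # the shift is clipped at the top row (row 0)
```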

A stochastic gridworld is a gridworld where, with probability stochasticity, the next state is chosen at random from all neighbor states, independently of the action taken.
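
One way to picture this is the sketch below. It rests on a simplifying assumption: choosing a random neighbor state is modeled as replacing the chosen action with one drawn uniformly from all available actions. The helper perturb_action is hypothetical and is not how the package is known to implement it.

```r
# Hypothetical helper (not part of reinforcelearn). Assumption: picking a
# random neighbor state is modeled as replacing the chosen action with one
# drawn uniformly from all available actions.
perturb_action = function(action, actions, stochasticity) {
  if (runif(1) < stochasticity) {
    actions[sample.int(length(actions), 1L)]
  } else {
    action
  }
}

actions = c("left", "right", "up", "down")
perturb_action("up", actions, stochasticity = 0)    # always "up"
perturb_action("up", actions, stochasticity = 0.2)  # sometimes a random action
```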

If an action would take you off the grid, the new state is the nearest cell inside the grid. For each step you get a reward of reward.step until you reach a goal state, at which point the episode is done.
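
The off-grid rule amounts to clipping each coordinate to the grid bounds. The sketch below illustrates this with a hypothetical move function, using 0-based (row, column) coordinates and assuming row 0 is the top row. It is not the package's code.

```r
# Hypothetical sketch (not part of reinforcelearn): apply a move, then clip
# the result to the grid. Coordinates are 0-based; row 0 is assumed to be
# the top row.
move = function(row, col, action, shape) {
  delta = switch(action,
    left  = c(0L, -1L),
    right = c(0L, 1L),
    up    = c(-1L, 0L),
    down  = c(1L, 0L),
    stop("unknown action"))
  # Clipping each coordinate yields the nearest cell inside the grid.
  new_row = min(max(row + delta[1], 0L), shape[1] - 1L)
  new_col = min(max(col + delta[2], 0L), shape[2] - 1L)
  c(row = new_row, col = new_col)
}

move(0L, 0L, "left", c(4L, 4L))  # already at the left edge: stays at (0, 0)
move(2L, 1L, "down", c(4L, 4L))  # ordinary move to (3, 1)
```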

States are enumerated row-wise, and numbering starts at 0. Here is an example 4x4 grid:

0	1	2	3
4	5	6	7
8	9	10	11
12	13	14	15

So a board position could look like this (G: goal state, x: current state, C: cliff state):

G	o	o	o
o	o	o	o
o	x	o	o
o	o	o	C
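
Because states are numbered row-wise from 0, converting between a state index and grid coordinates takes only integer division and remainder. The helpers below are hypothetical (they are not exported by the package) and use 0-based rows and columns.

```r
# Hypothetical helpers (not part of reinforcelearn): convert between a
# 0-based, row-wise state index and 0-based (row, column) coordinates.
state_to_rowcol = function(state, shape) {
  c(row = state %/% shape[2], col = state %% shape[2])
}

rowcol_to_state = function(row, col, shape) {
  row * shape[2] + col
}

shape = c(4L, 4L)
state_to_rowcol(9L, shape)      # row 2, col 1: the "x" in the board above
rowcol_to_state(3L, 3L, shape)  # 15: the cliff state "C" in the board above
```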

Usage

makeEnvironment("gridworld", shape = NULL, goal.states = NULL,
  cliff.states = NULL, reward.step = -1, reward.cliff = -100,
  diagonal.moves = FALSE, wind = rep(0, shape[2]),
  cliff.transition.states = NULL, cliff.transition.done = FALSE,
  stochasticity = 0, ...)

Methods

$step(action)
Take action in environment. Returns a list with state, reward, done.
$reset()
Resets the done flag of the environment and returns an initial state. Useful when starting a new episode.
$visualize()
Visualizes the environment (if there is a visualization function).

Examples

# Gridworld Environment (Sutton & Barto Example 4.1)
env1 = makeEnvironment("gridworld", shape = c(4L, 4L), goal.states = 0L,
  initial.state = 15L)
env1$reset()
env1$visualize()
env1$step(0L)
env1$visualize()

# Windy Gridworld (Sutton & Barto Example 6.5)
env2 = makeEnvironment("gridworld", shape = c(7L, 10L), goal.states = 37L,
  reward.step = -1, wind = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0),
  initial.state = 30L)

# Cliff Walking (Sutton & Barto Example 6.6)
env3 = makeEnvironment("gridworld", shape = c(4L, 12L), goal.states = 47L,
  cliff.states = 37:46, reward.step = -1, reward.cliff = -100,
  cliff.transition.states = 36L, initial.state = 36L)

markdumke/reinforcelearn documentation built on Nov. 17, 2022, 12:53 a.m.