Gridworld | R Documentation |
Creates gridworld environments.
shape |
goal.states |
cliff.states |
reward.step |
cliff.transition.states |
reward.cliff |
diagonal.moves |
wind |
cliff.transition.done |
stochasticity |
... |
A gridworld is an episodic navigation task in which the goal is to get from the start state to a goal state.
Possible actions are going left, right, up or down. If diagonal.moves = TRUE, diagonal moves are also possible: leftup, leftdown, rightup and rightdown.
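One way to picture the action set is as a table of (row, column) offsets. The sketch below is illustrative base R, not the package's code: the helper name action_deltas and the convention that "up" decreases the row index (row 0 is the top row) are assumptions. Only the action names and the diagonal.moves switch come from the description above.

```r
# Hypothetical helper: map action names to (d_row, d_col) offsets.
# Assumes row 0 is the top row, so "up" means d_row = -1.
action_deltas = function(diagonal.moves = FALSE) {
  deltas = list(
    left = c(0, -1), right = c(0, 1),
    up = c(-1, 0), down = c(1, 0)
  )
  if (diagonal.moves) {
    # Diagonal moves combine one horizontal and one vertical offset.
    deltas = c(deltas, list(
      leftup = c(-1, -1), leftdown = c(1, -1),
      rightup = c(-1, 1), rightdown = c(1, 1)
    ))
  }
  deltas
}

names(action_deltas())      # the four basic moves
names(action_deltas(TRUE))  # the four basic moves plus the four diagonals
```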
When you step into a cliff state you get a reward of reward.cliff, usually a large negative reward, and transition to a state specified by cliff.transition.states.
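The cliff rule can be sketched as a small post-processing step on the state an action would lead to. This is a base R illustration under stated assumptions, not the package's implementation: resolve_cliff and its transition_state argument are hypothetical names, a single transition state is assumed, and the handling of cliff.transition.done is left out.

```r
# Hypothetical helper: decide state and reward after a move.
# If the move lands on a cliff state, pay reward.cliff and jump to the
# transition state; otherwise keep the state and pay reward.step.
resolve_cliff = function(next_state, cliff.states, transition_state,
                         reward.step = -1, reward.cliff = -100) {
  if (next_state %in% cliff.states) {
    list(state = transition_state, reward = reward.cliff)
  } else {
    list(state = next_state, reward = reward.step)
  }
}

# Values taken from a 4 x 12 cliff-walking layout: cliff at 37:46,
# transition back to state 36.
out = resolve_cliff(37, cliff.states = 37:46, transition_state = 36)
out$state   # 36
out$reward  # -100
```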
In each column, a deterministic wind specified via wind pushes you up a specific number of grid cells (for the next action).
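A minimal sketch of the wind rule in base R follows. It assumes that the wind strength is read from the column you are in before the move, that "up" means a smaller row index, and that the result is clamped to the grid. The helper name windy_step is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper: apply a (d_row, d_col) move plus the wind of the
# current column, then clamp the result to the grid.
windy_step = function(state, delta, wind, shape) {
  n_rows = shape[1]
  n_cols = shape[2]
  row = state %/% n_cols
  col = state %% n_cols
  push = wind[col + 1]            # R vectors are 1-based, columns 0-based
  row = row + delta[1] - push     # wind pushes towards row 0
  col = col + delta[2]
  row = min(max(row, 0), n_rows - 1)
  col = min(max(col, 0), n_cols - 1)
  row * n_cols + col
}

wind = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0)
# From state 38 (row 3, column 8, wind 1), moving left ends in row 2,
# column 7, which is state 27.
windy_step(38, delta = c(0, -1), wind = wind, shape = c(7, 10))
```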
A stochastic gridworld is a gridworld where, with probability stochasticity, the next state is chosen at random from all neighbor states, independent of the actual action.
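The stochastic transition can be illustrated with a few lines of base R. The sketch assumes that "neighbor states" are the states reachable by one of the available moves and that the random choice is uniform. The helper name stochastic_next is hypothetical, and the package may implement this differently.

```r
# Hypothetical helper: with probability `stochasticity` ignore the intended
# next state and pick one of the neighbor states uniformly at random.
stochastic_next = function(intended, neighbors, stochasticity) {
  if (runif(1) < stochasticity) {
    # sample.int avoids the sample() pitfall with length-one vectors.
    neighbors[sample.int(length(neighbors), 1)]
  } else {
    intended
  }
}

set.seed(42)
# Neighbors of state 5 in a 4 x 4 grid: up 1, left 4, right 6, down 9.
neighbors = c(1, 4, 6, 9)
stochastic_next(6, neighbors, stochasticity = 0)    # always the intended state
stochastic_next(6, neighbors, stochasticity = 0.2)  # usually 6, sometimes random
```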
If an action would take you off the grid, the new state is the nearest cell inside the grid.
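The boundary rule amounts to clamping the row and column after a move. Below is a small base R sketch of that idea. The helper name move and the (d_row, d_col) encoding of a move are assumptions made for illustration only.

```r
# Hypothetical helper: apply a (d_row, d_col) move to a 0-based, row-wise
# state index and clamp the result to the nearest cell inside the grid.
move = function(state, delta, shape) {
  row = state %/% shape[2] + delta[1]
  col = state %% shape[2] + delta[2]
  row = min(max(row, 0), shape[1] - 1)
  col = min(max(col, 0), shape[2] - 1)
  row * shape[2] + col
}

move(0, delta = c(-1, 0), shape = c(4, 4))  # moving up from the top row: stays at 0
move(5, delta = c(0, 1), shape = c(4, 4))   # an ordinary move right: 6
```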
For each step you get a reward of reward.step until you reach a goal state; then the episode is done.
States are numbered row-wise, starting with 0. Here is an example 4x4 grid:
0 | 1 | 2 | 3 |
4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 |
12 | 13 | 14 | 15 |
So a board position could look like this (G: goal state, x: current state, C: cliff state):
G | o | o | o |
o | o | o | o |
o | x | o | o |
o | o | o | C |
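The row-wise numbering means a state index converts to a grid position with integer division and remainder. The base R helpers below are a sketch with hypothetical names (state_to_pos, pos_to_state). They use 0-based rows and columns to match the numbering above.

```r
# Convert a 0-based, row-wise state index to a 0-based (row, col) position.
# n_cols is the number of columns, i.e. shape[2].
state_to_pos = function(state, n_cols) {
  c(row = state %/% n_cols, col = state %% n_cols)
}

# Inverse mapping: (row, col) back to the state index.
pos_to_state = function(row, col, n_cols) {
  row * n_cols + col
}

# In the example board: G is state 0, x is state 9, C is state 15.
state_to_pos(9, n_cols = 4)     # row = 2, col = 1
pos_to_state(3, 3, n_cols = 4)  # 15
```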
makeEnvironment("gridworld", shape = NULL, goal.states = NULL, cliff.states = NULL, reward.step = -1, reward.cliff = -100, diagonal.moves = FALSE, wind = rep(0, shape[2]), cliff.transition.states = NULL, cliff.transition.done = FALSE, stochasticity = 0, ...)
$step(action)
Takes an action in the environment.
Returns a list with state, reward and done.
$reset()
Resets the done flag of the environment and returns an initial state.
Useful when starting a new episode.
$visualize()
Visualizes the environment (if there is a visualization function).
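The methods above suggest a simple interaction loop: reset, then step until done is TRUE. The sketch below runs such a loop against a stand-in environment written in base R so that it works without the package. The stub (make_stub_env), its 1 x 4 corridor, the action encoding and the random policy are all assumptions for illustration. With the real package you would presumably substitute an object returned by makeEnvironment.

```r
# Stand-in environment (NOT the package's implementation): a 1 x 4 corridor
# with the goal at state 0, exposing reset() and step() like the methods above.
make_stub_env = function() {
  state = NULL
  list(
    reset = function() {
      state <<- 3L
      state
    },
    step = function(action) {
      # Hypothetical action encoding: 0 = left, anything else = right.
      delta = if (action == 0L) -1L else 1L
      state <<- min(max(state + delta, 0L), 3L)
      list(state = state, reward = -1, done = state == 0L)
    }
  )
}

env = make_stub_env()
set.seed(1)
state = env$reset()
total = 0
steps = 0
done = FALSE
while (!done && steps < 1000) {   # cap the episode length as a safeguard
  action = sample(0:1, 1)         # random policy
  res = env$step(action)
  state = res$state
  total = total + res$reward
  steps = steps + 1
  done = res$done
}
c(steps = steps, return = total)
```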
# Gridworld Environment (Sutton & Barto Example 4.1)
env1 = makeEnvironment("gridworld", shape = c(4L, 4L), goal.states = 0L,
  initial.state = 15L)
env1$reset()
env1$visualize()
env1$step(0L)
env1$visualize()

# Windy Gridworld (Sutton & Barto Example 6.5)
env2 = makeEnvironment("gridworld", shape = c(7, 10), goal.states = 37L,
  reward.step = -1, wind = c(0, 0, 0, 1, 1, 1, 2, 2, 1, 0),
  initial.state = 30L)

# Cliff Walking (Sutton & Barto Example 6.6)
env3 = makeEnvironment("gridworld", shape = c(4, 12), goal.states = 47L,
  cliff.states = 37:46, reward.step = -1, reward.cliff = -100,
  cliff.transition.states = 36L, initial.state = 36L)