Cliff_walking | R Documentation |
The cliff walking gridworld MDP example from Chapter 6 of the textbook "Reinforcement Learning: An Introduction."
An object of class MDP.
The cliff walking gridworld has the following layout: [figure: cliff walking gridworld layout]
The gridworld is represented as a 4 x 12 matrix of states.
The states are labeled with their x and y coordinates.
The start state is in the bottom left corner.
Each action has a reward of -1. Falling off the cliff has a reward of -100 and
returns the agent to the start state. The episode ends once the agent
reaches the absorbing goal state in the bottom right corner.
No discounting is used (i.e., \gamma = 1).
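The dynamics described above can be illustrated with a small base R sketch. This is not the package's implementation and does not use the Cliff_walking object. The names cliff_step, is_cliff, and run_episode are hypothetical helpers introduced only for illustration. The sketch assumes that rows are numbered 1 (top) to 4 (bottom), that the cliff occupies the bottom row between the start and the goal (as in the textbook), and that a move into a wall leaves the agent in place at the usual reward of -1.

```r
# Illustrative sketch only; assumptions are listed in the text above.
n_rows <- 4
n_cols <- 12
start <- c(row = 4, col = 1)   # bottom left corner
goal  <- c(row = 4, col = 12)  # bottom right corner

# Hypothetical helper: TRUE if the position is a cliff cell
# (assumed: bottom row, strictly between start and goal).
is_cliff <- function(pos) {
  pos[["row"]] == n_rows && pos[["col"]] > 1 && pos[["col"]] < n_cols
}

# Hypothetical helper: apply one action and return the next position,
# the reward, and whether the goal was reached.
cliff_step <- function(pos, action) {
  move <- switch(action,
    up    = c(-1, 0),
    down  = c(1, 0),
    left  = c(0, -1),
    right = c(0, 1),
    stop("unknown action: ", action)
  )
  # Assumed: moves that would leave the grid are clipped to the border.
  nxt <- c(
    row = min(max(pos[["row"]] + move[1], 1), n_rows),
    col = min(max(pos[["col"]] + move[2], 1), n_cols)
  )
  if (is_cliff(nxt)) {
    # Falling off the cliff: reward -100 and back to the start.
    return(list(pos = start, reward = -100, done = FALSE))
  }
  list(pos = nxt, reward = -1, done = all(nxt == goal))
}

# Sum the rewards along a fixed action sequence (gamma = 1, so no discounting).
run_episode <- function(actions) {
  pos <- start
  total <- 0
  for (a in actions) {
    res <- cliff_step(pos, a)
    total <- total + res$reward
    pos <- res$pos
    if (res$done) break
  }
  total
}

# Path along the cliff edge: up, 11 steps right, down (13 actions).
shortest <- c("up", rep("right", 11), "down")
run_episode(shortest)   # -13
run_episode("right")    # -100, the first step right falls off the cliff
```

Because no discounting is used, the return of an episode is simply the sum of its rewards, so under these assumptions the path along the cliff edge yields -13.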
Richard S. Sutton and Andrew G. Barto (2018). Reinforcement Learning: An Introduction, Second Edition. MIT Press, Cambridge, MA.
Other MDP_examples: MDP(), Maze, Windy_gridworld
Other gridworld: Maze, Windy_gridworld, gridworld
data(Cliff_walking)
Cliff_walking
gridworld_matrix(Cliff_walking)
gridworld_matrix(Cliff_walking, what = "labels")
# The Goal is an absorbing state
which(absorbing_states(Cliff_walking))
# visualize the transition graph
gridworld_plot_transition_graph(Cliff_walking)
# solve using different methods
sol <- solve_MDP(Cliff_walking)
sol
policy(sol)
gridworld_plot_policy(sol)
sol <- solve_MDP(Cliff_walking, method = "q_learning", N = 100)
sol
policy(sol)
gridworld_plot_policy(sol)
sol <- solve_MDP(Cliff_walking, method = "sarsa", N = 100)
sol
policy(sol)
gridworld_plot_policy(sol)
sol <- solve_MDP(Cliff_walking, method = "expected_sarsa", N = 100, alpha = 1)
policy(sol)
gridworld_plot_policy(sol)