MDP | R Documentation |
Defines all the elements of an MDP problem.
MDP(
  states,
  actions,
  transition_prob,
  reward,
  discount = 0.9,
  horizon = Inf,
  start = "uniform",
  name = NA
)
MDP2POMDP(x)
is_solved_MDP(x, stop = FALSE)
states | a character vector specifying the names of the states. |
actions | a character vector specifying the names of the available actions. |
transition_prob | specifies the transition probabilities between states. |
reward | specifies the rewards dependent on action, states and observations. |
discount | numeric; discount rate between 0 and 1. |
horizon | numeric; number of epochs. |
start | specifies in which state the MDP starts. |
name | a string to identify the MDP problem. |
x | an MDP object. |
stop | logical; if TRUE, stop with an error when the MDP is not solved. |
MDPs are similar to POMDPs; however, states are completely observable and
observations are not necessary. The model is defined similarly to POMDP
models, but observations are not specified and the 'observations' column in
the reward specification is always '*'.
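As a rough illustration of the reward layout described here, the following base-R sketch builds a small reward table by hand, with columns in the order that the R_() helper expects (action, start.state, end.state, observation, value). The object name reward_sketch and the use of '*' as a wildcard for unspecified states are assumptions made for illustration; the package may store the table differently.

```r
# Hand-built sketch of a reward table for an MDP (not produced by the package).
reward_sketch <- data.frame(
  action      = c("open-left", "open-left", "do-nothing"),
  start.state = c("tiger-left", "tiger-right", "*"),
  end.state   = "*",   # assumption: "*" stands for any end state
  observation = "*",   # always "*" for an MDP
  value       = c(-100, 10, 0),
  stringsAsFactors = FALSE
)

# Every row carries the wildcard in the observation column.
all(reward_sketch$observation == "*")
```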
MDP2POMDP() reformulates an MDP as a POMDP with one observation per state
that reveals the current state. This is achieved by defining identity
observation probability matrices.
More details on specifying the model components can be found in the documentation for POMDP.
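To make the idea of an identity observation probability matrix concrete, here is a minimal base-R sketch for the two states used in the example on this page. The object name obs_identity and the layout (rows are end states, columns are observations) are assumptions for illustration and need not match how the package stores the matrices internally.

```r
states <- c("tiger-left", "tiger-right")

# One observation per state: the observation emitted equals the state reached.
obs_identity <- diag(length(states))
dimnames(obs_identity) <- list(end.state = states, observation = states)

# Each row is a probability distribution that puts all mass on one observation,
# so observing "tiger-left" reveals that the state is "tiger-left".
stopifnot(all(rowSums(obs_identity) == 1))
obs_identity
```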
The function returns an object of class MDP, which is a list with
the model specification. solve_MDP() reads the object and adds a list
element called 'solution'.
Michael Hahsler
Other MDP: POMDP_accessors, simulate_MDP(), solve_MDP(), transition_graph()
library("pomdp")

# Michael's Sleepy Tiger Problem is like the POMDP Tiger problem, but
# has completely observable states because the tiger is sleeping in front
# of the door. This makes the problem an MDP.
STiger <- MDP(
  name = "Michael's Sleepy Tiger Problem",
  discount = .9,
  states = c("tiger-left", "tiger-right"),
  actions = c("open-left", "open-right", "do-nothing"),
  start = "uniform",
  # opening a door resets the problem
  transition_prob = list(
    "open-left" = "uniform",
    "open-right" = "uniform",
    "do-nothing" = "identity"),
  # the reward helper R_() expects: action, start.state, end.state, observation, value
  reward = rbind(
    R_("open-left", "tiger-left", v = -100),
    R_("open-left", "tiger-right", v = 10),
    R_("open-right", "tiger-left", v = 10),
    R_("open-right", "tiger-right", v = -100),
    R_("do-nothing", v = 0)
  )
)
STiger
sol <- solve_MDP(STiger, eps = 1e-7)
sol
policy(sol)
plot_value_function(sol)
# convert the MDP into a POMDP and solve
STiger_POMDP <- MDP2POMDP(STiger)
sol2 <- solve_POMDP(STiger_POMDP)
sol2
policy(sol2)
plot_value_function(sol2)
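For readers who want to see what solving this small model amounts to, the following is a minimal base-R value iteration sketch for the Sleepy Tiger problem. It is written from the model definition above and is not the implementation used by solve_MDP(); the names P, R, gamma, V and greedy are introduced only for this illustration.

```r
states  <- c("tiger-left", "tiger-right")
actions <- c("open-left", "open-right", "do-nothing")
gamma   <- 0.9  # discount, as in the example

# Transition matrices P[[a]][s, s']: opening a door resets the problem
# uniformly, doing nothing keeps the current state.
uniform  <- matrix(0.5, 2, 2, dimnames = list(states, states))
identity <- diag(2)
dimnames(identity) <- list(states, states)
P <- list("open-left" = uniform, "open-right" = uniform, "do-nothing" = identity)

# Rewards R[s, a], filled column by column (one column per action).
R <- matrix(c(-100, 10,   # open-left
              10, -100,   # open-right
              0, 0),      # do-nothing
            nrow = 2, dimnames = list(states, actions))

# Repeat the Bellman update until the value function stops changing.
V <- setNames(numeric(length(states)), states)
repeat {
  Q <- sapply(actions, function(a) R[, a] + gamma * as.vector(P[[a]] %*% V))
  V_new <- apply(Q, 1, max)
  delta <- max(abs(V_new - V))
  V <- V_new
  if (delta < 1e-7) break
}

# Greedy action per state with respect to the final Q values.
greedy <- setNames(actions[apply(Q, 1, which.max)], states)
V
greedy
```

In this sketch, opening the door without the tiger is the greedy action in both states, and by symmetry both states end up with the same value.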