MDP: Define an MDP Problem

View source: R/MDP.R


Define an MDP Problem

Description

Defines all the elements of an MDP problem.

Usage

MDP(
  states,
  actions,
  transition_prob,
  reward,
  discount = 0.9,
  horizon = Inf,
  start = "uniform",
  name = NA
)

MDP2POMDP(x)

is_solved_MDP(x, stop = FALSE)

Arguments

states

a character vector specifying the names of the states.

actions

a character vector specifying the names of the available actions.

transition_prob

Specifies the transition probabilities between states.

reward

Specifies the rewards, which depend on the action, the start state, and the end state (the observation column is always '*'; see Details).

discount

numeric; discount rate between 0 and 1.

horizon

numeric; number of epochs. Inf specifies an infinite horizon.

start

Specifies in which state the MDP starts.

name

a string to identify the MDP problem.

x

an MDP object.

stop

logical; if TRUE, stop with an error when x is not a solved MDP.

Details

MDPs are similar to POMDPs; however, states are completely observable, so observations are not needed. The model is defined like a POMDP model, but observations are not specified and the 'observations' column in the reward specification is always '*'.
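For illustration, a single reward entry written out in full keeps the observation column at '*' (a minimal sketch using the R_() helper, which expects action, start.state, end.state, observation, and value):

R_(action = "open-left", start.state = "tiger-left", end.state = "*",
   observation = "*", value = -100)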

MDP2POMDP() reformulates an MDP as a POMDP with one observation per state that reveals the current state. This is achieved by defining identity observation probability matrices.
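For example, the observation probabilities of a converted model can be inspected; each action should get an identity matrix (a sketch assuming the observation_matrix() accessor from POMDP_accessors):

p <- MDP2POMDP(x)        # x is an MDP, e.g., the STiger model below
observation_matrix(p)    # one identity matrix per action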

More details on specifying the model components can be found in the documentation for POMDP.

Value

The function returns an object of class MDP, which is a list with the model specification. solve_MDP() reads the object and adds a list element called 'solution'.
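For example (a sketch using the STiger model defined in the Examples below):

sol <- solve_MDP(STiger)
is_solved_MDP(sol)    # TRUE once a solution has been added
sol$solution          # the list element added by the solver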

Author(s)

Michael Hahsler

See Also

Other MDP: POMDP_accessors, simulate_MDP(), solve_MDP(), transition_graph()

Examples

# Michael's Sleepy Tiger Problem is like the POMDP Tiger problem, but
# has completely observable states because the tiger is sleeping in front
# of the door. This makes the problem an MDP.

STiger <- MDP(
  name = "Michael's Sleepy Tiger Problem",
  discount = .9,

  states = c("tiger-left" , "tiger-right"),
  actions = c("open-left", "open-right", "do-nothing"),
  start = "uniform",

  # opening a door resets the problem
  transition_prob = list(
    "open-left" =  "uniform",
    "open-right" = "uniform",
    "do-nothing" = "identity"),

  # the reward helper R_() expects: action, start.state, end.state, observation, value
  reward = rbind(
    R_("open-left",  "tiger-left",  v = -100),
    R_("open-left",  "tiger-right", v =   10),
    R_("open-right", "tiger-left",  v =   10),
    R_("open-right", "tiger-right", v = -100),
    R_("do-nothing",                v =    0)
  )
)

STiger

sol <- solve_MDP(STiger, eps = 1e-7)
sol

policy(sol)
plot_value_function(sol)

# convert the MDP into a POMDP and solve
STiger_POMDP <- MDP2POMDP(STiger)
sol2 <- solve_POMDP(STiger_POMDP)
sol2 

policy(sol2)
plot_value_function(sol2)
