POMDP_accessors: Access to Parts of the POMDP Description


Access to Parts of the POMDP Description

Description

Functions to provide uniform access to different parts of the POMDP description.

Usage

transition_matrix(
  x,
  action = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = TRUE,
  drop = TRUE
)

transition_val(x, action, start.state, end.state, episode = NULL, epoch = NULL)

observation_matrix(
  x,
  action = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = TRUE,
  drop = TRUE
)

observation_val(
  x,
  action,
  end.state,
  observation,
  episode = NULL,
  epoch = NULL
)

reward_matrix(
  x,
  action = NULL,
  start.state = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE,
  drop = TRUE
)

reward_val(
  x,
  action,
  start.state,
  end.state = NA,
  observation = NA,
  episode = NULL,
  epoch = NULL
)

start_vector(x)

normalize_POMDP(x, sparse = TRUE)

normalize_MDP(x, sparse = TRUE)

Arguments

x

A POMDP or MDP object.

action

name or index of an action.

episode, epoch

Episode or epoch used for time-dependent POMDPs. Epochs are internally converted to the episode using the model horizon.

sparse

logical; use sparse matrices when the density is below 50% and keep the data.frame representation for the reward field. NULL returns the representation stored in the problem description, which saves the time needed for conversion. The sketch after this argument list illustrates how sparse and drop affect the returned value.

drop

logical; drop the action list if a single action is requested?

start.state, end.state

name or index of the state.

observation

name or index of observation.
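
The following sketch (using the Tiger problem that also appears in the Examples below) illustrates how the action, sparse, and drop arguments interact; it only restates the argument descriptions above in code form:

data("Tiger")

# all actions: a list with one matrix per action
transition_matrix(Tiger, sparse = FALSE)

# a single action with drop = TRUE (the default) returns the matrix itself;
# drop = FALSE keeps the one-element list
transition_matrix(Tiger, action = "listen")
transition_matrix(Tiger, action = "listen", drop = FALSE)

# sparse = NULL returns the representation stored in the problem description
transition_matrix(Tiger, sparse = NULL)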

Details

Several parts of the POMDP description can be defined in different ways. In particular, the fields transition_prob, observation_prob, reward, and start can be defined using matrices, data frames, or keywords. See POMDP for details. The functions described here provide unified access to the data in these fields to make writing code easier.

Transition Probabilities T(s'|s,a)

transition_matrix() returns a list with one element for each action. Each element contains a states x states matrix with s (start.state) as rows and s' (end.state) as columns. Matrices with a density below 50% can be requested in sparse format (as a Matrix::dgCMatrix).

transition_val() retrieves a single entry more efficiently.
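
As a minimal sketch (using the Tiger problem from the Examples below), the full list and a single entry can be accessed like this:

data("Tiger")

# one start.state x end.state matrix per action
tm <- transition_matrix(Tiger, sparse = FALSE)
tm[["listen"]]["tiger-left", "tiger-left"]

# the same entry retrieved directly
transition_val(Tiger, action = "listen",
  start.state = "tiger-left", end.state = "tiger-left")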

Observation Probabilities O(o|s',a)

observation_matrix() returns a list with one element for each action. Each element contains a states x observations matrix with s' (end.state) as rows and o (observation) as columns. Matrices with a density below 50% can be requested in sparse format (as a Matrix::dgCMatrix).

observation_val() retrieves a single entry more efficiently.
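
A corresponding sketch for the observation model (again using the Tiger problem):

data("Tiger")

# one end.state x observation matrix per action
om <- observation_matrix(Tiger, sparse = FALSE)
om[["listen"]]["tiger-left", "tiger-left"]

# the same entry retrieved directly
observation_val(Tiger, action = "listen",
  end.state = "tiger-left", observation = "tiger-left")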

Reward R(s,s',o,a)

For the dense representation, reward_matrix() returns a list of lists. The list levels are a (action) and s (start.state), and the list elements are matrices with rows representing the end state s' and columns representing the observations o. Many reward structures cannot be stored efficiently as a standard sparse matrix, since a fixed cost for every action leaves no zero entries. Therefore, the data.frame representation is used as a 'sparse' representation.

reward_val() retrieves a single entry more efficiently.
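
A sketch of the two representations and of direct access (Tiger problem again; sparse = TRUE is assumed here to return the data.frame form described above):

data("Tiger")

# dense representation: list (action) of lists (start.state) of
# end.state x observation matrices
rew <- reward_matrix(Tiger, sparse = FALSE)
rew[["open-right"]][["tiger-left"]]

# 'sparse' representation: the reward as a data.frame
reward_matrix(Tiger, sparse = TRUE)

# a single entry retrieved directly
reward_val(Tiger, action = "open-right", start.state = "tiger-left",
  end.state = "tiger-left", observation = "tiger-left")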

Initial Belief

start_vector() translates the initial probability vector description into a numeric vector.
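
For example (a sketch assuming, as described in POMDP, that the start field also accepts a single state name):

data("Tiger")

# the default start description ("uniform") as a probability vector
Tiger$start
start_vector(Tiger)

# a start description given as a state name is translated the same way
Tiger$start <- "tiger-left"
start_vector(Tiger)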

Convert the Complete POMDP Description into a Consistent Form

normalize_POMDP() returns a new POMDP definition where transition_prob, observation_prob, reward, and start are normalized to (lists of) matrices and vectors to make direct access easy. Also, states, actions, and observations are ordered as given in the problem definition to make safe access using numerical indices possible. Normalized POMDP descriptions are used for C++-based code (e.g., simulate_POMDP()), and normalizing them once saves time if the code is called repeatedly. normalize_MDP() performs the same normalization for MDP objects.
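
A minimal sketch of normalizing once and then accessing the normalized fields directly:

data("Tiger")

# normalize once; the normalized fields are plain (lists of) matrices and vectors
Tiger_norm <- normalize_POMDP(Tiger, sparse = FALSE)
Tiger_norm$transition_prob[["listen"]]
Tiger_norm$start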

Value

A list or a list of lists of matrices.

Author(s)

Michael Hahsler

See Also

Other POMDP: POMDP(), plot_belief_space(), projection(), regret(), sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Other MDP: MDP(), simulate_MDP(), solve_MDP(), transition_graph()

Examples

data("Tiger")

# List of |A| transition matrices. One per action in the form start.states x end.states
Tiger$transition_prob
transition_matrix(Tiger)
transition_val(Tiger, action = "listen", start.state = "tiger-left", end.state = "tiger-left")

# List of |A| observation matrices. One per action in the form end.states x observations
Tiger$observation_prob
observation_matrix(Tiger)
observation_val(Tiger, action = "listen", end.state = "tiger-left", observation = "tiger-left")

# List of lists of reward matrices. The 1st level is the action and the 2nd level
#  is the start state; matrices are in the form end.state x observation
Tiger$reward
reward_matrix(Tiger)
reward_val(Tiger, action = "open-right", start.state = "tiger-left", end.state = "tiger-left",
  observation = "tiger-left")
  
# Note that the reward in the tiger problem only depends on the action and the start.state 
# so we can use:
reward_val(Tiger, action = "open-right", start.state = "tiger-left")

# Translate the initial belief vector
Tiger$start
start_vector(Tiger)

# Normalize the whole model
Tiger_norm <- normalize_POMDP(Tiger)
Tiger_norm$transition_prob

## Visualize transition matrix for action 'open-left'
library("igraph")
g <- graph_from_adjacency_matrix(transition_matrix(Tiger, action = "open-left"), weighted = TRUE)
edge_attr(g, "label") <- edge_attr(g, "weight")

igraph.options("edge.curved" = TRUE)
plot(g, layout = layout_on_grid, main = "Transitions for action 'open-left'")

## Use a function for the Tiger transition model
trans <- function(action, end.state, start.state) {
  ## listen keeps the state unchanged (identity matrix)
  if (action == 'listen') {
    if (end.state == start.state) return(1)
    else return(0)
  }

  # other actions have a uniform distribution
  return(1/2)
}

Tiger$transition_prob <- trans

# transition_matrix evaluates the function
transition_matrix(Tiger)
