accessors
Functions to provide uniform access to different parts of the POMDP/MDP problem description.
Usage:

start_vector(x)

normalize_POMDP(
  x,
  sparse = TRUE,
  trans_start = FALSE,
  trans_function = TRUE,
  trans_keyword = FALSE
)

normalize_MDP(
  x,
  sparse = TRUE,
  trans_start = FALSE,
  trans_function = TRUE,
  trans_keyword = FALSE
)

reward_matrix(
  x,
  action = NULL,
  start.state = NULL,
  end.state = NULL,
  observation = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE
)

reward_val(
  x,
  action,
  start.state,
  end.state = NULL,
  observation = NULL,
  episode = NULL,
  epoch = NULL
)

transition_matrix(
  x,
  action = NULL,
  start.state = NULL,
  end.state = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE,
  trans_keyword = TRUE
)

transition_val(x, action, start.state, end.state, episode = NULL, epoch = NULL)

observation_matrix(
  x,
  action = NULL,
  end.state = NULL,
  observation = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE,
  trans_keyword = TRUE
)

observation_val(
  x,
  action,
  end.state,
  observation,
  episode = NULL,
  epoch = NULL
)
Arguments:

x: A POMDP or MDP object.

sparse: logical; use sparse matrices when the density is below 50% and keep the data.frame representation for the reward field.

trans_start: logical; expand the start to a probability vector?

trans_function: logical; convert functions into matrices?

trans_keyword: logical; convert distribution keywords (uniform and identity) into matrices?

action: name or index of an action.

start.state, end.state: name or index of the state.

observation: name or index of an observation.

episode, epoch: episode or epoch used for time-dependent POMDPs. Epochs are internally converted to the episode using the model horizon.
Details:

Several parts of the POMDP/MDP description can be defined in different ways. In particular, the fields transition_prob, observation_prob, reward, and start can be defined using matrices, data frames, keywords, or functions. See POMDP for details. The functions documented here provide unified access to the data in these fields to make writing code easier.
T(s'|s,a):
transition_matrix() accesses the transition model. The complete model is a list with one element for each action. Each element contains a states x states matrix with s (start.state) as rows and s' (end.state) as columns. Matrices with a density below 50% can be requested in sparse format (as a Matrix::dgCMatrix).
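For example, the matrix for a single action can be requested directly, either dense or sparse. This is a minimal sketch, assuming the pomdp package and its Tiger example model (also used in the Examples below):

library("pomdp")
data("Tiger")

# matrix for one action (rows: start.state, columns: end.state)
transition_matrix(Tiger, action = "listen")

# the same matrix requested in sparse format
transition_matrix(Tiger, action = "listen", sparse = TRUE)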
O(o|s',a):
observation_matrix() accesses the observation model. The complete model is a list with one element for each action. Each element contains a states x observations matrix with s' (end.state) as rows and o (observation) as columns. Matrices with a density below 50% can be requested in sparse format (as a Matrix::dgCMatrix).
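Similarly, the observation matrix for a single action can be requested by name. A minimal sketch, assuming the pomdp package and the Tiger example model:

library("pomdp")
data("Tiger")

# matrix for one action (rows: end.state, columns: observation)
observation_matrix(Tiger, action = "listen")

# sparse variant
observation_matrix(Tiger, action = "listen", sparse = TRUE)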
R(s,s',o,a):
reward_matrix() accesses the reward model. The preferred representation is a data.frame with the columns action, start.state, end.state, observation, and value. This is a sparse representation. The dense representation is a list of lists of matrices. The list levels are a (action) and s (start.state). The matrices have rows representing s' (end.state) and columns representing o (observation). The reward structure cannot be efficiently stored using a standard sparse matrix since there might be a fixed cost for each action, resulting in few or no zero entries.
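As a short illustration (again assuming the pomdp package and the Tiger model), both representations can be requested, and a single value R(s,s',o,a) can be looked up with reward_val():

library("pomdp")
data("Tiger")

# sparse representation: a data.frame with columns
# action, start.state, end.state, observation, and value
reward_matrix(Tiger, sparse = TRUE)

# look up a single reward value by names
reward_val(Tiger, action = "open-right", start.state = "tiger-left",
  end.state = "tiger-left", observation = "tiger-left")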
start_vector() translates the initial probability vector description into a numeric vector.
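A small sketch (assuming the pomdp package and the Tiger model; specifying the start as a single state name is one of the start formats described in POMDP):

library("pomdp")
data("Tiger")

# replace the default start specification with a single state name
Tiger$start <- "tiger-left"

# translate it into a numeric probability vector over the states
start_vector(Tiger)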
normalize_POMDP() returns a new POMDP definition where transition_prob, observation_prob, reward, and start are normalized. Also, states, actions, and observations are ordered as given in the problem definition to make safe access using numerical indices possible. Normalized POMDP descriptions can be used in custom code that consistently expects a certain format.
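A minimal sketch (assuming the pomdp package and the Tiger model) that expands keywords and functions into explicit matrices and then uses numerical indices:

library("pomdp")
data("Tiger")

# normalize and also expand distribution keywords into dense matrices
Tiger_norm <- normalize_POMDP(Tiger, sparse = FALSE, trans_keyword = TRUE)
Tiger_norm$transition_prob

# after normalization, numerical indices can be used safely
transition_matrix(Tiger_norm, action = 1)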
Value: A list or a list of lists of matrices.
Author(s): Michael Hahsler
See Also:

Other POMDP: MDP2POMDP, POMDP(), actions(), add_policy(), plot_belief_space(), projection(), reachable_and_absorbing, regret(), sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, actions(), add_policy(), gridworld, reachable_and_absorbing, regret(), simulate_MDP(), solve_MDP(), transition_graph(), value_function()
data("Tiger")
# List of |A| transition matrices. One per action in the from start.states x end.states
Tiger$transition_prob
transition_matrix(Tiger)
transition_val(Tiger, action = "listen", start.state = "tiger-left", end.state = "tiger-left")
# List of |A| observation matrices. One per action in the form states x observations
Tiger$observation_prob
observation_matrix(Tiger)
observation_val(Tiger, action = "listen", end.state = "tiger-left", observation = "tiger-left")
# List of lists of reward matrices. The first level is the action and the
# second level is the start state; each matrix is in the form end.state x observation
Tiger$reward
reward_matrix(Tiger)
reward_matrix(Tiger, sparse = TRUE)
reward_matrix(Tiger, action = "open-right", start.state = "tiger-left", end.state = "tiger-left",
observation = "tiger-left")
# Translate the initial belief vector
Tiger$start
start_vector(Tiger)
# Normalize the whole model
Tiger_norm <- normalize_POMDP(Tiger)
Tiger_norm$transition_prob
## Visualize the transition model as a graph
plot_transition_graph(Tiger)
## Use a function for the Tiger transition model
trans <- function(action, end.state, start.state) {
  ## listen has an identity matrix
  if (action == 'listen') {
    if (end.state == start.state) return(1)
    else return(0)
  }
  # other actions have a uniform distribution
  return(1/2)
}
Tiger$transition_prob <- trans
# transition_matrix evaluates the function
transition_matrix(Tiger)