Description

The function SARS() creates a SARS object for discrete-time Markov Decision Process (MDP) data.
Arguments

states: a numeric matrix of states, one row per time step.
actions: a numeric matrix of actions.
rewards: a numeric column vector of rewards.
states_next: a numeric matrix of next states.
ids: a numeric column vector of ids.
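As a rough illustration of the argument shapes listed above, the sketch below checks a set of inputs before they would be passed to SARS(). The helper check_sars_inputs() is hypothetical and not part of the package. It assumes that every argument has one row per SARS unit and that states_next has as many columns as states; both follow from each row representing one unit but are not stated here as explicit requirements.

```r
# Hypothetical helper (not part of the package API): checks that inputs
# have the documented shapes before they are handed to SARS().
check_sars_inputs <- function(states, actions, rewards, states_next, ids) {
  mats <- list(states = states, actions = actions, rewards = rewards,
               states_next = states_next, ids = ids)
  # Every argument must be a numeric matrix.
  for (nm in names(mats)) {
    m <- mats[[nm]]
    if (!is.matrix(m) || !is.numeric(m))
      stop(sprintf("'%s' must be a numeric matrix", nm))
  }
  # Assumption: one row per SARS unit, so all row counts must agree.
  n <- nrow(states)
  if (any(vapply(mats, nrow, integer(1)) != n))
    stop("all arguments must have the same number of rows")
  # Assumption: next states live in the same space as states.
  if (ncol(states_next) != ncol(states))
    stop("'states_next' must have the same number of columns as 'states'")
  # rewards and ids are documented as column vectors (n-by-1 matrices).
  if (ncol(rewards) != 1 || ncol(ids) != 1)
    stop("'rewards' and 'ids' must be n-by-1 matrices")
  invisible(TRUE)
}

# Example inputs: 4 units with 2-dimensional states.
set.seed(1)
n <- 4; p <- 2
states      <- matrix(rnorm(n * p), nrow = n)
actions     <- matrix(sample(0:1, n, replace = TRUE), ncol = 1)
rewards     <- matrix(rnorm(n), ncol = 1)
states_next <- matrix(rnorm(n * p), nrow = n)
ids         <- matrix(rep(1, n), ncol = 1)
check_sars_inputs(states, actions, rewards, states_next, ids)
```

Passing a plain vector such as as.vector(rewards) instead of a one-column matrix makes the check stop with an error.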
Details

SARS stands for S (state), A (action), R (reward), and S' (next state), the basic unit of MDP data.
SARS objects are designed to store more than one unit. A typical use case is an MDP trajectory of the form
S_1, A_1, R_1, S_2, A_2, R_2, …, S_n, A_n, R_n, S_{n+1}
which can be rearranged into the units (S_1, A_1, R_1, S'_1 = S_2), (S_2, A_2, R_2, S'_2 = S_3), and so on. The elements of all units are then stacked row-wise into the matrices states, actions, rewards, and states_next. For example, if each S is a p-vector, then states is an n-by-p matrix.
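The rearrangement of a trajectory into units can be sketched in base R. The function trajectory_to_units() and its input layout (an (n+1)-by-p state matrix S, an action matrix A with n rows, and a length-n reward vector R) are assumptions made for illustration. The result holds the four matrices in the form described above; the actual call to SARS() is left out because its exact usage is not reproduced on this page.

```r
# Hypothetical helper (for illustration only): rearranges one trajectory
# S_1, A_1, R_1, ..., S_n, A_n, R_n, S_{n+1} into n stacked SARS units.
# Assumed layout: S is (n+1)-by-p, A has n rows, R is a length-n vector.
trajectory_to_units <- function(S, A, R) {
  n <- nrow(S) - 1
  stopifnot(n >= 1, nrow(A) == n, length(R) == n)
  list(
    states      = S[1:n, , drop = FALSE],         # S_1, ..., S_n
    actions     = A,                              # A_1, ..., A_n
    rewards     = matrix(R, ncol = 1),            # n-by-1 column vector
    states_next = S[2:(n + 1), , drop = FALSE]    # S'_t = S_{t+1}
  )
}

# Example: 3 time steps, so 4 states, each a 2-vector.
S <- matrix(1:8, ncol = 2)          # rows: (1,5), (2,6), (3,7), (4,8)
A <- matrix(c(0, 1, 1), ncol = 1)
R <- c(0.5, -1, 2)
units <- trajectory_to_units(S, A, R)
dim(units$states)                   # [1] 3 2
units$states_next[1, ]              # [1] 2 6, the same as units$states[2, ]
```

The last line makes the redundancy visible: each row of states_next repeats the following row of states within the same trajectory.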
This structure is not a compact representation for the trajectory use case, because states_next duplicates states lagged by one time step. However, it accommodates multiple trajectories directly: the matrices from different trajectories are simply stacked together. This single-matrix representation provides some computational advantages.
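A minimal sketch of stacking several trajectories, assuming each one has already been split into units as a list of states, actions, rewards and states_next matrices. make_traj() and stack_units() are hypothetical helpers, and filling ids with the trajectory index is an assumption about what ids is meant to hold, since the arguments only describe it as a numeric column vector.

```r
# Hypothetical helpers (for illustration only).
# make_traj(): random units for one trajectory of length n with p-dim states.
make_traj <- function(n, p) {
  list(states      = matrix(rnorm(n * p), nrow = n),
       actions     = matrix(rbinom(n, 1, 0.5), nrow = n),
       rewards     = matrix(rnorm(n), nrow = n),
       states_next = matrix(rnorm(n * p), nrow = n))
}

# stack_units(): row-binds each field across trajectories and records the
# trajectory index of every row in an n-by-1 ids matrix (assumed meaning of ids).
stack_units <- function(trajs) {
  pick <- function(field) do.call(rbind, lapply(trajs, `[[`, field))
  n_rows <- vapply(trajs, function(u) nrow(u$states), integer(1))
  list(states      = pick("states"),
       actions     = pick("actions"),
       rewards     = pick("rewards"),
       states_next = pick("states_next"),
       ids         = matrix(rep(seq_along(trajs), times = n_rows), ncol = 1))
}

set.seed(42)
all_units <- stack_units(list(make_traj(3, 2), make_traj(5, 2)))
dim(all_units$states)            # [1] 8 2
as.vector(all_units$ids)         # [1] 1 1 1 2 2 2 2 2
```

Because every field is a single matrix, rows from different trajectories stay aligned by position, and a per-row identifier is enough to tell the trajectories apart afterwards.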
Value

A SARS object (class = "SARS").
Note

For one-dimensional arguments (e.g. rewards, where each reward is a real number), a column vector (an n-by-1 matrix) is expected.
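In R, a plain numeric vector is not a matrix, so a reward vector would typically need reshaping before use. A small base-R illustration (the variable names are only for the example):

```r
# A plain numeric vector has no dim attribute, so it is not a matrix.
r <- c(1.0, 0.0, -0.5)
is.matrix(r)                  # [1] FALSE

# Reshape it into an n-by-1 column vector; as.matrix(r) gives the same shape.
r_col <- matrix(r, ncol = 1)
dim(r_col)                    # [1] 3 1
dim(as.matrix(r))             # [1] 3 1
```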