
The function `SARS()` creates a SARS object for discrete-time Markov Decision
Process (MDP) data.


| Argument | Description |
|---|---|
| `states` | a numeric matrix for states, one row per time step. |
| `actions` | a numeric matrix for actions. |
| `rewards` | a numeric column vector for rewards. |
| `states_next` | a numeric matrix for next states. |
| `ids` | a numeric column vector for ids. |

SARS stands for *S* (state), *A* (action), *R* (reward), and
*S'* (next state), a basic unit of MDP.

SARS objects are designed to store more than one unit. A typical use case is MDP trajectories of the form

*S_1, A_1, R_1, S_2, A_2, R_2, …, S_n, A_n, R_n, S_{n+1}*

which can be rearranged into units *(S_1, A_1, R_1, S'_1=S_2)*, *(S_2, A_2, R_2, S'_2=S_3)*,
and so on. Elements across all units are then stacked together into the matrices
`states`, `actions`, `rewards`, and `states_next`. For example, if each *S*
is a *p*-vector, then `states` is an *n*-by-*p* matrix.
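
The rearrangement described above can be sketched in base R. This is only an illustration under assumed data: the trajectory values and the names `traj_states`, `traj_actions`, and `traj_rewards` are hypothetical and not part of the package.

```r
# Hypothetical trajectory with n = 4 time steps and p = 2 state dimensions.
n <- 4
p <- 2

# S_1, ..., S_{n+1}: one row per visited state, so n + 1 rows in total.
traj_states <- matrix(seq_len((n + 1) * p), nrow = n + 1, ncol = p, byrow = TRUE)
# A_1, ..., A_n and R_1, ..., R_n as n-by-1 column vectors.
traj_actions <- matrix(c(0, 1, 1, 0), ncol = 1)
traj_rewards <- matrix(c(0.5, -1, 2, 0), ncol = 1)

# Unit t is (S_t, A_t, R_t, S'_t = S_{t+1}).
states      <- traj_states[1:n, , drop = FALSE]        # rows S_1 .. S_n
states_next <- traj_states[2:(n + 1), , drop = FALSE]  # rows S_2 .. S_{n+1}
actions     <- traj_actions
rewards     <- traj_rewards

dim(states)       # n-by-p
dim(states_next)  # n-by-p
```

Using `drop = FALSE` keeps the result a matrix even when *p* is 1.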

This structure is not a compact representation for the single-trajectory use case, because
`states_next` duplicates `states` lagged by one time step. However,
it extends naturally to more than one trajectory: simply stack the matrices
from different trajectories together. This single-matrix representation provides
some computational advantages.
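
As a rough sketch of the stacking idea, the base R code below splits two trajectories into units and row-binds them, using an id column to record which trajectory each row came from. The helper `to_units()` and all data values are hypothetical. The commented-out `SARS()` call uses only the argument names listed on this page and is an untested assumption about how the result would be passed on.

```r
# Hypothetical helper (not part of the package): split one trajectory
# into the per-unit matrices and tag every row with a trajectory id.
to_units <- function(traj_states, actions, rewards, id) {
  n <- nrow(actions)
  list(
    states      = traj_states[seq_len(n), , drop = FALSE],
    actions     = actions,
    rewards     = rewards,
    states_next = traj_states[seq_len(n) + 1, , drop = FALSE],
    ids         = matrix(id, nrow = n, ncol = 1)
  )
}

# Trajectory 1: n = 3 steps, so 4 visited states.
u1 <- to_units(matrix(1:8, ncol = 2),
               actions = matrix(c(0, 1, 0), ncol = 1),
               rewards = matrix(c(1, 0, 1), ncol = 1),
               id = 1)
# Trajectory 2: n = 2 steps, so 3 visited states.
u2 <- to_units(matrix(101:106, ncol = 2),
               actions = matrix(c(1, 1), ncol = 1),
               rewards = matrix(c(0, 2), ncol = 1),
               id = 2)

# Stack matching components row-wise: 3 + 2 = 5 units in total.
stacked <- Map(rbind, u1, u2)

# Assumed call, requires the package providing SARS():
# sars <- SARS(states = stacked$states, actions = stacked$actions,
#              rewards = stacked$rewards, states_next = stacked$states_next,
#              ids = stacked$ids)
```

Note that at the boundary between the two trajectories, a row of `states_next` is no longer the following row of `states`, which is why the next states are stored explicitly.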

The function returns a SARS object (`class = "SARS"`).

For 1D arguments (e.g. reward as a real number), a column vector (*n*-by-*1* matrix)
is expected.
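
A minimal base R sketch of turning a plain numeric vector into the expected column vector; the name `reward_values` and its contents are hypothetical.

```r
# A plain numeric vector has no dim attribute.
reward_values <- c(0.5, -1, 2, 0)
is.null(dim(reward_values))

# Reshape into an n-by-1 matrix (column vector).
rewards <- matrix(reward_values, ncol = 1)
dim(rewards)
```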

