sars: SARS Object
In XiaoqiLu/PhD-Thesis: Regularized Q-Learning

The function SARS() creates a SARS object for discrete-time Markov Decision Process (MDP) data.

SARS(states, actions, rewards, states_next, ids = NA)

`states`	a numeric matrix for states, one row per time step.
`actions`	a numeric matrix for actions.
`rewards`	a numeric column vector for rewards.
`states_next`	a numeric matrix for next states.
`ids`	a numeric column vector for ids.

SARS stands for S (state), A (action), R (reward), and S' (next state), the basic unit of an MDP.

SARS objects are designed to store more than one unit. A typical use case is an MDP trajectory of the form

S_1, A_1, R_1, S_2, A_2, R_2, …, S_n, A_n, R_n, S_{n+1}

which can be rearranged into units (S_1, A_1, R_1, S'_1=S_2), (S_2, A_2, R_2, S'_2=S_3), and so on. Elements across all units are then stacked together into the matrices states, actions, rewards, and states_next. For example, if each S is a p-vector, then states is an n-by-p matrix.
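The rearrangement above can be sketched in base R. This is only an illustration under assumed inputs: the names `traj_states`, `traj_actions`, and `traj_rewards` are hypothetical and not part of the package, and the trajectory values are made up.

```r
# Hypothetical trajectory with n = 3 time steps and p = 2 state dimensions.
# traj_states holds S_1, ..., S_{n+1}, one row per state (n + 1 rows).
traj_states  <- matrix(c(1, 2, 3, 4,
                         5, 6, 7, 8), nrow = 4, ncol = 2)
traj_actions <- matrix(c(0, 1, 0), ncol = 1)        # A_1, ..., A_n
traj_rewards <- matrix(c(0.5, 1.0, -0.2), ncol = 1) # R_1, ..., R_n

n <- nrow(traj_actions)

# Unit t pairs S_t with S'_t = S_{t+1}: drop the last row for states,
# drop the first row for states_next.
states      <- traj_states[1:n, , drop = FALSE]
states_next <- traj_states[2:(n + 1), , drop = FALSE]
actions     <- traj_actions
rewards     <- traj_rewards

dim(states)       # n-by-p
dim(states_next)  # n-by-p
```

The four resulting matrices all have n rows, which is the shape the arguments of SARS() describe. Rows 1 to n-1 of `states_next` equal rows 2 to n of `states`.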

This structure is not a compact representation for the trajectory use case, because states_next duplicates states lagged by one time step. However, it extends naturally to more than one trajectory: simply stack the matrices from different trajectories together. This single-matrix representation provides some computational advantages.
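As a minimal sketch of the stacking described above, two already-rearranged trajectories can be combined with `rbind()`. The variable names and values are hypothetical, and using `ids` to label which trajectory each row came from is an assumption here; the documentation only says that `ids` is a numeric column vector.

```r
# Trajectory A contributes two units; trajectory B contributes one.
states_a      <- matrix(c(1, 2, 3, 4), 2, 2)
actions_a     <- matrix(c(1, 0), 2, 1)
rewards_a     <- matrix(c(1, 2), 2, 1)
states_next_a <- matrix(c(2, 3, 4, 5), 2, 2)

states_b      <- matrix(c(7, 8), 1, 2)
actions_b     <- matrix(1, 1, 1)
rewards_b     <- matrix(0, 1, 1)
states_next_b <- matrix(c(8, 9), 1, 2)

# Stack row-wise: every matrix ends up with 2 + 1 = 3 rows.
states      <- rbind(states_a, states_b)
actions     <- rbind(actions_a, actions_b)
rewards     <- rbind(rewards_a, rewards_b)
states_next <- rbind(states_next_a, states_next_b)

# Assumed usage: one id per row marking the source trajectory.
ids <- matrix(c(1, 1, 2), ncol = 1)

# With the package loaded, these could then be passed on:
# ss <- SARS(states, actions, rewards, states_next, ids)
```

Because each row already carries its own next state, no row of the stacked matrices needs to know where one trajectory ends and the next begins.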

A SARS object (class = "SARS").

For 1D arguments (e.g. reward as a real number), a column vector (n-by-1 matrix) is expected.

states <- matrix(c(1, 2, 3, 4), 2, 2)
actions <- matrix(c(1, 0), 2, 1)
rewards <- matrix(c(1, 2), 2, 1)
states_next <- matrix(c(2, 3, 4, 5), 2, 2)
ss <- SARS(states, actions, rewards, states_next)
ss