# sars: SARS Object In XiaoqiLu/PhD-Thesis: Regularized Q-Learning

## Description

The function SARS() creates a SARS object for discrete-time Markov Decision Process (MDP) data.

## Usage

```r
SARS(states, actions, rewards, states_next, ids = NA)
```

## Arguments

- `states`: a numeric matrix of states, one row per time step.
- `actions`: a numeric matrix of actions.
- `rewards`: a numeric column vector of rewards.
- `states_next`: a numeric matrix of next states.
- `ids`: a numeric column vector of ids.

## Details

SARS stands for S (state), A (action), R (reward), and S' (next state), a basic unit of an MDP.

SARS objects are designed to store more than one unit. A typical use case is an MDP trajectory of the form

S_1, A_1, R_1, S_2, A_2, R_2, …, S_n, A_n, R_n, S_{n+1}

which can be rearranged into units (S_1, A_1, R_1, S'_1 = S_2), (S_2, A_2, R_2, S'_2 = S_3), and so on. Elements across all units are then stacked together into matrices of states, actions, rewards, and next states. For example, if each S is a p-vector, then states is an n-by-p matrix.
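The rearrangement above can be sketched in base R by lagging the trajectory's state matrix by one time step (a minimal sketch; the variable names and helper code here are illustrative, not part of the package):

```r
# Rearrange one trajectory S_1, ..., S_{n+1} (with A_t, R_t for t = 1..n)
# into the stacked SARS matrices described above. Base R only.
n <- 3
p <- 2
S_traj <- matrix(seq_len((n + 1) * p), n + 1, p)  # (n+1)-by-p state trajectory
A_traj <- matrix(c(1, 0, 1), n, 1)                # n-by-1 actions
R_traj <- matrix(c(0.5, 1.0, -0.2), n, 1)         # n-by-1 rewards

states      <- S_traj[1:n, , drop = FALSE]        # S_t for t = 1..n, n-by-p
states_next <- S_traj[2:(n + 1), , drop = FALSE]  # S_{t+1}, the 1-step lag
```

Note how row t of `states_next` equals row t + 1 of `states`, which is the duplication discussed below.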

This structure is not a compact representation for the single-trajectory use case, because states_next duplicates states lagged by one time step. However, it extends naturally to more than one trajectory: matrices from different trajectories are simply stacked together. This single-matrix representation also provides some computational advantages.
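Stacking two trajectories then amounts to row-binding their matrices. A minimal sketch, assuming (this is not confirmed by the documentation) that the `ids` argument is the natural place to record which trajectory each unit came from:

```r
# Pool SARS matrices from two trajectories by stacking rows.
# The ids column tagging trajectory membership is an assumed usage.
states_1 <- matrix(c(1, 2, 3, 4), 2, 2)   # 2 units from trajectory 1
states_2 <- matrix(c(5, 6, 7, 8), 2, 2)   # 2 units from trajectory 2

states_all <- rbind(states_1, states_2)   # 4-by-2 stacked states
ids_all    <- matrix(c(1, 1, 2, 2), 4, 1) # unit-to-trajectory labels
```

The other matrices (`actions`, `rewards`, `states_next`) would be stacked the same way, keeping rows aligned across all of them.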

## Value

a SARS object (class = "SARS")

## Note

For 1D arguments (e.g. rewards, where each reward is a real number), a column vector (an n-by-1 matrix) is expected.

## Examples

```r
states <- matrix(c(1, 2, 3, 4), 2, 2)
actions <- matrix(c(1, 0), 2, 1)
rewards <- matrix(c(1, 2), 2, 1)
states_next <- matrix(c(2, 3, 4, 5), 2, 2)
ss <- SARS(states, actions, rewards, states_next)
ss
```

XiaoqiLu/PhD-Thesis documentation built on March 1, 2021, 10:49 a.m.