sars: SARS Object


Description

The function SARS() creates a SARS object for discrete-time Markov Decision Process (MDP) data.

Usage

SARS(states, actions, rewards, states_next, ids = NA)

Arguments

states

a numeric matrix of states, with one row per time step (SARS unit).

actions

a numeric matrix of actions, with one row per time step.

rewards

a numeric column vector (n-by-1 matrix) of rewards.

states_next

a numeric matrix of next states, with one row per time step.

ids

a numeric column vector of ids (default NA).

Details

SARS stands for S (state), A (action), R (reward), and S' (next state), the basic unit of MDP data.

SARS objects are designed to store more than one unit. A typical use case is an MDP trajectory of the form

S_1, A_1, R_1, S_2, A_2, R_2, …, S_n, A_n, R_n, S_{n+1}

which can be rearranged into units (S_1, A_1, R_1, S'_1 = S_2), (S_2, A_2, R_2, S'_2 = S_3), and so on. The elements of all units are then stacked row-wise into the matrices states, actions, rewards, and states_next. For example, if each S is a p-vector, then states is an n-by-p matrix.
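The rearrangement of a single trajectory into stacked units can be sketched in base R as follows. The helper trajectory_to_sars() is hypothetical and not part of the package; it assumes the trajectory is given as an (n+1)-by-p matrix of states, an n-row matrix (or vector) of actions, and a length-n vector of rewards, and it returns a plain list of matrices rather than a SARS object.

```r
# Hypothetical helper (not part of the package): rearrange one trajectory
# S_1, A_1, R_1, ..., S_n, A_n, R_n, S_{n+1} into stacked SARS matrices.
# Assumes traj_states is (n+1)-by-p, traj_actions has n rows, traj_rewards has length n.
trajectory_to_sars <- function(traj_states, traj_actions, traj_rewards) {
  traj_states <- as.matrix(traj_states)
  traj_actions <- as.matrix(traj_actions)  # a plain vector becomes an n-by-1 matrix
  n <- nrow(traj_actions)
  # The trajectory ends with S_{n+1}, so there is one more state than actions
  stopifnot(n >= 1, nrow(traj_states) == n + 1, length(traj_rewards) == n)
  list(
    states = traj_states[seq_len(n), , drop = FALSE],  # S_1, ..., S_n
    actions = traj_actions,                            # A_1, ..., A_n
    rewards = matrix(traj_rewards, ncol = 1),          # R_1, ..., R_n as an n-by-1 matrix
    states_next = traj_states[-1, , drop = FALSE]      # S'_t = S_{t+1}, t = 1, ..., n
  )
}

# Example: a 2-step trajectory with 2-dimensional states and scalar actions
traj_states <- matrix(c(1, 2, 3, 3, 4, 5), 3, 2)  # rows: S_1 = (1, 3), S_2 = (2, 4), S_3 = (3, 5)
traj_actions <- c(1, 0)
traj_rewards <- c(1, 2)
units <- trajectory_to_sars(traj_states, traj_actions, traj_rewards)
units$states_next
```

The four components equal the matrices used in the Examples section, so they could be passed positionally as SARS(units$states, units$actions, units$rewards, units$states_next).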

This structure is not a compact representation of a single trajectory, because states_next duplicates states lagged by one time step. However, it extends naturally to multiple trajectories: the matrices from different trajectories can simply be stacked together. This single-matrix representation provides some computational advantages.
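Stacking several trajectories amounts to row-binding each component. A minimal base-R sketch, assuming each trajectory's units are held in a plain list of matrices; the helper stack_sars_units() is hypothetical and does not call any function from the package, and the second trajectory's values are made up for illustration.

```r
# Hypothetical helper (not part of the package): stack SARS units from several
# trajectories by row-binding each component. Each element of unit_list is
# assumed to be a list with matrices states, actions, rewards, states_next.
stack_sars_units <- function(unit_list) {
  fields <- c("states", "actions", "rewards", "states_next")
  stacked <- lapply(fields, function(f) do.call(rbind, lapply(unit_list, `[[`, f)))
  names(stacked) <- fields
  stacked
}

# Trajectory 1: two units (the same matrices as in the Examples section)
traj1 <- list(
  states = matrix(c(1, 2, 3, 4), 2, 2),
  actions = matrix(c(1, 0), 2, 1),
  rewards = matrix(c(1, 2), 2, 1),
  states_next = matrix(c(2, 3, 4, 5), 2, 2)
)
# Trajectory 2: one unit (illustrative values)
traj2 <- list(
  states = matrix(c(10, 20), 1, 2),
  actions = matrix(1, 1, 1),
  rewards = matrix(5, 1, 1),
  states_next = matrix(c(11, 21), 1, 2)
)

all_units <- stack_sars_units(list(traj1, traj2))
dim(all_units$states)  # [1] 3 2
```

Because every component keeps one row per unit, rows from different trajectories line up across states, actions, rewards, and states_next after stacking.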

Value

a SARS object (class = "SARS")

Note

For 1D arguments (e.g. reward as a real number), a column vector (n-by-1 matrix) is expected.
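In R, a plain numeric vector has no dim attribute, and extracting a single column from a matrix drops it back to a vector by default. A short base-R illustration of producing the expected n-by-1 shape; the variable names are illustrative only.

```r
rewards_vec <- c(1, 2, 3)                     # plain numeric vector, no dim attribute
rewards_col <- matrix(rewards_vec, ncol = 1)  # 3-by-1 column vector
dim(rewards_col)                              # [1] 3 1

m <- matrix(1:6, nrow = 3, ncol = 2)
dim(m[, 1])                 # NULL: single-column indexing drops to a vector
dim(m[, 1, drop = FALSE])   # [1] 3 1: drop = FALSE keeps the matrix shape
```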

Examples

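# Two SARS units with 2-dimensional states and 1-dimensional actions
states <- matrix(c(1, 2, 3, 4), 2, 2)       # rows: S_1 = (1, 3), S_2 = (2, 4)
actions <- matrix(c(1, 0), 2, 1)            # A_1 = 1, A_2 = 0
rewards <- matrix(c(1, 2), 2, 1)            # R_1 = 1, R_2 = 2 (column vector)
states_next <- matrix(c(2, 3, 4, 5), 2, 2)  # rows: S'_1 = (2, 4), S'_2 = (3, 5)
ss <- SARS(states, actions, rewards, states_next)
ss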

XiaoqiLu/PhD-Thesis documentation built on March 1, 2021, 10:49 a.m.