MdpEnvironment | R Documentation
Markov Decision Process environment.
Arguments

transitions
[array] State transition array (n.states x n.states x n.actions); each row of each action slice must be a probability distribution over next states.

rewards
[matrix] Reward matrix (n.states x n.actions).

initial.state
[integer] Optional initial state of the environment.

...
[any] Further arguments passed on to makeEnvironment.
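The argument shapes above can be sketched as follows. This is a hypothetical illustration (the shapes are inferred from the example on this page, not stated verbatim here): for 2 states and 2 actions, transitions is a 2 x 2 x 2 array whose rows sum to 1 within each action slice, and rewards is a 2 x 2 matrix.

```r
# Transition array indexed as [state, next state, action] (assumed layout).
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)

# Check that every (state, action) row is a valid probability distribution.
stopifnot(all(abs(apply(P, c(1, 3), sum) - 1) < 1e-12))

# Reward matrix indexed as [state, action] (assumed layout).
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
stopifnot(identical(dim(R), c(2L, 2L)))
```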
Usage

makeEnvironment("mdp", transitions, rewards, initial.state, ...)
Methods

$step(action)
Take action in environment. Returns a list with state, reward, done.
$reset()
Resets the done flag of the environment and returns an initial state. Useful when starting a new episode.
$visualize()
Visualizes the environment (if there is a visualization function).
Examples

# Create a Markov Decision Process.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)
env$reset()
env$step(1L)
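The example above takes a single step. A minimal sketch of a full episode loop follows, assuming (as described under $step above) that the returned list carries a logical done element; since not every MDP reaches a terminal state, the loop also stops after a fixed step budget.

```r
# Hypothetical episode loop (not part of the original page).
env$reset()
for (i in 1:20) {
  res = env$step(1L)       # repeat the action used in the example
  if (isTRUE(res$done)) break  # stop when the episode terminates
}
```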