MdpEnvironment: MDP Environment

Description

Markov Decision Process environment.

Arguments

transitions

[array (n.states x n.states x n.actions)]
State transition array.

rewards

[matrix (n.states x n.actions)]
Reward matrix.

initial.state

[integer]
Optional starting state. If a vector is given, a starting state is randomly sampled from this vector whenever reset is called. Note that states are numbered starting with 0. If initial.state = NULL, all non-terminal states are possible starting states.

...

[any]
Arguments passed on to makeEnvironment.

Usage

makeEnvironment("MDP", transitions, rewards, initial.state, ...)

Examples

# Create a Markov Decision Process with 2 states and 2 actions.
# Transition array: P[i, j, a] is the probability of moving from
# state i to state j when taking action a (states are numbered from 0).
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
# Reward matrix: R[i, a] is the reward for taking action a in state i.
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)
env$reset()
# Take the second action (actions are numbered from 0 as well).
env$step(1L)

Example output

[1] 0
$state
[1] 1

$reward
[1] 10

$done
[1] TRUE
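The initial.state argument can also restrict where episodes begin. A minimal sketch (assuming the reinforcelearn package is installed) that reuses the MDP above and fixes the starting state to 0, so reset always returns it:

```r
library(reinforcelearn)

# Same 2-state, 2-action MDP as in the example above.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)

# Fix the starting state to 0. Passing a vector such as c(0L, 1L)
# would instead sample a starting state from it on every reset.
env = makeEnvironment("mdp", transitions = P, rewards = R,
  initial.state = 0L)
env$reset()
```

Because only one starting state is given, every episode begins in state 0; with a vector, reset would draw one of its entries at random.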

reinforcelearn documentation built on May 2, 2019, 9:20 a.m.