Solves a finite-horizon MDP using the backwards induction algorithm


Description

Solves a finite-horizon MDP with the backwards induction algorithm.

Usage

mdp_finite_horizon(P, R, discount, N, h)

Arguments

P

transition probability array. P can be a 3-dimensional array [S,S,A] or a list of A elements, each containing a sparse [S,S] matrix.

R

reward array. R can be a 3-dimensional array [S,S,A], a list of A elements each containing a sparse [S,S] matrix, or a 2-dimensional [S,A] matrix, possibly sparse.

discount

discount factor. discount is a real number in the interval [0; 1[, i.e. greater than or equal to 0 and strictly less than 1.

N

number of stages. N is an integer greater than 0.

h

(optional) terminal reward. h is a vector of length S. By default, h = numeric(S).

Details

mdp_finite_horizon applies the backwards induction algorithm to a finite-horizon MDP. The optimality equations allow the value function to be evaluated recursively, starting from the terminal stage and working back to the first. The function has verbose and silent modes. In verbose mode, it displays the current stage and the corresponding optimal policy.
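As a rough illustration only (this is a sketch, not the package's internal code), the recursion can be written in plain R for the dense case. The helper name is hypothetical, and P is assumed to be a [S,S,A] array with R an [S,A] reward matrix:

backward_induction_sketch <- function(P, R, discount, N, h = numeric(dim(P)[1])) {
  S <- dim(P)[1]; A <- dim(P)[3]
  V <- matrix(0, S, N + 1)
  policy <- matrix(0L, S, N)
  V[, N + 1] <- h                        # terminal reward fills the last column
  for (n in N:1) {                       # recurse backwards from the last stage
    # Q[s, a]: immediate reward plus discounted expected value at stage n+1
    Q <- sapply(1:A, function(a) R[, a] + discount * P[, , a] %*% V[, n + 1])
    V[, n] <- apply(Q, 1, max)           # optimal value function at stage n
    policy[, n] <- apply(Q, 1, which.max)
  }
  list(V = V, policy = policy)
}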

Value

V

value function. V is a [S,(N+1)] matrix. Each column n is the optimal value function at stage n, with n = 1, ..., N. V[,N+1] is the terminal reward.

policy

optimal policy. policy is a [S,N] matrix. Each element is an integer corresponding to an action, and each column n is the optimal policy at stage n.

cpu_time

CPU time used to run the program

Examples

library(MDPtoolbox)

# With a non-sparse matrix
P <- array(0, c(2,2,2))
P[,,1] <- matrix(c(0.5, 0.5, 0.8, 0.2), 2, 2, byrow=TRUE)
P[,,2] <- matrix(c(0, 1, 0.1, 0.9), 2, 2, byrow=TRUE)
R <- matrix(c(5, 10, -1, 2), 2, 2, byrow=TRUE)
mdp_finite_horizon(P, R, 0.9, 3)

# With a sparse matrix
library(Matrix)
P <- list()
P[[1]] <- Matrix(c(0.5, 0.5, 0.8, 0.2), 2, 2, byrow=TRUE, sparse=TRUE)
P[[2]] <- Matrix(c(0, 1, 0.1, 0.9), 2, 2, byrow=TRUE, sparse=TRUE)
mdp_finite_horizon(P, R, 0.9, 3)
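The returned list exposes the components documented under Value above. A short, illustrative follow-up (the terminal reward values here are arbitrary):

# With a terminal reward, then inspecting the result
res <- mdp_finite_horizon(P, R, 0.9, 3, h = c(1, 0.5))
res$V        # [S,(N+1)] value function; the last column is the terminal reward
res$policy   # [S,N] matrix of optimal action indices, one column per stage
res$cpu_time # CPU time used to run the program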
