require(multipleuncertainty) require(pdgControl) require(reshape2) require(ggplot2) require(data.table) ```` ```r opts_knit$set(upload.fun = socialR::flickr.url) opts_chunk$set(dev.args=list(bg="transparent"), cache=TRUE, comment=NA, message=FALSE, warning=FALSE, tidy=FALSE) opts_chunk$set(fig.path = paste("figure/", gsub(".Rmd", "", knitr:::knit_concord$get('infile')), "-", sep=""), cache.path = paste(gsub(".Rmd", "", knitr:::knit_concord$get('infile') ), "/", sep="") ) require(ggplot2) theme_notebook <- theme_grey() + theme(plot.background = element_rect(fill = "transparent", colour = NA), panel.grid.minor = element_line(colour = "transparent")) theme_set(theme_notebook)
Based on Powell 2006, page 97. I try to conform to that notation throughout
First we define the Beverton-Holt stock-recruitment relationship as a function of stock size x
, harvest h
and parameters p
``` {r bevholt} f <- function(x, h, p){ A <- p[1] B <- p[2] s <- pmax(x-h, 0) A * s/(1 + B * s) } p <- pars <- c(1.5, 0.5) K <- (p[1] - 1)/p[2] sigma_g <- 0.2
We begin with a simulation method $X_{t+1} = f(X_t, Z_t)$. For illustration, let us consider $f(X_t, Z_t) = Z_t \frac{a X_t}{b + X_t}$ with a = `r pars[1]` and b = `r pars[2]`. We define a statespace $S$ ```r S <- seq(0, 1, length=11)
as a uniform grid of r length(S)
points from r min(S)
to r max(S)
.
We also need a value function on the state space, $C_t(S_t)$.
For simplicity, we set the price of harvest at unity and
the cost of harvesting at zero, so that $C_t(S_t, x_t) = \min(x_t, S_t)$.
($C_t$ is sometimes denoted $\mathbb{\Pi}$).
We also need an action space $\chi_t$ of possible harvest values.
Again for simplicity we assume that harvest can be set to any possible state size, $\chi_t \equiv S_t$,
chi <- S
T <- 10 N <- 10
The approximate dynamic programming algorithm will perform a finite number $N$ = r N
iterations over a window of time $T$ =r T
in our example. The algorithm can then be described as follows:
Step 0
Initialize some value $\tilde V_t^0(S_t)$ for all states $S_t$,
where the superscripts denote iterations in the forward approximation.
As we know absolutely nothing yet to base our initial guess on,
we just arbitrarily set this to zero.
V <- numeric(length(S))
S_0 <- 0.5
Set $n = 1$
Step 1: Choose a sample path, $\omega^n$ (a vector of random draws)
sigma <- 0.2 omega_n <- rlnorm(T, 0, sigma)
Step 2: For $t = 0, 1, 2, \ldots, T$, do:
Solve:
$$V_t(S_t) = \max_{x_t \in \chi_t} \left(C(S_t, x_t) + \gamma \sum_{s^{\prime} \in \mathcal{S}} \mathbb{P}(s^{\prime} | S_t^n, x_t) V_{t+1}^{n-1} s^{\prime} \right)$$
That is, choose action $x_t$ that maximizes the value of the next step.
Let's start with $t=0$, $n=1$ and fix an $x_0$ from the set of $\chi$ (allowing the action space to be the same in each period, we can omit the subscript on $\chi$) to get started. We first compute $C(S_0, x_0)$.
$S_0 = S_0^1$ which we fixed in step 0b arbitrarily at r S_0
.
The profits/costs $C(S_t, x_t)$ are the value derived by action (harvest) $x_t$ at state (stock) $S_t$. Assuming a fixed price and no costs to harvesting, this is just whichever number is smaller (since we cannot harvest more than the available stock,
C <- function(S, X) pmin(S, X)
(where we have used R's vectorized form of the min function).
This forward dynamic programming will still rely on the one-step transition matrix, $\mathbb{P}$.
Let's get the trasition matrices for this problem, assuming log-normal noise,
sdp_matrix <- determine_SDP_matrix(f, p, x_grid=S, h_grid=chi, sigma_g)
Which is a list of matrices, one for each harvest (action) $x_t$.
Then we want to consider a fixed $S_t^n$ and fixed $x_t$, and take the sum of
$\mathbb{P}(s^{\prime} | S_t^n, x_t)$ over the $s^{\prime}$, which means we want
the $x_t$ element from the list, and then we need sum over the distribution of future
states given the current state $S_t^n$, e.g. a row of the matrix, e.g. sdp_matrix[[x]][s,]
,
which we (vector) multiply by $V_{t+1}^{n-1}(s^{\prime})$.
This value $V$ is of course unknown, other than our initial random guess $V_{t}^0$.
As we step through the iterations $V_t^1$, $V_t^2$, $V_t^3$, etc., this should convgerge to
something meaningful.
Note that the index along $S$ corresponding to $S_t^n$ is given by
s <- which.min(abs(S-S_0))
So our maximization across $x$ just involves:
values <- sapply(1:length(chi), function(x) C(S[s], chi[x]) + sdp_matrix[[x]][s,] %*% V ) max_x <- which.max(values) v_hat <- max(values)
Trivially, this is just the harvest level that maximizes $C$ so far (which is just harvesting the $S_0$, since $\bar V^0_t$ begins at zero:
chi[max_x]
$$V_t^n(S_t) = \begin{cases} \hat v_t^n & S_t = S_t^n \ \bar V_t^{n-1}(S_t) & \textrm{otherwise} \end{cases}$$
e.g. use our maximum value for the case of the state we just considered $S_t = S_t^n$, otherwise leave $V_t$ unchanged. Our new $V$ is thus:
V[s] = v_hat
We compute the next state using our max_x
for $x^n_t$, our random samples
and the transition function...
S_1 <- omega_n[1] * f(S_0, chi[x_nt], p)
assembling the solution...
Things we want to keep track of:
We have time, iterations, and state/action space to keep track over. eek.
N <- 5000 # iterations M <- 20 # gridsize Tmax <- 5 # Time horizon gamma <- 0.95 # Discount # f # p sigma_g <- 0.5 # larger variation in random draws helps chi <- seq(0, 1, length.out = M) S <- seq(0, 1, length.out = M) sdp_matrix <- determine_SDP_matrix(f, p, x_grid=S, h_grid=chi, sigma_g) V <- matrix(1, M, Tmax) # A* strategy # Fails to explore at matrix(0, M, Tmax) # consider: # V <- matrix(rep(chi, Tmax), nrow=M) # # V[,1] <- chi # fails to explore if it doesn't have at least some non-zero values C <- function(S, X) pmin(S, X) S_0 <- 0.5 alpha <- 1 # learning rate for(n in 1:N){ omega_n <- rlnorm(Tmax, 0, sigma_g) S_current <- S_0 #runif(1,0,1) # explores faster when this is random for(t in 1:Tmax){ # index of the state we're considering s <- which.min(abs(S-S_current)) # Find the action maximizing the value values <- sapply(1:length(chi), function(x) C(S[s], chi[x]) + gamma * sdp_matrix[[x]][s,] %*% V[,t]) hat <- c(x_nt = which.max(values), v_nt = max(values)) # Update value V as mixture of new value and previous value V[hat["x_nt"], t] <- (1 - alpha) * V[hat["x_nt"],t] + alpha * hat["v_nt"] # Advance the state in time along random path S_current <- omega_n[t] * f(S_current, chi[hat["x_nt"]], p) } }
for comparison: the SDP solution
opt <- find_dp_optim(sdp_matrix, S, chi, 70, 0.5, C, 1-gamma, reward=0) opt$V
Note that after the first iteration, $n=1$, the value matrix $V$ is no longer all zeros. There is a single state, $S = S_0 =$ r S_0
, at which we have value. That value is lost if we set harvest $x$ too high, since we know we will not then end up in that state -- from whence comes the incentive to consider future value. Unfortunately, the value exists only if we hit that state exactly -- all other states are assumed to have zero value still.
We no longer have the loop-over-all-states problem, but we face several new or remaining issues:
We still require the use of the one-step transition matrix, with the equally troublesome sum over all states $\sum_{s^{\prime}\in S} \mathbb{P}(s^{\prime} | S_t^n, x_t)$. We will fix this by approximating the transitions in step 2b using random draws as well.
We only update the values of states we visit. We still need a way to estimate the value of states we have not visited.
Worse, we might not visit states that seem bad relative to states we have visited. This is particularly atrocious in this example. Since we initialize the value of all states at 0, the algorithm prefers to harvest all stock from the current state rather than risk a transition into a state starting at 0. There is no convergence guarentee that we will ever escape this cycle of avoiding states we have not seen. We can alter the initial guess of the value of course, and we could alter the starting condition to better explore.
Taking $x_0$ as the smallest harvest, $\min(\chi)$ = r min(chi)
and
evaluating $C(S_0,X_0) = \min(S_0, X_0)$ gives us r min(S_0, min(chi))
,
rather trivially.
The next terms depend on the value $\tilde V^0_1(s^{\prime})$ for
all $s^{\prime} \in S$, which we have no idea about. Fortunately we have
assumed a value for each of these in step 0a.
We must also come up with some values for the probability $\mathbb{P}(s^{\prime} | S_1^0, x_1)$ for each state, given our current state $S_1^0$ and considered action $x_1$. This is more straight forward, since it is determined by our one-step transition function (without simulation - recall that the single step transition is given exactly).
To do so, we evaluate the argument for each value in our action space, $x_t \in \chi_t$,
s <- S_0 C <- function(S, X) pmin(S, X) arg <- sapply(chi, function(x) C(s, x) + f(S, x, p) %*% V) x_nt = which.max(arg) v_nt = max(arg) V[x_nt] = v_nt
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.