knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 7)
par(mar=c(1,1,1,1))

Introduction

This example will use the Deshka River Chinook salmon brood table to demonstrate preseason salmon forecasts in R. The process relies on cross validation for model selection and offers a flexible framework where several models can be considered easily. The functions used in this tutorial are all part of the "preseason" package available on GitHub and can be installed with using the command devtools::install_github("ADFG-DSF/preseason").

library(preseason)

Currently this brood table is generated from 2 sources; 1) posterior estimates of escapement, recruitment and age-at-maturity in brood years 1979-2017 from the Susitna River Chinook salmon run reconstruction, and 2) 2018 Deshka River Chinook escapement and age composition estimates. In the future it would be ideal if the run reconstruction model were run postseason before generating forecasts for the following year allowing posterior estimates to be used alone. Take a look by typing.

deshka[deshka$byr %in% 1979:2018, ]

This example will demonstrate an example workflow for age-5 Chinook salmon. The first step is to prepare the brood table for analysis using the function "prep_brood". In the output the column "lnRS" is relative the oldest age included in the table and years without complete broods relative to the oldest age in the table are removed.

dat5 <- prep_brood(deshka, age_range = 3:5)
tail(dat5)

We will work with the logs of abundance (suffix "_ln") because normal distributions that fit these data include negative run sizes.

plot_dist(deshka$age5)

Sibling models

A sibling model estimates the run size of an age class using the run sizes of younger siblings as predictors. For age-5 Chinook we can use age-3 and age-4 fish as predictors. Here is a look at the data, the relationship with age-4 fish is pretty obvious although the relationship with age-3 fish is not.

ggplot2::ggplot(dat5, ggplot2::aes(x = age4_ln, y = age5_ln, size = age3_ln)) + ggplot2::geom_point()

This model is significant with decent looking residuals. At this point a good practice is to look for correlation in the residuals, although in this case I found no improvement in model fit by including an ARIMA component.

sib5 <- lm(age5_ln ~ age4_ln + age3_ln, data = dat5)
summary(sib5)
par(mfrow = c(2,2)); plot(sib5); par(mfrow = c(1,1))

The function pred_lm produces predictions for each year with the data from every other year using a linear model. We can save those to the original dataframe with the name sib_pred where the suffix (_pred) is important becasue a later function will look for variables with that suffix.

temp <- pred_lm(sib5)
dat5$sib_pred <- exp(temp[1,] + temp[2,]^2/2)

Ricker models

The age 5 Ricker model fits the data and has decent residuals but also demonstrates some residual correlation.

plot(dat5$S, dat5$lnRS)
rick5 <- lm(lnRS ~ S, data = dat5)
summary(rick5)
par(mfrow = c(2,2)); plot(rick5); par(mfrow = c(1,1))
forecast::tsdisplay(residuals(rick5))

The best fitting time series model is a AR1 model. The function "pred_arima" produces predictions for each year using an arima model and only the data prior to the predicted year. Let’s run that and save those predictions as well

forecast::auto.arima(dat5$lnRS, xreg = dat5$S)
rick5_ar1 <- arima(dat5$lnRS, order=c(1,0,0), xreg = dat5$S, method="ML")
AIC(rick5, rick5_ar1)
dat5$ricker_pred <- exp(pred_arima(rick5_ar1, x = dat5$lnRS, xreg = dat5$S)[1,]) * dat5$S

Mean with residual correlation

Anton called these univariate models. There may be some residual correlation in the form of a AR1 process. We can use the pred_arima function to get these predictions and save them to our age 5 dataset.

forecast::tsdisplay(dat5$age5_ln)
forecast::auto.arima(dat5$age5_ln)
mu5_ar1 <- arima(dat5$age5_ln, order=c(1,0,0))
temp <- pred_arima(mu5_ar1, x = dat5$age5_ln)
dat5$mu_pred <- exp(temp[1,] + temp[2,]^2/2)

Moving average models

This function gives the moving average for a specified number of years. I'll do 3 and 5 year moving averages for illustration purposes. Don't worry about the predictions right now, we’ll see them later.

dat5$ma3_pred <- pred_ma(dat5$age5_ln, yrs = 3)
dat5$ma5_pred <- pred_ma(dat5$age5_ln, yrs = 5)

Exponential smoothing

Instead of trying to figure out a fixed number of years to include in our moving average we can use a method called exponential smoothing where you take a weighted average of the previous observation and the previous prediction. Since the previous prediction is also a weighted average (of the observation and prediction one year prior) you end up with a moving average where the weight given to recent data is optimized while fitting the model.

forecast::ets(dat5$age5_ln, "ANN")
dat5$es_pred <- pred_es(dat5$age5_ln)

Model Comparison

The function "comp_mods" provides a figure and two statistics to help you decide on the best model. Use md (mean deviation) to assess bias (closest to zero is best) and mad to assess accuracy (smaller is better). These statistics are calculated based on the last 5 years of predictions by default. In general, you want the smallest mad you can get unless the md is very similar to the mad. In this case sib_pred has the smallest mad and also the smallest md, so it is a pretty clear choice.

comp_models(dat5, 5)

2019 forecast

As of now we have to calculate the future years prediction directly. In this case we are using a linear model with no autocorrelation, the x variables are the natural log of the number of age 3 and age 4 fish in the brood year. We can retrieve these from the brood table.

tail(deshka, 10)
temp <- predict(sib5, data.frame(age4_ln = log(2146), age3_ln = log(874)), se.fit = TRUE)
exp(temp$fit + temp$se.fit^2/2)


adamreimer/preseason documentation built on Dec. 21, 2024, 10:15 p.m.