In merrillrudd/LIME: Length-based Integrated Mixed Effects (LIME) assessment method

knitr::opts_chunk$set(fig.align="center", external=TRUE, echo=TRUE, warning=FALSE, fig.pos="H")

Introduction

This package contains functions to run the Length-based Integrated Mixed Effects (LIME) fisheries stock assessment method. The LIME package can be used in two ways:

as an age-structured operating model to generate simulated length composition, catch data, and abundance indices based on single or multiple fleets and seasons, and
as an estimation model by fitting to length composition data (at a minimum) to estimate annual fishing mortality, annual recruitment, relative spawning biomass, spawning potential ratio (SPR), and other derived outputs of an age-structured model.

Length-based

LIME was developed for data-limited fisheries, where few data are available other than a representative sample of the size structure of the vulnerable portion of the population (i.e., length composition data from the catch) and an understanding of the life history of the species.

Integrated

While length composition from the catch can hold information on fishing mortality, gear selectivity, and recruitment, with process and observation error it can be very difficult to tease apart the patterns and biological mechanisms from the noise. With additional information, such as even a few years of catch trends or an abundance index, analysts can include additional information to the LIME model to help tease apart these processes. For example, catch data is much more informative of the fishing mortality rate over time than the length composition data. Catch used alongside length composition data can help decrease bias and uncertainty in estimates of fishing mortality rates.

LIME also includes all years of data in the same analysis with a penalty on the annual fishing mortality rate to keep it from increasing or decreasing unrealistically between years. However, LIME assumes the selectivity curve is constant over time for each fleet.

Mixed effects

LIME relaxes the equilibrium assumptions of other length-based methods by estimating annual fishing mortality and recruitment variation (among other parameters), deriving annual recruitment as a random effect.

See Rudd and Thorson (2017) in the reference list for full details of the model, including simulation testing to evaluate performance across life history types, population variability scenarios, and data availability scenarios, as well as violations of the model assumptions.

Bug Reports

Please alert me to any bugs or issues by using GitHub.

Comments and suggestions for additional features are welcome and we can discuss via email at mbrudd@uw.edu.

Installation

Installing the Package

devtools: You can install the development version of the package from GitHub using the devtools package:

install.packages("devtools", repos='http://cran.us.r-project.org')

Follow the steps to install TMB:
Windows: https://github.com/kaskr/adcomp/wiki/Windows-installation
Linux: https://github.com/kaskr/adcomp (See README)
Mac: should be the same as Linux, please see contact info under Bug Reports if that doesn't work.
Install required package TMBhelper:

devtools::install_github("kaskr/TMB_contrib_R/TMBhelper")

Install LIME:

devtools::install_github("merrillrudd/LIME", dependencies=TRUE)

The installation may produce error messages requiring other packages. You can download those from CRAN if they are not automatically downloaded in the LIME installation process, and then start from the step that produced the error. Again, please use the contact info in Bug Reports if this doesn't work, as you may have discovered a new installation error... congratulations!

Load packages

library(LIME)
library(tidyverse)

Step 1: Specify biological inputs and starting values

The LIME package simulation and estimation features require the biological inputs (e.g. growth, natural mortality, maturity) and starting values for selectivity parameters in a list. Using create_lh_list will make sure all required elements are included within the list and in the required format. The create_lh_list function requires inputs for some parameters and includes default values for others.

Minimum inputs for `create_lh_list`

The user is required to input the following parameters to create_lh_list for simulation and estimation:

Minimum inputs: Biology

linf: von Bertalanffy asymptotic length
vbk: von Bertalanffy growth coefficient
M: Annual natural mortality rate
lwa: Length-weight scaling parameter
lwb: Length-weight allometric parameter
M50 Length or age at 50% maturity
maturity_input: Whether M50 input is in "length" or "age"

Minimum inputs: Exploitation

S50: Length or age at 50% selectivity
S95: Length or age at 95% selectivity
selex_input: Whether S50 and S95 are in "length" or "age"

Other inputs for `create_lh_list` with default settings

Other inputs: Biology

M95: Length or age at 95% maturity; default=NULL for one-parameter logistic maturity. M95 will use two-parameter logistic maturity curve. It is assumed M95 should be input as length if M50 is length, same for age.
t0: Length at age=0; Default=-0.01, should be adjusted based on local studies or meta-analysis.
R0: Equilibrium recruitment; Default=1.0. Changing R0 will change the scale of the population size. It is recommended to set an initial value of R0 more appropriate to the expected population size if using catch data to estimate population size with LIME.
h: Steepness; Default=1.0 such that the expected recruitment calculated in the Beverton-Holt stock-recruit curve is not affected by the level of spawning biomass. Setting h below 1.0 will lead to the expected annual recruitment as a function of the spawning biomass. LIME will still calculate recruitment deviates around this expected value of annual recruitment.
AgeMax: Maximum age; Default=-log(0.01)/M, the age at which 1% of the population is still alive based on the specified natural mortality rate M. This age can be adjusted based on local evidence of maximum age.
rho: Recruitment autocorrelation, used in simulation only; Default=0 (no recruitment autocorrelation). Changing the value of rho will add autocorrelation to the generated recruitment time series for a simulation study.

Other inputs: Exploitation

S95: Length or age at 95% selectivity; Default=NULL for one-parameter logistic selectivity; specifying S95 will use two-parameter logistic selectivity curve. It is assumed S95 should be input as length if S50 is length, same for age.
selex_type: Selectivity function type; Default=logistic using a one-parameter logistic selectivity curve if S95 is not specified, or a two-parameter logistic curve if S95 is specified. Alternate option=dome to generate a population exploited based on a dome-shaped selectivity curve. Dome-shaped selectivity cannot currently be estimated in LIME, but specifying dome-shaped selectivity in the create_lh_list function can generate an assumed dome-shaped curve to be fixed when using LIME for estimation.
dome_sd: Dome-shaped selectivity right-hand standard deviation; Default=NULL for logistic selectivity, must be specified if selex_type="dome". The standard deviation of the normal distribution for the right side of the selectivity curve (after the logistic curve specified reaches 100% selectivity).
Fequil: The proportion of spawning biomass relative to unfished spawning biomass at bioeconomic equilibrium, default=0.5.
Frate: exponent determining the strength of coupling between effort and changes in biomass, default=0.2.
qcoef: Catchability coefficient; default=1e-5. Estimated when using an abundance index, starting value should be adjusted to a reasonable number based on the abundance index and starting value for the scaling parameter R0.
theta: Dirichlet-multinomial parameter; Default=1, a relative small value. With a larger theta (e.g. 10) there will be no variance inflation where the effective sample size will approach the input sample size (Thorson et al. 2017)
nseasons: Number of seasons in a year; Default=1 for annual length composition data and annual growth and mortality parameters. For short-lived species, it is helpful to use a shorter-than-annual time step to account for the growth of short-lived fish during one year, if length data is collected on time steps less than one year (e.g. month, nseasons=12).
nfleets: Number of fleets to be modeled in the fishery; default=1 for a single fleet. With nfleets greater than 1, the number of S50, S95, and selex_type values must match the value for nfleets. Example, S50=c(20,26), S95=c(25,35), selex_type=rep('logistic', nfleets)

Other inputs: Variation

CVlen: Coefficient of variation around the growth curve; Default=0.1; Should be adjusted based on the expected variability in the age-length curve.
SigmaR: Recruitment standard deviation; Default=0.737, the median across all fish species (Thorson et al. 2014); Used for generating recruitment deviates in the simulation or as a starting value for its estimation using LIME.
SigmaF: Fishing mortality standard deviation; Default=0.2; Used for generating fishing mortality deviates in the simulation or as the standard deviation of the fishing mortality penalty when estimating annual fishing mortality using LIME.
SigmaC: Catch standard deviation; Default=0.2; Used as the fixed standard deviation in the lognormal likelihood when fitting LIME to catch data.
SigmaI: Index standard deviation; Default=0.2; Used as the fixed standard deviation in the lognormal likelihood when fitting LIME to index data.

Now let's populate the create_lh_list function with example values.

lh <- create_lh_list(vbk=0.21, 
           linf=65, 
           t0=-0.01,
           lwa=0.0245, 
           lwb=2.79, 
           S50=c(20), 
           S95=c(26), 
           selex_input="length",
           selex_type=c("logistic"),
           M50=34,
           maturity_input="length",
           M=0.27, 
           binwidth=1,
           CVlen=0.1,
           SigmaR=0.737,
           SigmaF=0.2,
           SigmaC=0.1,
           SigmaI=0.1,
           R0=1,
           Frate=0.1,
           Fequil=0.25,
           qcoef=1e-5,
           start_ages=0,
           rho=0.43,
           theta=10,
           nseasons=1,
           nfleets=1)

Plot the biological and selectivity inputs

Now we should check out the biological parameters and selectivity we've created. Note that even though we input maturity and selectivity by length, create_lh_list converts to age and outputs both selectivity-at-age by fleet (lh$S_fa) and selectivity-at-length by fleet (lh$S_fl).

par(mfrow=c(2,2), mar=c(4,4,3,1))
plot(lh$L_a, type="l", lwd=4, xlab="Age", ylab="Length (cm)")
plot(lh$W_a, type="l", lwd=4, xlab="Age", ylab="Weight (g)")
plot(lh$Mat_l, type="l", lwd=4, xlab="Length (cm)", ylab="Proportion mature")

# plot selectivity for the first (and only) fleet (first row)
plot(lh$S_fl[1,], type="l", lwd=4, xlab="Length (cm)", ylab="Proportion selected to gear")

Check time step regarding predicted growth

Particularly for short-lived fish, it is possible that an annual time step is too coarse of a time scale to capture individual growth between years. For example, if a fish grows rapidly between ages 1 and 2, it is possible the probability of being a length given age will result in a negligible probability of being certain lengths. However, it is likely those lengths will be observed, and in this case LIME will not be able to fit the data well.

To check for this issue, we recommend plotting the probability of being in a length bin given age to make sure there is greater than negligible probability of observing all lengths. In this case, there is a small overlap between the distributions for age 1 and 2 fish, which should be enough to have an adequate model fit if fish are observed at approximately 17 cm.

For tips on what to do when there is a negligible probability of observing some lengths, see the "Format data" section in this document.

plba <- with(lh, age_length(highs, lows, L_a, CVlen))

ramp <- colorRamp(c("purple4", "darkorange"))
col_vec <- rgb( ramp(seq(0, 1, length = nrow(plba))), max = 255)
matplot(t(plba[-1,]), type="l", lty=1, lwd=3, col=col_vec, xaxs="i", yaxs="i", ylim=c(0, 0.5), xlab="Length bin (cm)", ylab="Density")
legend("topright", legend=lh$ages[seq(2,length(lh$ages),by=3)], col=col_vec[seq(2,length(lh$ages),by=3)],, lwd=3, title="Age")

Step 2: Data inputs to LIME

The first subsection on data formatting provides an overview of the simulation functions within the LIME package. To learn how to format data from your fishery, skip to the "Format data" section.

Simulating data

Using the life history list output from create_lh_list, we can simulate a population and generate data.

The generate_data function has several settings for which the user is required to input a value:

modpath: model path for saving simulated populations; can set as NULL to run within the R environment only without saving locally.
itervec: vector of iterations of simulated data; can be 1 to run only one iteration, or 1:100 to run 100 iterations of simulated populations.
lh: life history list, output from create_lh_list.
Fdynamics: Pattern for fishing mortality dynamics; options include Constant, Oneway, Endogenous, or None.
Rdynamics: Pattern for recruitment dynamics; options include Constant, or Pulsed.
Nyears: number of years for your simulated population.
Nyears_comp: number of years to generate length data; e.g. if 10 years and Nyears is 20 years, will be the last 10 years in the 20 year time series. Must correspond to the number of fleets, e.g. c(20,10) would mean 20 years of length data for fleet 1 and 10 years of length data for fleet 2.
comp_sample: nominal sample size of length data for each year; e.g. vector of length 20 (1 for each year), with 200 length measurements collected annually. If there is more than 1 fleet and only 1 number is specified, that number will apply to all fleets (e.g. 200 samples in each fleet).
init_depl: initial depletion, or the proportion of the unfished population biomass in the first year of the population to be modeled. Specifying a single value indicates the initial depletion (e.g. 0.5) or two values indicates the lower and upper bounds of a uniform distribution for which the population simulation will randomly draw a depletion value (e.g. c(0.1,0.9)).
seed: set a seed for random numbers for generating process deviations.

There are also several other settings not required but may be useful:

rewrite: default=TRUE to always re-simulated data. rewrite=FALSE may be useful to only simulate a new population if the True.rds output file already exists in the folder.
pool: default=TRUE, meaning that if the number of seasons per year as specified in the create_lh_list function is more than 1, the length composition data collected in each time set is pooled together annually. pool=FALSE would mean the generated length composition data is on a time step shorter than 1 year (e.g. monthly if nseasons=12). The total sample size for the year will still be equal to comp_sample.

Now let's use generate_data to simulate 1 example population (itervec=1) to generate length, catch, and an abundance index. We won't specify a modpath so the simulated population will just be used locally in this vignette.

true <- generate_data(modpath=NULL,
            itervec=1, 
            Fdynamics=c("Endogenous"),
            Rdynamics="Constant",
            lh=lh,
            Nyears=20,
            Nyears_comp=c(20),
            comp_sample=rep(200,20),
            init_depl=0.7,
            seed=123,
            fleet_proportions=1)

par(mfrow=c(3,2))
plot(true$F_ft[1,], type="l", lwd=4, xlab="Year", ylab="Fishing mortality", ylim=c(0,max(true$F_ft)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true$R_t, type="l", lwd=4, xlab="Year", ylab="Recruitment", ylim=c(0,max(true$R_t)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true$SPR_t, type="l", lwd=4, xlab="Year", ylab="SPR", ylim=c(0,1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true$D_t, type="l", lwd=4, xlab="Year", ylab="Relative spawning biomass", ylim=c(0,max(true$D_t)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true$Cw_ft[1,], type="l", lwd=4, xlab="Year", ylab="Catch", ylim=c(0,max(true$Cw_ft)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true$I_ft[1,], type="l", lwd=4, xlab="Year", ylab="Abundance index", ylim=c(0,max(true$I_ft)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)

The LIME package includes the plot_LCfits function which can first be used to plot the length frequency data, and later used to plot the model fits to the length frequency data. The plot_LCfits function requires the length frequency data in data frame form, which can be formatted correctly using the LFreq_df function.

LF_df <- LFreq_df(true$LF)

The length frequency data frame can then be used in the plot_LCfits function.

plot_LCfits(LF_df=LF_df)

Format data

LIME requires data input in a specific format:

Length composition data

Option 1: Matrix

With only one fleet, length composition data can be in a length frequency matrix, with years along the rows and length bins along the columns. Here are 5 years as an example:

LF_mat <- true$LF[,,1]
LF_mat[1:5,]

Option 2: Array

The length composition data can be an array of matrices (one for each fleet). Columns should be named with the upper end of the length bin and rows should be named with the year.

First five years:

LF_array <- true$LF
LF_array_sub <- array(LF_array[1:5,,1], dim=c(5, ncol(LF_array[,,1]), 1))
colnames(LF_array_sub) <- colnames(LF_mat)
rownames(LF_array_sub) <- rownames(LF_mat)[1:5]
LF_array_sub

Option 3: List

The length composition data could also be a list of matrices (one for each fleet).

One fleet, first five years:

LF_list <- list()
LF_list[[1]] <- LF_array_sub[,,1]
LF_list

Catch and abundance index

Time series data (catch and abundance index) should me in matrix form with fleets along the rows and years along the columns. Rows and columns do not need to be explicitly named with fleet number or year, but the number of years should match the total number of years modeled. If data are not available in any given year there should be a negative place-holder.

With one fleet, the catch or abundance index time series should still be in matrix form rather than a vector (the model will be looking for the first row, for the one and only fleet).

Catch, one fleet:

true$Cw_ft

Abundance index, one fleet:

true$I_ft

Data input

Observations

LIME requires inputing observed data in a list form, including:

years: inclusive years to be modeled
LF: length frequency data in matrix, array, or list form
neff_ft: effective sample size to use in case of multinomial distribution (otherwise use Dirichlet-multinomial to estimate effective sample size, or assume nominal sample size = effective sample size if not included)
I_ft: Index of abundance by fleet and time (if applicable)
C_ft: Catch by fleet and time (if applicable) in numbers or biomass (to be specified later)

data_all <- list("years"=1:true$Nyears, "LF"=true$LF, "I_ft"=true$I_ft, "C_ft"=true$Cw_ft, "neff_ft"=true$obs_per_year)

Life history and starting values

LIME includes a function create_inputs which checks the length frequency data and includes all input data, life history information, and starting values in a list together to input into LIME.

create_inputs requires:

lh: life history and starting values list output from create_lh_inputs
input_data: input data list described in Data input section above.

inputs_all <- create_inputs(lh=lh, input_data=data_all)

Step 3: Run LIME

Estimated parameters

LIME is set up to estimate certain parameters and fix others by default.

LIME estimates by default:

log_F_ft: matrix of fishing mortality estimates in log-space by fleet (rows) over time (columns)
log_sigma_R: recruitment standard deviation in log-space
log_S50_f: length at 50% selectivity in log space, one for each fleet
log_Sdelta_f: difference between length at 95% selectivity and length at 50% selectivity in log-space, so that length at 95% selectivity can never be less than length at 50% selectivity, one for each fleet
Nu_input: temporal variation in recruitment, treated as random effect

Parameters estimated by default under certain conditions:

log_theta: Dirichlet-multinomial parameter relating to effective sample size (only estimated if using Dirichlet-multinomial to fit to length composition data, where run_LIME argument LFdist=1)
beta: equilibrium recruitment in log-space, serves as population scaling parameter (only estimated if fitting to catch data, run_LIME argument data_avail must include "Catch")
log_q_f: catchability coefficient in log-space, one for each fleet (only estimated if fitting to an index, run_LIME argument data_avail must include "Index")

Other parameters that could be estimated but are fixed by default:

log_sigma_C: observation error on catch, fixed at log(0.2)
log_sigma_I: observation error on abundance index, fixed at log(0.2)
log_CV_L: coefficient of variation on predicted age-length curve, fixed at log(0.1)

`run_LIME` function

The function run_LIME used to run LIME models has many settings, but hopefully most models will keep the defaults and use only a few arguments.

Basic inputs to run_LIME:

modpath: model path to save results and flags; default=NULL to save in R environment locally
input: list output from create_inputs
data_avail: what data types will LIME fit to? "LC"= length composition only, "Index_LC"= index and length comps, "Catch_LC"= catch and length comps, and "Index_Catch_LC" = index, catch, and length comps.

If fitting to catch data, the user must specify:

C_type: default=0 (no catch data), 1 for catch in numbers, 2 for catch in biomass

Additional settings in run_LIME:

LFdist: which distribution to fit to length frequency data? default=1 to use Dirichlet-multinomial and estimate additional parameter related to effective sample size, multinomial = 0
est_more: additional parameters to estimate that are fixed by default (see Estimated parameters section)
fix_more: additional parameters to fix that are estimated by default (see Estimated parameters section)
est_F_ft: which F parameters to estimate? default=TRUE to estimate all. Otherwise, a matrix with fleets as rows and years along columns, with a 1 where the F parameter should be estimated and a 0 where the F parameter should be fixed at the starting value.
f_startval_ft: starting values for fishing mortality, default=NULL to start at 0.2 for all years and fleets. Otherwise, matrix of starting values for F where fleets are along the rows and years along the columns (e.g. can input the estimated F values from the previous run with Report$F_ft).
rdev_startval_t: starting values for recruitment deviations over time. default=NULL to start at 0 for all years. Otherwise, specify vector of recruitment deviations for all years to be modeled (e.g. can start at simulated truth and turn off estimation for debugging)
est_selex_f: estimate selectivity parameters? default=TRUE to estimate selectivity parameters for all fleets. Turn off selectivity estimation for all or a single fleet with FALSE (e.g. estimate selectivity parameters for fleet 1 but not fleet 2 with c(TRUE, FALSE))
vals_selex_ft: input selectivity-at-length (columns) by fleet (rows). default= matrix of negative values to estimate selectivity. Otherwise, input selectivity and set est_selex_f=FALSE. This is how one would go about fixing the selectivity curve to a dome shape.
randomR: default=TRUE treats recruitment deviations as a random effect. FALSE turns the random effect off.
newtonsteps: number of extra Newton steps to take after optimization to help convergence, default=3; FALSE to turn off. Note: it seems with TMB that bounding parameters does not work when using newtonsteps. If a parameter is estimated outside of a reasonable bound, I recommend setting newtonsteps=FALSE.
F_up: upper bound on fishing mortality estimate
S50_up: upper bound on length at 50% selectivity
Fpen: penalty on fishing mortality to avoid fishing mortality being estimated at drastically different values between years; default=1 on, otherwise 0=off.
SigRpen: penalty on sigmaR, default=1 on, otherwise off=0.
SigRprior: vector with prior info for normally distributed sigmaR penalty, first term is the mean and second term is the standard deviation; default =c(0.737, 0.3)
derive_quants: if TRUE, derive MSYbased reference points, default=FALSE
est_totalF: TRUE will estimate total fishing mortality across all fleets (number of parameters = number of years), not by fleet (where number of parameters would = number of years * number of fleets). default=FALSE estimates F by year and fleet.
prop_f: proportion of catch from each fleet, must sum to 1.0. When est_totalF=TRUE, must assume how much of the catch comes from each fleet (e.g. c(0.5,0.5), c(0.6,0.4), etc.)
mirror: vector of parameter names matching those in Estimated parameters section to mirror between fleets (estimating one parameter instead of the same number of fleets)
rewrite: default=TRUE to rewrite LIME output (LIME_output.rds) in the directory modpath. FALSE is useful for simulation testing if the process was stopped midway through 100 iterations, so a function can skip iterations with output.

Settings in run_LIME associated with simulation-testing:

simulation: default=FALSE, but if TRUE, sets working directory in the iteration directory, not the general modpath
itervec: vector of iterations for simulation, e.g. 1:100. default=NULL for stock assessment application with real data (1 iteration)

Example: Data-rich model run

Here's the best-case scenario for running LIME: 20 years of length composition, catch, and an abundance index, with a typical 200 samples of length measurements per year, assuming the life history information is known without error.

Here we specify C_type=2 because the catch data specified in data_all in the Data input section was catch in weight, not numbers.

start <- Sys.time()

rich <- run_LIME(modpath=NULL, 
                  input=inputs_all,
                  data_avail="Index_Catch_LC",
                  C_type=2)

end <- Sys.time() - start
end

The time difference above shows the run time of the data-rich case on the machine that compiled this document. The run time will vary depending on the machine, but hopefully this gives a ballpark of how long it takes to run a model with approximately 20 years and three data types.

The object rich contains several elements of LIME output:

Inputs: List of Data, Parameters, Map, and Random elements that are input directly to the LIME.cpp program. The goal is to help the user check the exact values and structure that went in to the LIME optimization using TMB.
Report: Report file of maximum likelihood estimates and derived values of population parameters from the LIME model run.
Sdreport: Standard errors of all estimated parameters and derived values of population parameters which are specified in the LIME.cpp file.
obj: the TMB object
opt: the optimization results from TMBhelper::Optimize
df: a data frame of final gradients for each estimated parameter
input: input data list created using create_inputs function. This is used when re-running the model using the function get_converged.
data_avail: the data types available. This is used when re-running the model using the function get_converged.

Check convergence

General convergence criteria for LIME models includes:

Final gradient of all estimated parameters is within +/- 0.001
Hessian matrix is positive definite

We can check this for any model run using the following code:

gradient <- rich$opt$max_gradient <= 0.001
hessian <- rich$Sdreport$pdHess
hessian==TRUE & gradient == TRUE

Because both the hessian and gradient checks passed, we assume that the program found the global minimum of the negative log likelihood surface and can move on to examine results.

Plot results

The LIME package includes functions to plot length composition model fits and other results.

LIME model fits to length frequency data:

plot_LCfits(Inputs=rich$Inputs,
            Report=rich$Report)

LIME estimates of key population parameters:

plot_output(Inputs=rich$Inputs,
            Report=rich$Report,
            Sdreport=rich$Sdreport,
            lh=lh,
            True=true,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("Fish"=c(0,1),"SPR"=c(0,1)))

Example: Length data only

This is the scenario for which LIME was developed: integrating multiple years of length data to account for time-varying fishing mortality and recruitment. The big change from the data-rich case in the code is to make sure data_avail="LC" and C_type is set to zero or removed from the function call (because the default is zero). We can still use the inputs_all list that includes the index and catch data, but as long as data_avail="LC" the model will not include the other data types.

start <- Sys.time()

lc_only <- run_LIME(modpath=NULL, 
                  input=inputs_all,
                  data_avail="LC")

end <- Sys.time() - start
end

Note the time difference is shorter than the data-rich case.

Check convergence

gradient <- lc_only$opt$max_gradient <= 0.001
hessian <- lc_only$Sdreport$pdHess
hessian==TRUE & gradient == TRUE

Plot results

We can use the same LF_df as was calculated in the data-rich case, but compare the LIME model fits from the model that excluded catch and abundance index:

plot_LCfits(LF_df=LF_df,
            Inputs=lc_only$Inputs,
            Report=lc_only$Report)

Model fits with length data only are of course much less accurate and with higher uncertainty than the data-rich case:

plot_output(Inputs=lc_only$Inputs,
            Report=lc_only$Report,
            Sdreport=lc_only$Sdreport,
            lh=lh,
            True=true,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("Fish"=c(0,0.5),"SPR"=c(0,1)))

The model meets convergence criteria but the fishing mortality rate estimates jump around between years more than is realistic. To attempt to curb this behavior, it is possible to decrease the standard deviation for fishing mortality which will serve to penalize estimates of fishing mortality that stray too far from the previous year.

inputs_all$SigmaF <- 0.1
lc_only2 <- run_LIME(modpath=NULL, 
                  input=inputs_all,
                  data_avail="LC")

gradient <- lc_only2$opt$max_gradient <= 0.001
hessian <- lc_only2$Sdreport$pdHess
hessian==TRUE & gradient == TRUE

plot_output(Inputs=lc_only2$Inputs,
            Report=lc_only2$Report,
            Sdreport=lc_only2$Sdreport,
            lh=lh,
            True=true,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("Fish"=c(0,0.5),"SPR"=c(0,1)))

Example: Length and catch data

This is the scenario most similar to the ideas behind catch-curve stock reduction analysis (CCSRA). CCSRA used catch and age composition data to estimate stock status. LIME can use catch and length composition, which is much easier to collect than ages. With accurate assumptions about life history information, the combination of catch and length data can be very informative of stock status.

start <- Sys.time()

catch_lc <- run_LIME(modpath=NULL, 
                  input=inputs_all,
                  data_avail="Catch_LC",
                  C_type=2)

end <- Sys.time() - start
end

Check convergence

gradient <- catch_lc$opt$max_gradient <= 0.001
hessian <- catch_lc$Sdreport$pdHess
hessian==TRUE & gradient == TRUE

Plot results

Model fits including catch data with length composition data:

plot_LCfits(LF_df=LF_df,
            Inputs=catch_lc$Inputs,
            Report=catch_lc$Report)

Including catch data improves estimates of population parameters over length data only.

plot_output(Inputs=catch_lc$Inputs,
            Report=catch_lc$Report,
            Sdreport=catch_lc$Sdreport,
            lh=lh,
            True=true,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("Fish"=c(0,0.5),"SPR"=c(0,1)))

Turning off parameter estimation

There may be non-convergence issues (apparent from either a high final gradient or non-positive definite Hessian matrix). Some first checks in these cases are to 1) see if the Dirichlet-multinomial theta parameter is estimated too high, in which case it can be fixed to a high value and not estimated:

inputs_all$theta <- 50
check1 <- run_LIME(modpath=NULL,
                  input=inputs_all,
                  data_avail="LC",
                  fix_more="log_theta")

Another option is to check if the recruitment standard deviation is estimated too high or too low, in which case it can be fixed at a reasonable value:

inputs_all$SigmaR <- 0.3
check2 <- run_LIME(modpath=NULL,
                  input=inputs_all,
                  data_avail="LC",
                  fix_more="log_sigma_R")

Some length datasets may not be informative enough to tease apart fishing mortality, recruitment deviations, and selectivity. In these cases it may be convenient or helpful to fix selectivity at the starting values:

check3 <- run_LIME(modpath=NULL, 
                  input=inputs_all,
                  data_avail="LC",
                  est_selex_f=FALSE)

plot_output(Inputs=check3$Inputs,
            Report=check3$Report,
            Sdreport=check3$Sdreport,
            lh=lh,
            True=true,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("Fish"=c(0,0.5),"SPR"=c(0,1)))

It is also possible to fix selectivity in a shape that is not the 2-parameter logistic curve. For example, we can set up dome-shaped selectivity using the 2-parameter logistic function on the left side of the fully selected lengths, and a normal distribution on the right side of the fully selected lengths:

## create input list with dome-shaped selectivity
lh_dome <- create_lh_list(vbk=0.21, 
           linf=65, 
           t0=-0.01,
           lwa=0.0245, 
           lwb=2.79, 
           S50=c(20), 
           S95=c(26), 
           selex_input="length",
           selex_type=c("dome"),
           dome_sd=15,
           M50=34,
           maturity_input="length",
           M=0.27, 
           binwidth=1,
           CVlen=0.1,
           SigmaR=0.737,
           SigmaF=0.2,
           SigmaC=0.1,
           SigmaI=0.1,
           R0=1,
           Frate=0.1,
           Fequil=0.25,
           qcoef=1e-5,
           start_ages=0,
           rho=0.43,
           theta=10,
           nseasons=1,
           nfleets=1)

## generate data with dome-shaped selectivity
true_dome <- generate_data(modpath=NULL,
            itervec=1, 
            Fdynamics=c("Endogenous"),
            Rdynamics="Constant",
            lh=lh_dome,
            Nyears=20,
            Nyears_comp=20,
            comp_sample=rep(200,20),
            init_depl=0.7,
            seed=123)

## create data list
data_all <- list("years"=1:true_dome$Nyears, "LF"=true_dome$LF, "I_ft"=true_dome$I_ft, "C_ft"=true_dome$Cw_ft, "neff_ft"=true_dome$obs_per_year)

inputs_dome <- create_inputs(lh=lh_dome, input_data=data_all)

plot(lh_dome$S_fl[1,], type="l", lwd=4, xlab="Length (cm)", ylab="Proportion selected to gear")

run_dome <- run_LIME(modpath=NULL, 
                  input=inputs_dome,
                  data_avail="LC",
                  vals_selex_ft=inputs_dome$S_fl)

plot_output(Inputs=run_dome$Inputs,
            Report=run_dome$Report,
            Sdreport=run_dome$Sdreport,
            lh=lh,
            True=true,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("Fish"=c(0,0.5),"SPR"=c(0,1)))

Multiple fleets

With multiple fleets, all biological inputs are the same as with a single fleet. However, we must specify selectivity for each fleet. In this case, both selectivities are logistic.

lh_mf1 <- with(lh, create_lh_list(vbk=vbk, 
           linf=linf, 
           t0=t0,
           lwa=lwa, 
           lwb=lwb, 
           S50=c(20,30), 
           S95=c(26,36), 
           selex_input="length",
           selex_type=c("logistic","logistic"),
           M50=ML50,
           M95=NULL,
           maturity_input="length",
           M=M, 
           h=h,
           binwidth=binwidth,
           CVlen=CVlen,
           SigmaR=SigmaR,
           SigmaF=SigmaF,
           SigmaC=SigmaC,
           SigmaI=SigmaI,
           R0=R0,
           Frate=Frate,
           qcoef=qcoef,
           start_ages=0,
           rho=rho,
           nseasons=1,
           nfleets=2))

Using the simulation function for multiple fleets, we need to specify the fleet dynamics for each (Fdynamics), the number of years of length composition data for each (Nyears_comp), and the proportions of each fleet (fleet_proportions).

true_mf1 <- generate_data(modpath=NULL,
            itervec=1, 
            Fdynamics=c("Constant","Endogenous"),
            Rdynamics="Constant",
            lh=lh_mf1,
            Nyears=20,
            Nyears_comp=c(20,10),
            comp_sample=rep(200, 20),
            init_depl=0.7,
            seed=123,
            fleet_proportions=c(0.6,0.4))

library(RColorBrewer)
cols <- brewer.pal(3, "Set1")
plot(lh_mf1$S_fl[1,], type="l", lwd=2, xlab="Length", ylab="Selectivity", col=cols[1])
lines(lh_mf1$S_fl[2,], type="l", lwd=2, col=cols[2])

par(mfrow=c(3,2))
plot(true_mf1$F_y, type="l", lwd=4, xlab="Year", ylab="Fishing mortality", ylim=c(0,max(true_mf1$F_y)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
for(i in 1:lh_mf1$nfleets){
  lines(true_mf1$F_ft[i,], col=cols[i])
}
plot(true_mf1$R_t, type="l", lwd=4, xlab="Year", ylab="Recruitment", ylim=c(0,max(true_mf1$R_t)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true_mf1$SPR_t, type="l", lwd=4, xlab="Year", ylab="SPR", ylim=c(0,1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(true_mf1$D_t, type="l", lwd=4, xlab="Year", ylab="Relative spawning biomass", ylim=c(0,max(c(1,max(true_mf1$D_t)))), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
plot(colSums(true_mf1$Cw_ft), type="l", lwd=4, xlab="Year", ylab="Catch", ylim=c(0,max(colSums(true_mf1$Cw_ft))*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5)
for(i in 1:lh_mf1$nfleets){
  lines(true_mf1$Cw_ft[i,], col=cols[i])
}
plot(true_mf1$I_ft[1,], type="l", xlab="Year", ylab="Abundance index", ylim=c(0,max(true_mf1$I_ft)*1.1), xaxs="i", yaxs="i", cex.axis=1.5, cex.lab=1.5, col=cols[1])
lines(true_mf1$I_ft[2,], col=cols[2])

Move the generated data from an array to a data frame to plot the data.

LF_list <- lapply(1:true_mf1$nfleets, function(x){
  return(true_mf1$LF[,,x])
})
LF_df <- LFreq_df(LF_list)

plot_LCfits(LF_df=LF_df)

data_mf1 <- list("years"=1:true_mf1$Nyears, "LF"=true_mf1$LF, "I_ft"=true_mf1$I_ft, "C_ft"=true_mf1$Cw_ft, "neff_ft"=true_mf1$obs_per_year)
inputs_mf1 <- create_inputs(lh=lh_mf1, input_data=data_mf1)

For now, use the multinomial distribution for multiple fleets, specifying LFdist=0.

rich_mf1 <- run_LIME(modpath=NULL, 
                  input=inputs_mf1,
                  data_avail="Index_Catch_LC",
                  C_type=2,
                  LFdist=0)

Check that the data-rich model converges:

gradient <- rich_mf1$opt$max_gradient <= 0.001
hessian <- rich_mf1$Sdreport$pdHess
hessian==TRUE & gradient == TRUE

plot_LCfits(Inputs=rich_mf1$Inputs,
            Report=rich_mf1$Report)

plot_output(Inputs=rich_mf1$Inputs,
            Report=rich_mf1$Report,
            Sdreport=rich_mf1$Sdreport,
            lh=lh_mf1,
            True=true_mf1,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("SPR"=c(0,1)))

With length data only, we need to estimate the total fishing mortality (as opposed to annual fishing mortality for each fleet). It's also useful to set the fishing mortality penalty to something relatively small to help the estimation.

inputs_mf1$SigmaF <- 0.1
lc_mf1 <- run_LIME(modpath=NULL, 
                  input=inputs_mf1,
                  data_avail="LC",
                  LFdist=0,
                  est_totalF=TRUE)

gradient <- lc_mf1$opt$max_gradient <= 0.001
hessian <- lc_mf1$Sdreport$pdHess
hessian==TRUE & gradient == TRUE

plot_LCfits(Inputs=lc_mf1$Inputs,
            Report=lc_mf1$Report)

plot_output(Inputs=lc_mf1$Inputs,
            Report=lc_mf1$Report,
            Sdreport=lc_mf1$Sdreport,
            lh=lh_mf1,
            True=true_mf1,
            plot=c("Fish","Rec","SPR","ML","SB","Selex"),
            set_ylim=list("SPR"=c(0,1)))

merrillrudd/LIME documentation built on June 20, 2020, 2:58 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

merrillrudd/LIME
Length-based Integrated Mixed Effects (LIME) assessment method

In merrillrudd/LIME: Length-based Integrated Mixed Effects (LIME) assessment method

Introduction

Length-based

Integrated

Mixed effects

Bug Reports

Installation

Installing the Package

Load packages

Step 1: Specify biological inputs and starting values

Minimum inputs for `create_lh_list`

Other inputs for `create_lh_list` with default settings

Plot the biological and selectivity inputs

Check time step regarding predicted growth

Step 2: Data inputs to LIME

Simulating data

Format data

Length composition data

Catch and abundance index

Data input

Step 3: Run LIME

Estimated parameters

`run_LIME` function

Example: Data-rich model run

Check convergence

Plot results

Example: Length data only

Check convergence

Plot results

Example: Length and catch data

Check convergence

Plot results

Turning off parameter estimation

Multiple fleets

R Package Documentation

Browse R Packages

We want your feedback!

merrillrudd/LIME Length-based Integrated Mixed Effects (LIME) assessment method

In merrillrudd/LIME: Length-based Integrated Mixed Effects (LIME) assessment method

Introduction

Length-based

Integrated

Mixed effects

Bug Reports

Installation

Installing the Package

Load packages

Step 1: Specify biological inputs and starting values

Minimum inputs for create_lh_list

Other inputs for create_lh_list with default settings

Plot the biological and selectivity inputs

Check time step regarding predicted growth

Step 2: Data inputs to LIME

Simulating data

Format data

Length composition data

Catch and abundance index

Data input

Step 3: Run LIME

Estimated parameters

run_LIME function

Example: Data-rich model run

Check convergence

Plot results

Example: Length data only

Check convergence

Plot results

Example: Length and catch data

Check convergence

Plot results

Turning off parameter estimation

Multiple fleets

R Package Documentation

Browse R Packages

We want your feedback!

merrillrudd/LIME
Length-based Integrated Mixed Effects (LIME) assessment method

Minimum inputs for `create_lh_list`

Other inputs for `create_lh_list` with default settings

`run_LIME` function