An introduction to ss3sim

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")


ss3sim is an R package to make it relatively quick and easy to run simulations with Stock Synthesis (SS). The package consists of

Much of this vignette is focused on how to effectively use run_ss3sim. If you use ss3sim in a publication, please cite the package as indicated by citation("ss3sim").


ss3sim can be run on OS X, Windows, or Linux. The CRAN version of ss3sim mirrors the master branch of the github repository that houses the code base. The development version contains the newest features that are not available in the CRAN version and can be installed from using devtools.

# CRAN version
# Github version, which depends on "devtools"
# install.packages("devtools")
  ref = "development", dependencies = TRUE, build_vignettes = TRUE)
# Load the package

Stock Synthesis (SS)

ss3sim was originally based on SS 3.24O. In 2019, ss3sim was updated to use SS 3.30, and future efforts will be dedicated to continuing to keep ss3sim up to date with SS. The most recent version of SS that was tested for use with ss3sim is available on github (e.g., see here for the Windows version) or can be obtained from the SS repository on Vlab. ss3sim uses the safe version of SS, which is slightly slower than the optimized version because it checks that vector, matrix, and array bounds are not exceeded. ss3sim requires that ss.exe be in your path. This means that your operating system knows where the binary file is located. See the vignette section on paths for details.

ss3sim also relies on the R package r4ss to read and write SS files. r4ss is available on CRAN but we exclusively rely on the github version. Feel free to install r4ss directly via devtools::install_github("r4ss/r4ss"), though installing ss3sim will do this for you if you don't already have r4ss available in your R library.

Help files

A brief, formal summary of the original package is available from @anderson2014a. More up-to-date help can be found using the help files available within the package, e.g., ?functionname, or by reading the vignettes, e.g., browseVignettes().

help(package = "ss3sim")

Please direct any questions, suggestions, or bug reports to the ss3sim issue tracker.

Simulation overview

Setting up the file structure

The ss3sim package is set up assuming there is an established base-case operating model (OM) and estimation model (EM) to work with. ss3sim comes with a generic, built-in model setup that can be used as is, modified, or replaced. (For more details, see the vignettes on modifying these model setups and creating your own OM and EMs). Each OM and EM should be in its own folder.

The OM folder should have the following files:


The EM folder should have the following files:


The names of the .ctl and .dat files are not important. The wrapper functions within ss3sim will rename the files after they are copied to appropriate folders. These files should be derived from .ss_new files but with file extensions as shown above. It is important to use .ss_new files so the files have consistent formatting. Many of the functions in this package and r4ss depend on that formatting.

To obtain .ss_new files, open a command window in the OM folder and type ss to run the model. Once the model is finished running, it will generate a number of files, including .ss_new files. Remove the .ss_new file extension from the files needed and add the appropriate file extension. Then, move the generated data file to the EM folder and run the EM. This step is not necessary if you choose to use the provided OM and EM because it was already done for you.

Cases and scenarios {#case-parsing}

The wrapper function run_ss3sim uses the scenarios argument to determine how to run the simulation. Each 'scenario' is a combination of letters, numbers, and dashes, where the letter x number combinations that combine to create unique scenarios are referred to as cases. The types of cases are data quality (D), estimation (E), fishing mortality (F), retrospective (R), and any other letter describing a time varying case (e.g., M for natural mortality, S for selectivity, or G for growth). The different versions of each case are identified with numbers. Combinations of cases are followed by an alphanumeric stock or species identifier (e.g., cod) to complete the scenario. For example, the base-case scenario for a cod stock might be D0-E0-F0-M0-cod. The order of the cases does not matter, as long as they are used consistently.

ss3sim relies on a set of plain text files to control the simulation. These plain text files are read by get_caseval and turned into argument lists that are passed to run_ss3sim. The function create_argfiles creates template input files. It reads the various functions and parses the arguments and default values into plain text files. The default settings create these files:

  1. E0-spp.txt (for estimation method cases)

  2. F0-spp.txt (for fishing mortality cases)

  3. index0-spp.txt (controlled by the D (data) case)

  4. agecomp0-spp.txt (controlled by the D (data) case)

  5. lcomp0-spp.txt (controlled by the D (data) case)

  6. X0-spp.txt (for a time-varying parameter X, where X could be any time-varying case letter)

  7. R0-spp.txt (for retrospective cases)

Illustration of input and output folder and file structure for an ss3sim simulation. Folders are shown in blue, input files in orange, and output files in grey. All input and output files are in plain text format. Case files (orange files at bottom left) combine cases (e.g., M0 for a given natural mortality trajectory) with groups of OMs and EMs (e.g., cod like). Alternatively, a user can skip setting up case files and specify the simulation cases directly in R code (see the vignette section on using ss3sim_base()).

After running create_argfiles(), look in your working directory for the template input files. (You can use getwd() to see your current working directory or setwd() to set a different working directory.) Change the case ID number, which defaults to zero, and the species identifier to a three letter identifier. For example, you might use cod, sar, or fla for cod, sardine, or flatfish. An example filename would therefore be M1-sar.txt or lcomp2-fla.txt. The case D1 corresponds to the files index1-spp.txt, agecomp1-spp.txt, and lcomp0-spp.txt. The other case types have single argument files.

The first column in the text file denotes the argument to be passed to a function. The second argument denotes the value to be passed. See the help for a change function to see the arguments that need to be declared. For example, see ?change_f. You can use any R syntax to declare argument values. For example, c(1, 2, 4), or seq(1, 100). Character objects do not need to be quoted as long as they are one word, but can be if you would like. However, be careful not to use the delimiter (set up as a semicolon) anywhere else in the file besides to denote columns. You can add comments after any # symbol.

Putting that all together, below is what an example F1-cod.txt file might look like:

 years; 1913:2012
 years_alter; 1913:2012
 fvals; c(rep(0, 25), rep(0.114, 75))

Case file names

Function names and associated case files. Case file names are given as the prefix. For example a case file for sample_index might be index0-cod.txt.

ss3sim function Case file name Description --------------- -------------- ----------- change_f() F Fishing mortality (mandatory) sample_index() index Index data (mandatory) sample_lcomp() lcomp Length composition data (mandatory) sample_agecomp() agecomp Age composition data (mandatory) change_r() retro Retrospective years change_e() E Estimation change_tv() anything else, see below Time-varying variables change_data() data Available OM data, bins, etc. sample_calcomp() calcomp Catch at length data sample_wtatage() wtatage Weight at age data sample_mlacomp() mlacomp Mean length at age data

change_tv() can take any other case file name, as long as the first line is function_type; change_tv.

Bias adjustment

Bias adjustment helps assure that the estimated log-normally distributed recruitment deviations are mean-unbiased leading to mean-unbiased estimates of biomass [@methot2011]. The high-level wrapper function run_ss3sim allows users to specify whether or not they would like to use the bias adjustment routine built into the package by setting the argument bias_adjust to TRUE or FALSE.

Output file structure

Internally, the function copy_ss3models creates a folder structure and copies the OM and EM. The folder structure looks like:


The integer values after the scenario ID represent different iterations; the total number of iterations run are specified by the user. If you are using bias adjustment then there will be some additional folders.

Note that the OM and EM folders will be renamed om and em within each iteration, the OM and EM are checked to make sure they contain the minimal files (as listed above), the filenames will be all lowercase, the data file is renamed ss3.dat, the control files are renamed om.ctl or em.ctl, and the starter and control files are adjusted to reflect these new file names. The functions in this package assume you have set your working directory in R to be the base folder where you will store the scenario folders.

An example simulation with ss3sim {#example}

As an example, we will run a 2x2 simulation design in which we test (1) the effect of high and low precision on the index of abundance provided by the survey and (2) the effect of fixing versus estimating natural mortality (M). All of the required files for this example are contained within the ss3sim package. To start, we will locate three sets of folders: the folder with the plain-text case files, the folder with the OM, and the folder with the EM.

d <- system.file("extdata", package = "ss3sim")
case_folder <- file.path(d, "eg-cases")
om <- file.path(d, "models", "cod-om")
em <- file.path(d, "models", "cod-em")

See the folder ss3sim/inst/extdata/eg-cases inside the ss3sim source code for all the case files that are used in this example simulation. You can either download the source code or find the folder defined in the block of R code above this paragraph.

Creating the case files

We will base the simulation around the cod-like species in the papers @ono2014 and @johnson2014, both of which used the ss3sim package. You can refer to these papers for details on how the models were set up. If we were starting from scratch, we would run the function create_argfiles(), which would create a default set of configuration files in our R working directory. Instead we will start with the configuration files from those papers.

To investigate the effect of different levels of precision on the survey index of abundance, we will manipulate the argument sds_obs that gets passed to sample_index. This argument ultimately refers to the standard error on the log(index) value, as defined in SS. In case 0, we will specify the standard deviation at 0.1 and in case 1 we will increase the standard deviation to 0.4. We can do this by including the line sds_obs; 0.1 in the file D0-cod.txt and the line sds_obs; 0.4 in the file D1-cod.txt. The file index0-cod.txt will therefore look like

fleets;  2
years;   list(seq(76, 100, by = 2))
sds_obs; list(0.1)

fleets refers to the fleet number we want to sample from (as defined in the model), 2 in codOM.dat refers to the survey fleet. We then run our survey from the 38th year to the last year of the simulation with a standard error on the log(survey index) of 0.1.

We will not describe the length- or age-composition sampling here in detail, but you can refer to the help files ?sample_lcomp and ?sample_agecomp along with the case files included in the package data. Also see vignette section where we describe the theory behind the age- and length-composition sampling in ss3sim.

To investigate the effect of fixing versus estimating M, we will manipulate the argument natM_val that gets passed to change_e. The first entry of natM_val is the value which M is fixed or initialized at and the second refers the phase. In case 0, we will set the phase in which M is estimated to -1 (any negative phase number will tell SS to not estimate a particular parameter) and use the argument NA to fix M at the true value (i.e., do not modify it from what is in the EM .ctl file). In case 1, we will estimate M in phase 3 and initialize the estimation of M at 0.20. We can do this by including the line: natM_val; c(NA, -1) in the file E0-cod.txt and the line: natM_val; c(0.20, 3) in the file E1-cod.txt. This means that case E1 will initialize M at 0.20 and estimate M in the third phase. For example, the complete case file E0-cod.txt is:

natM_type;          1Parm
natM_n_breakpoints; NULL
natM_lorenzen;      NULL
natM_val;           c(NA,-1)
par_name;           LnQ_base_3_CPUE
par_int;            NA
par_phase;          -1
forecast_num;       0

We will set the fishing mortality $F$ to a constant level at $F_\mathrm{MSY}$ ($F$ at maximum sustainable yield) from when the fishery starts (in year 25) until the end of our simulation (year 100). Therefore, this is our F0-cod.txt:

years;       1:100
years_alter; 1:100
fvals;       c(rep(0,25), rep(0.114,75))

Although we do not have any time-varying processes in the OM of these examples, we include M case files in which we describe M as stationary (M0-cod.txt):

function_type; change_tv
param;         NatM_p_1_Fem_GP_1
dev;           rep(0, 100)

Simply by changing the deviations from a vector of zeros to any vector pattern of deviations, we could add time-varying M. In a [later vignette section]((#time-varying) we describe how time-varying parameters are unique in ss3sim and how you can make any parameter of your OM time varying using this approach.

Checking the case files

It is a good idea to check that the case files are manipulating the SS model files as intended. For example, one should run a single iteration of your simulation through run_ss3sim and carefully inspect the OM and EM model files that it creates.

run_ss3sim(iterations = 1, scenarios =
  case_files = list(F = "F", D = c("index", "lcomp", "agecomp"),
    E = "E", M = "M"),
  case_folder = case_folder, om_dir = om,
  em_dir = em)

And then check the SS files that are created.

Self testing: deterministic simulations to check the models for bias {#deterministic}

Self testing is a crucial step of any simulation study. One way to accomplish this is to set up an EM that is similar to the OM and include minimal error.

We will run some simulations to check our EM for bias when process and observation error are minimized. To do this, we will start by setting up a 100 row (number of years) by 20 column (number of iterations) matrix of recruitment deviations, where all values are set to zero.

recdevs_det <- matrix(0, nrow = 100, ncol = 20)

Then we will set up case “estimation” files in which the initialized values of the recruitment deviation standard deviations, SR_sigmaR, are set to the nominal level of 0.001. We will name these files E100-cod.txt and E101-cod.txt. In the case files, the key element is setting par_name = SR_sigmaR (the SS parameter name) and par_int = 0.001 (the initial value). The arguments par_name and par_int are set up to handle vectors of parameter names and values. In this example, changing SR_sigmaR will be the second parameter in the vector. Again, you can look at all the case files in the package folder /inst/extdata/eg-cases/. As an example, here is the file E101-cod

natM_type;          1Parm
natM_n_breakpoints; NULL
natM_lorenzen;      NULL
natM_val;           c(0.20,3)
par_name;           LnQ_base_3_CPUE,SR_sigmaR
par_int;            c(NA,0.001)
par_phase;          c(-1,-1)
forecast_num;       0

To minimize observation error on the survey index, we will create a data case 100 that has a standard deviation on the survey observation error of 0.001. Therefore, our case file index100-cod.txt will look like this:

fleets;  2
years;   list(seq(60, 100, by = 2))
sds_obs; list(0.001)

When we run the simulations, we will pass our deterministic recruitment deviations to the function run_ss3sim. Running 20 iterations should be enough to identify whether our models are performing as we expect when bias_adjust=TRUE.

run_ss3sim(iterations = 1:20,
  scenarios = c("D100-E100-F0-M0-cod", "D100-E101-F0-M0-cod"),
  case_files = list(F = "F", D = c("index", "lcomp", "agecomp"),
    E = "E", M = "M"),
  case_folder = case_folder, om_dir = om, em_dir = em,
  bias_adjust = TRUE, user_recdevs = recdevs_det)

We have written out the scenario names in full for clarity, but ss3sim also contains a convenience function expand_scenarios. With this function we could instead write:

x <- expand_scenarios(list(D = 100, E = 100:101, F = 0, M = 0),
  species = "cod")
run_ss3sim(iterations = 1:20, scenarios = x,
  case_folder = case_folder, om_dir = om, em_dir = em,
  bias_adjust = TRUE, user_recdevs = recdevs_det,
  case_files = list(F = "F", D = c("index", "lcomp", "agecomp"),
    E = "E", M = "M"))

Note that due to the way that SS is being used as an OM, you may see the following ADMB error may appear in the console:

Error -- base = 0 in function prevariable& pow(const prevariable& v1,

However, this is not a problem because ADMB is not used to optimize the OM, and thus, the error can safely be ignored.

Running the stochastic simulations

The package contains a set of normally-distributed recruitment deviations, with a mean of -0.5 and a standard deviation of 1 (bias-corrected standard-normal deviations). To use the pre-specified deviations remove the argument user_recdevs from run_ss3sim. Then, process error will be unique for each iteration but consistent across scenarios for a specific iteration to facilitate comparison between scenarios.

We will only run 100 iterations here to save time and keep the package size down, but in a real simulation testing study you may want to run many more iterations, perhaps testing how many iterations are required before the key results stabilize.

run_ss3sim(iterations = 1:100, scenarios =
  case_files = list(F = "F", D = c("index", "lcomp", "agecomp"),
    E = "E", M = "M"),
  case_folder = case_folder, om_dir = om,
  em_dir = em, bias_adjust = TRUE)

Reading in the output and plotting the results

.csv files

The function get_results_all reads in a set of scenarios and combines the output into the following two .csv files: ss3sim_scalar.csv and ss3sim_ts.csv. The scalar file contains values for which there is a single estimated value (e.g., MSY) and the ts file refers to values for which there are time series of estimates available (e.g., biomass for each year).

get_results_all(user_scenarios =
# Read in the data frames stored in the csv files
scalar_dat <- read.csv("ss3sim_scalar.csv")
ts_dat <- read.csv("ss3sim_ts.csv")

If you would like to follow along with the rest of the vignette without running the simulations detailed above, you can load a saved version of the output.

data("scalar_dat", package = "ss3sim")
data("ts_dat", package = "ss3sim")

Post-processing of results

scalar_dat <- calculate_re(scalar_dat, add = TRUE)
ts_dat <- calculate_re(ts_dat, add = TRUE)
ts_dat <- merge(ts_dat, scalar_dat[,c("scenario", "iteration",

scalar_dat_det <- scalar_dat[scalar_dat$E %in% c("E100", "E101"), ]
scalar_dat_sto <- scalar_dat[scalar_dat$E %in% c("E0", "E1"), ]
ts_dat_det <- ts_dat[ts_dat$E %in% c("E100", "E101"), ]
ts_dat_sto <- ts_dat[ts_dat$E %in% c("E0", "E1"), ]

scalar_dat_long <- scalar_dat
colnames(scalar_dat_long) <- gsub("(.+)_re", "RE _\\1", colnames(scalar_dat_long))
scalar_dat_long <- reshape(scalar_dat_long, sep = " _", 
  direction = "long",
  varying = grep(" _", colnames(scalar_dat_long)),
  idvar = c("scenario", "iteration"),
  timevar = "parameter")


Use the long-data format to create a multi-panel plot with ggplot.

p <- plot_scalar_boxplot(scalar_dat_long[
  scalar_dat_long$parameter %in% c("depletion", "SR_BH_steep", "SSB_MSY") &
  scalar_dat_long$E %in% c("E100", "E101"), ], 
  x = "D", y = "RE", re = FALSE,
  vert = "E", horiz = "parameter", print = FALSE)
# see plot_scalar_points() for another plotting function

Let's look at the relative error in estimates of spawning biomass. We will colour the time series according to the maximum gradient. Small values of the maximum gradient (approximately 0.001 or less) indicate that model convergence is likely. Larger values (greater than 1) indicate that model convergence is unlikely. Results of individual iterations are jittered around the vertical axis to aid in visualization. The following three blocks of code produce Figures~\ref{fig:plot-sto-ts}, \ref{fig:ssb-ts-plots}, and \ref{fig:relative-error-boxplots-sto}.

p <- plot_ts_lines(ts_dat_sto, y='SpawnBio_re', 
  vert = "E", horiz="D", print = FALSE, col = "max_grad")
p <- plot_ts_lines(ts_dat_sto, y='SpawnBio_em', 
  vert = "E", horiz="D", print = FALSE, col = "max_grad")
p <- plot_scalar_boxplot(scalar_dat_long[
  scalar_dat_long$parameter %in% c("depletion", "SR_BH_steep", "SSB_MSY") &
  scalar_dat_long$E %in% c("E0", "E1"), ], 
  x = "D", y = "RE", re = FALSE, 
  vert = "E", horiz = "parameter", print = FALSE)

Using ss3sim_base directly {#ss3simbase}

An alternative approach to using ss3sim is to skip setting up the case files and use ss3sim_base directly by passing lists of arguments. The lists correspond to the arguments to each of the change or sample functions without any references to the files that need to be modified. The documentation for each function includes an asterisk beside the arguments that you can specify. Note that if you do not include an argument then ss3sim_base will assume the value of that argument is NULL.

When we call ss3sim_base directly, the scenario ID only serves the function of identifying the output folder name and could technically be any character string. For example, we could have run the scenario D1-E0-F0-M0-cod that we ran before:

d <- system.file("extdata", package = "ss3sim")
om <- paste0(d, "/models/cod-om")
em <- paste0(d, "/models/cod-em")

F0 <- list(years = 1:100, years_alter = 1:100, fvals =
  c(rep(0, 25), rep(0.114, 75)))

index1 <- list(fleets = 2, years = list(seq(60, 100, by = 2)),
  sds_obs = list(0.1))

lcomp1 <- list(fleets = c(1, 2), Nsamp = list(100, 100), years =
  list(25:100, seq(60, 100, by = 2)), lengthbin_vector = NULL,
  cpar = c(1, 1))

agecomp1 <- list(fleets = c(1, 2), Nsamp = list(100, 100), years =
  list(25:100, seq(60, 100, by = 2)), agebin_vector = NULL,
  cpar = c(1, 1))

E0 <- list(natM_type = "1Parm", natM_n_breakpoints = NULL,
  natM_lorenzen = NULL, natM_val = c(NA,-1), par_name =
  "LnQ_base_3_CPUE", par_int = NA, par_phase = -1, forecast_num = 0)

M0 <- list(NatM_p_1_Fem_GP_1 = rep(0, 100))

# todo: add data_params so that this function runs.
ss3sim_base(iterations = 1:20, scenarios = "D1-E0-F0-M0-cod",
  f_params = F0, index_params = index1, lcomp_params = lcomp1,
  agecomp_params = agecomp1, estim_params = E0, tv_params = M0,
  om_dir = om, em_dir = em)

Scenarios, combinations of case IDs

You do not need to stick with the scenario ID scheme laid out by default in ss3sim. You are free to choose your own scheme by adjusting the list of values passed to the case_files argument in run_ss3sim(). Below is an example where we"ll specify the arguments to the sampling functions via their own case files. We"ll arbitrarily call these cases X, Y, and Z. We will start by copying over some case arguments built into the package. We will copy what was called index0-cod.txt, lcomp0-cod.txt, and agecomp0-cod.txt into X-cod.txt, Y-cod.txt, and Z-cod.txt. Then we’ll set up our custom case_files list and call run_ss3sim()

case_folder <- system.file("extdata", "eg-cases", package = "ss3sim")
om <- system.file("extdata", "models/cod-om", package = "ss3sim")
em <- system.file("extdata", "models/cod-em", package = "ss3sim")
files <- list.files(case_folder, pattern = "0-cod")

temp_case_folder <- "ss3sim-example-cases"
file.copy(paste0(case_folder, "/", files), temp_case_folder)

# now make X, Y, and Z case files:
file.copy("index0-cod.txt", "X0-cod.txt")
file.copy("lcomp0-cod.txt", "Y0-cod.txt")
file.copy("agecomp0-cod.txt", "Z0-cod.txt")

# our custom specification:
case_files <- list(F = "F", X = "index", Y = "lcomp", Z = "agecomp")

# and use run_ss3sim() with our new case_file list:
run_ss3sim(iterations = 1,
  scenarios = "X0-Y0-Z0-F0-cod",
  case_folder = temp_case_folder,
  om_dir = om, em_dir = em,
  case_files = case_files)

Time-varying parameters in the OM {#time-varying}

The ss3sim package includes the capability for inducing time-varying changes in parameters in the OM. The package currently does not have built-in functions to turn on/off the estimation of time-varying parameters, so this cannot be done with case arguments. However, it is possible to create versions of an EM with and without time-varying estimation of a parameter. This approach would allow for testing of differences between estimating a single, constant parameter vs.\ a time-varying estimate.

You can specify any parameter defined in your OM model as time varying through the change_tv function. The function works by adding an environmental deviate ($env$) to the base parameter ($par$), creating a time varying parameter ($par"$) for each year ($y$), \begin{equation} par"y = par + link * env_y. \end{equation} The $link$ is pre-specified to a value of one and $par$ is the base value for the given parameter, as defined by the INIT value in the .ctl file. For all catchability parameters (q), the deviate will be added to the log transform of the base parameter using the following equation: \begin{equation} \log(q"{y}) = \log(q) + link * env_{y}. \end{equation} The vector of deviates must contain one value for every year of the simulation and can be specified as zero for years in which the parameter does not deviate from the base parameter value.

Currently, the change_tv function only works to add time-varying properties to a time-invariant parameter. It cannot alter the properties of parameters that already vary with time. Also, it will not work with custom environmental linkages. Environmental linkages for all parameters in the OM must be declared by a single line i.e., 0 #_custom_mg-env_setup (0/1) prior to using change_tv. Additionally, SS does not allow more than one stock recruit parameter to vary with time. Therefore, if the .ctl file already has a stock recruit parameter that varies with time and you try to implement another, the function will fail.

The change_tv function can be used for a number of purposes, thus it interacts differently with the case files than the other change functions. To pass arguments to change_tv through run_ss3sim you need to: (1) create a case file with an arbitrary letter not used elsewhere (i.e., anything but D, E, F, or R) and (2) include the line function_type; change_tv in your case file. For example, you might want to use M for natural mortality, S for selectivity, or G for growth.

Incorporating a new case ID

ss3sim is designed with the following set of mandatory case IDs that must be specified for each run: data (D) and fishing effort (F). The following case IDs also exist but are not mandatory: estimation (E) and retrospective (R). In many instances a user may wish to include a new case ID. This section is designed to explain how to accomplish this within the ss3sim framework. Note that as currently developed, ss3sim incorporates all new cases by using the change_tv function, even if the changes are not time-varying. However, you can simply pass a constant vector of values to induce a constant change in a parameter.

As an example, you might want to incorporate changes in selectivity (time-varying or not) in the OM to more accurately mimic the behaviour of a real fishery. Using the help file for change_tv, we can create a set of "S"(for selectivity) case argument files which contain the changes required. You may choose to only have a "base case" which applies to all scenarios in the simulation test, or you may have multiple cases to investigate how different patterns of selectivity in the OM affect the EM. Now that the case argument files have been created, you need to tell ss3sim to use them because they are outside of the mandatory set. This is done through the case_files argument in the get_caseargs function, which is passed through by the run_ss3sim. In this case, we need to tell it to look for case argument files beginning with S (see the help file for get_caseargs for more information),

case_files = list(D = c("index", "lcomp", "agecomp"), F = "F", S = "S")

Note that S is simply added to the end of the default list of case IDs.

Data case

Note that the data case ID is a special case because its case argument files do not begin with D. This is because there are three functions associated with the data case, independently modifying three different parts of the .dat file.

Adjusting the available OM data dynamically {#change-data}

ss3sim can dynamically adjust the type of data available from the OM for sampling. For example, you might wish to sample conditional length-at-age in a certain bin structure for certain years and fleets. To do that, the OM will need to generate length and age data for appropriate years and fleets at a sufficiently fine bin size. However, the idea behind ss3sim is to avoid having to hand code many different versions of an OM that have only minor differences. One way around this would be to write an OM that included all possible data types for all possible years at an extremely fine bin structure. But, this can substantially slow down how quickly the OM runs in SS. Therefore, we made ss3sim capable of dynamically modifying the OM to include just the data types and quantities that are needed based on arguments that are passed to the sampling functions (e.g., sample_agecomp). Internally, ss3sim uses the function calculate_data_units to generate a superset of all necessary fleet and year combinations. See the section below for more information on these types of data.

The dynamic creation of available data is the default behavior of the package. In practice this means the user does not need to manage which data types, fleets, or years are produced by the OM. The following rules are applied if the arguments to any sampling function are not NULL:

For instance, if empirical weight-at-age data are to be sampled (sample_wtatage is called in a scenario), change_data adds age composition and mean length-at-age data to the model.

Another feature of change_data is that it can also modify the binning structure of the data (not the population-level bins). The ages and length bins in the data are specified in the arguments len_bin and age_bin in ss3sim_base. These bins are consistent across fleets, years, etc. Empirical weight-at-age data are a special case because they are generated in a separate file and use the population bins. Also, when either of these bin are arguments are set to NULL (the default value), then the respective bins are taken from the OM.

Generating observation error {#obs-error}

Currently ss3sim supports the sampling of five types of data: indices of abundance, length and age compositions, mean length (size) at age, empirical weight at age, and conditional age at length. Functions generate these data using the expected values from the OM output files and create the input .dat files for the EM.

This section briefly details the background and functionality of these "sampling" functions. Note that the years/fleets to be sampled must be present in the OM output data file. ss3sim does this data creation dynamically using the function, but in some cases it may need to be done manually.

In simulation work, the true underlying dynamics of the population are known and therefore a variety of sampling techniques are possible. For instance, ideal sampling (in the statistical sense) can be done easily. However, there is some question about the realism behind providing the model with this kind of data because it is unlikely to happen in practice [@pennington2002]. Thus, there is a need to be able to generate more realistic data. In ss3sim we can accomplish this in several ways. One way is to use flexible distributions which allow the user to control the statistical properties (e.g., overdispersion) of the data, while another is to use unmodeled process error (typically in selectivity).

For example, consider simulating age-composition data. Under perfect mixing of fish and truly random sampling of the population, the samples would be multinomial. However, in practice neither of these are true because the fish tend to aggregate by size and age, and it is difficult to take random samples [e.g., @pennington2002]. This causes the data to have more variance than expected, i.e., be overdispersed, and the effective sample size is smaller than expected [@hulson2012; @maunder2011]. For example, if a multinomial likelihood is assumed for overdispersed data, the model puts too much weight on those data, at the cost of fitting other data less well [@maunder2011]. Analysts thus often "tune" their model to find a more appropriate sample size (called "effective sample size") that more accurately reflects the information in the composition data [@francis2011]. In our case, we exactly control how the data are generated and specified in the EM — providing a large amount of flexibility to the user. What is optimal will depend on the questions addressed by a simulation and how the results are meant to be interpreted. We caution users to carefully consider how data are generated and fit in an ss3sim simulation.

Indices of abundance

The function sample_index facilitates generating relative indices of abundance (catch-per-unit-effort (CPUE) data for fishery fleets). It samples from the biomass trends available for the different fleets to simulate the collection of CPUE and survey index data. The user can specify which fleets are sampled in which years and with what amount of noise. Different fleets will "see"different biomass trends due to differences in selectivity. Catchability, $q$, for the OM is equal to one, $q=1$, and thus, the indices are actually absolute indices of (spawning) biomass.

In practice, sampling from the abundance indices is relatively straightforward. The OM .dat file contains the annual biomass values for each fleet, and the sample_index function uses these true values as the mean of a distribution. The function uses a bias-corrected log-normal distribution with expected values given by the OM biomass and a user-provided standard deviation term that controls the level of variance.

More specifically, let

$$\begin{aligned} B_y&=\text{the true (OM) biomass in year $y$}\newline \sigma_y&=\text{the standard deviation provided by the user}\newline X\sim N(0, \sigma_y^2)&=\text{a normal random variable} \end{aligned}$$

Then the sampled value, $B_y^\text{obs}$ is

$$B_y^\text{obs}=B_y e^{X-\sigma_y^2/2}$$

which has expected value $\mathrm{E}\left[B_y^\text{obs}\right]=B_y$ due to the bias adjustment term $\sigma_y^2/2$ (SR_sigmaR in SS). This process generates log-normal values centered at the true value. It is possible for the user to specify the amount of uncertainty (e.g., to mimic the amount of survey effort), but, currently, it is not possible to induce bias in this process.

Effective sample sizes

For index data, which are assumed by SS to follow a log-normal distribution, the weight of the data is determined by the CV of each point. As the CV increases, the data have less weight in the joint likelihood. This is equivalent to decreasing the effective sample size in other distributions. ss3sim sets these values automatically at the truth internally, although future versions may allow for more complex options. The user-supplied $\sigma_y$ term is written to the .dat file along with the sampled values, so that the EM has the correct level of uncertainty. Thus, the EM has unbiased estimates of both $B_y$ and the true $\sigma_y$ for all fleets in all years sampled.

Age and length compositions {#comps}

The functions sample_agecomp and sample_lcomp sample from the true age and length compositions using either a multinomial or Dirichlet distribution. The user can specify which fleets are sampled in which years and with what amount of noise (via sample size).

The following calculations are shown for how age-composition-data is sampled, but the equations apply equally to how length-composition data is sampled as well. The multinomial distribution $\mathbf{m} \sim \text{MN}\left(\mathbf{p}, n, A\right )$ is defined as

$$\begin{aligned} \mathbf{m} &= m_1, \ldots, m_A\ &=\text{the number of fish observed in bin $a$}\ \mathbf{p}&= p_1, \ldots, p_A\ &=\text{the true proportion of fish in bin $a$}\ n&=\text{sample size} \ A&=\text{number of age bins} \end{aligned}$$

In the case of SS, actual proportions are input as data, so $\mathbf{m}/n$ is the distribution used instead of $\mathbf{m}$. Thus, the variance of the estimated proportion for age bin $a$ is $\mathrm{Var}\left[M_a/n\right]=p_aq_a/n$. Note that $m_a/n$ can only take on values in the set ${0, 1/n, 2/n, \ldots, 1}$. With sufficiently large sample size this set approximates the real interval $[0,1]$, but the key point being that only a finite set of rational values of $m_a/n$ are possible.

The multinomial, as described above, is based on assumptions of ideal sampling and is likely unrealistic, particularly with large sample sizes. One strategy to add realism is to allow for overdispersion.

Sampling with overdispersion

The composition sampling functions provided in the package allow the user to specify the level of overdispersion present in the sampled data through the argument cpar. cpar is the ratio of the standard deviation between a multinomial and Dirichlet distribution with the same probabilities and sample size. Thus, a value of 1 would be the same sd, while 2 would be twice the sd of a multinomial (overdispersed). The package also currently allows for specifying the effective sample sizes used as input for the EM separately from the sample sizes used to sample the data. See the function documentation for sample_agecomp and sample_lcomp for more detail.

Here, we describe the implementation of the Dirichlet distribution used in the package. This distribution has the same range and mean as the multinomial, but a different variance controlled by a parameter (the variance is determined exclusively by the cell probabilities and sample size n the multinomial) making it more flexible than the multinomial. Let $\mathbf{d} \sim \text{Dirichlet}\left (\boldsymbol{\alpha}, A\right )$ be a Dirichlet random vector. It is characterized by

$$\begin{aligned} \mathbf{d} &= d_1, \ldots, d_A\ &=\text{the proportion of fish observed in bin $a$}\ \boldsymbol{\alpha}&= \alpha_1, \ldots, \alpha_A\ &=\text{concentration parameters for the proportion of fish in bin $a$}\ A&=\text{number of age bins} \end{aligned}$$

When using the Dirichlet distribution to generate random samples, it is convenient to parameterize the vector of concentration parameters $\boldsymbol{\alpha}$ as $\lambda \mathbf{p}$ so that $\alpha_a=\lambda p_a$. The mean of $d_a$ is then $\mathrm{E}\left[d_a\right]=p_a$ and the variance is $\mathrm{Var}\left[d_a\right]=\frac{p_aq_a}{\lambda+1}$. The marginal distributions for the Dirichlet are beta distributed. In contrast to the multinomial above, the Dirichlet generates points on the real interval $[0,1]$.

The following steps are used to generate overdispersed samples:

Effective sample sizes

The term "effective sample size" (ESS) for SS often refers to the tuned sample size, right-weighted depending on the information contained in the data. These values are estimated after an initial run and then written to the .dat file and SS is run again [@francis2011]. Perhaps a better term would be "input sample size" to contrast it with what was originally sampled.

In any case, the default behavior for both multinomial and Dirichlet generated composition data is to set the effective sample size automatically in the sample_lcomp and sample_agecomp functions. In the case of the multinomial, the effective sample size is just the original sample size (i.e., how many fish were sampled, the number of sampled tows, etc.). However, with the Dirichlet distribution, the effective sample size depends on the parameters passed to these functions and is automatically calculated internally and passed on to the .dat file. The effective sample size is calculated as $N_{eff}=n/c^2$, where $c$ is cpar. Note that these values will not necessarily be integer-valued, and SS handles this without issue.

If the effective sample size is known and passed to the EM, there is much similarity between the multinomial and Dirichlet methods for generating data. The main difference arises when sample sizes ($n$) are small; in this case the multinomial will be restricted to few potential values (i.e., $0/n, 1/n, \ldots, n/n$), whereas the Dirichlet has values in $[0,1]$. Thus, without the Dirichlet it would be impossible to generate realistic values that would come about from highly overdispersed data with a large $n$.

The ability to is specify the ESS was implemented to allow for users to explore data weighting issues. If you want to specify an input/effective sample size different that what is used in sampling, this is available via the ESS arguments of these two functions.

Mean length at age

The ability to sample mean-length-at-age data exists in ss3sim, but has not been thoroughly tested or used in a simulation before. Contact the developers for more information.

Conditional age at length

Conditional age-at-length (CAAL) data is an alternative to age compositions when there are paired age and length observations on the same fish. SS has the capability to associate the measurements, and doing so has many appealing attributes. For instance, using CAAL data, instead of both age and length compositions of the same sampled fish, avoids including the same data twice. CAAL data are also expected to be more informative about growth than marginal age-composition data. Although, to date, few studies have examined this data type [@he2015; @monnahan2015]. However, their sampling in ss3sim is significantly more complicated than that for simple age compositions (which are assumed independent of lengths).

A CAAL matrix, as implemented in SS, is inputted the same way and in the same block in the .dat file as age compositions. Each row of this matrix corresponds to a length bin (for a year and fleet), while the columns remain ages and the cells are the number of fish. Thus, each row represents the observed age distribution for a length bin, conditioned on the fish lengths that were observed in the length compositions. Obviously for older or younger fish the age distribution will be truncated, and often many rows will be empty because no fish of that length bin were observed (they can be dropped from SS in this case).

Imagine sampling $N$ fish for a year and fleet, taking lengths, and binning them to create the length distributions (numbers of fish in each length bin). From there, several strategies are possible for sampling ages from those fish,

(1) age all fish, (2) take random subset of fish independent of length bin, or (3) take a fixed number of fish from each length bin.

ss3sim can currently only handle option (2) and by extension (1). Future versions could include (3) but this is not currently implemented.

In any case, we need a hierarchy of sample sizes to implement this data type. Consider a row of the CAAL matrix. The sample size for that row comes from the length compositions and represents the maximum number of fish that could be aged. If all fish are not aged, then a new sample size must be drawn that is strictly less than or equal to the number of fish that were sampled for their length. This will then be used to draw ages randomly from the expected values. If we consider all rows for a fleet and year (one for each length bin), then the sum of those will be the sample size for the CAAL data. If all fish are aged, then no subsampling is done. However, if the CAAL sample size is less than the length sample size, we need to be careful to not age more fish in a length bin than were in that length bin in the first place. We accomplish this in the code by doing sampling without replacement for vectors of length bins equal to the number of fish in them. This ensures realistic sampling. If the option (3) above were implemented, a different strategy would need to be implemented. For instance, if the user wants 10 fish from each length bin, but only 5 fish were observed, what to do? Gwladys Lambert has been working on this issue.

A further complication is that when Dirichlet sampling is used for length compositions, the number of fish observed will be real-valued, not whole fish. One cannot simply multiply by the length composition sample size to get whole numbers since they are real, and rounding or truncating would be unsatisfactory. Currently the function simply draws a multinomial sample from the length compositions of specified size (Nsamp). However, this does not guarantee that fewer fish are aged than lengthed. If you are specifying a small number of fish to age relative to length, then this might be alright. However, we discourage the use of Dirichlet length samples when using CAAL data as currently implemented.

CAAL data are inherently tied to the length compositions (e.g., as if a trip measured lengths and ages for all fish), in contrast to the age compositions which were generated independently of the length data (e.g., as if one trip measured only ages and a second only lengths). This is an important distinction.

Empirical weight at age

This data type is very different than all the others. With this setting in SS, all length-based processes are bypassed by assigning a weight to each age in each year, from which spawning biomass/fecundity can be determined. These values are not data in the sense they have a likelihood, but are generated from samples.

This sampling function was implemented for a specific study [@kuriyama2015] and may not be satisfactory for other studies. Contact the developers if you wish to use empirical weight at age in your simulation.

Generating process error {#pro-error}

Process error is incorporated into the OM in the form of deviates in recruitment ("recdevs") from the stock-recruitment relationship. Unlike the observation error, the process error affects the population dynamics and thus must be done before running the OM.

These built-in recruitment deviations are standard normal deviates and are multiplied by $\sigma_r$ (recruitment standard deviation) as specified in the OM, and bias adjusted within the package. That is, \begin{equation} \text{recdev}_i=\sigma_r z_i-\sigma_r^2/2 \end{equation} where $z_i$ is a standard normal deviate and the bias adjustment term ($\sigma_r^2/2$) makes the deviates have an expected value of 1 after exponentiation.

If the recruitment deviations are not specified, then the package will use these built-in recruitment deviations. Alternatively you can specify your own recruitment deviations, via the argument user_recdevs to the top-level function ss3sim_base. Ensure that you pass a matrix with at least enough columns (iterations) and rows (years). The user-supplied recruitment deviations are used exactly as specified (i.e., not multiplied by $\sigma_r$ as specified in the SS model), and it is up to you to bias correct them manually by subtracting $\sigma_r^2 / 2$ as is done above. This functionality allows for flexibility in how the recruitment deviations are specified, for example running deterministic runs or adding serial correlation.

Note that for both built-in and user-specified recdevs, ss3sim will reuse the same set of recruitment deviations for all iterations across scenarios. For example if you have two scenarios and run 100 iterations of each, the same set of recruitment deviations are used for iteration one for both those two scenarios.

Stochastic reproducibility {#reproduce}

In many cases, you may want to make the observation and process error reproducible. For instance, you may want to reuse process error so that differences between scenarios are not confounded with process error. More broadly, you may want to make a simulation reproducible on another machine by another user (such as a reviewer).

By default ss3sim sets a seed based on the iteration number. This will create the same recruitment deviations for a given iteration number. You can therefore avoid having the same recruitment deviations for a given iteration number by either specifying your own recruitment deviation matrix through the user_recdevs argument or by changing the iteration numbers (e.g., using iterations 101 to 200 instead of 1 to 100). If you use the latter method, you must specify a vector of iterations instead of a single number. The vector will specify which numbers along a number line to use versus a single number leads to 1:x iterations being generated. If you want the different scenarios to have different process error you will need to make separate calls to run_ss3sim for each scenario.

The observation error seed (affecting the index, length composition, and age composition sampling) is set during the OM generation. Therefore, a given iteration-case-file-argument combination will generate repeatable results. Given that different case arguments can generate different sampling routines (e.g., stochastically sampling or not sampling from the age compositions or sampling a different number of years) the observation error is not necessarily comparable across different case arguments.

Parallel computing with ss3sim {#parallel}

ss3sim can easily run multiple scenarios in parallel to speed up simulations. To run a simulation in parallel, you need to register multiple cores or clusters and set parallel = TRUE in run_ss3sim. For example, we could have run the previous example in parallel with the following code. First, one would need to find out how many cores are on their computer, (e.g., r Sys.getenv("NUMBER_OF_PROCESSORS")) and register a portion of those for parallel processing


registerDoParallel(cores = ifelse(Sys.getenv("NUMBER_OF_PROCESSORS")<6, 2, 4))

We can check to make sure we are set up to run in parallel


#> [1] 4

Run our simulation on four cores simultaneously by setting parallel = TRUE

run_ss3sim(iterations = 1, scenarios =
  case_files =
    list(F = "F", D = c("index", "lcomp", "agecomp"), E = "E"),
  case_folder = case_folder, om_dir = om, em_dir = em,
  bias_adjust = TRUE, parallel = TRUE)

In addition to the check with getDoParWorkers() above, if the simulations are running in parallel, you will also see simultaneous output messages from ss3sim as the simulations run. On a 2.27 GHz Intel Xeon quad-core server running Ubuntu 12.04, this example ran 1.9 times faster on two cores than on a single core, and 3.2 times faster on four cores.

Alternatively, you can run iterations in parallel. This is useful, for example, if you want to run a single scenario as quickly as possible. To do this, just set parallel_iterations = TRUE as well as setting parallel = TRUE. For example,

run_ss3sim(iterations = 1:2, scenarios = "D0-F0-cod",
  case_folder = case_folder, om_dir = om, em_dir = em,
  parallel = TRUE, parallel_iterations = TRUE)

If you are running bias adjustment iterations, these will be run first in serial. The rest of the iterations will be run in parallel

run_ss3sim(iterations = 1:2, scenarios = "D0-F0-cod",
  case_folder = case_folder, om_dir = om, em_dir = em,
  parallel = TRUE, parallel_iterations = TRUE, bias_nsim = 2,
  bias_adjust = TRUE)

Note that if the simulation aborts for any reason while run_ss3sim is running in parallel, you may need to abort the left over parallel processes. On Windows, open the task manager (Ctrl-Shift-Esc) and close any R processes. On a Mac, open Activity Monitor and force quit any R processes. Also, if you are working with a local development version of ss3sim and you are on a Windows machine, you may not be able to run in parallel if you load the package with devtools::load_all(). Instead, do a full install with devtools::install() (and potentially restart R if ss3sim was already loaded).

Putting SS in your path {#path}

Instead of copying the SS binary file (ss.exe) to each folder within a simulation and running it, ss3sim relies on a single binary file being available to the operating system regardless of the folder. To accomplish this, SS must be in your system path, which is a list of folders that your operating system looks in whenever you type the name of a program on the command line. This approach saves on storage space since the SS binary is about 2.2 MB and having it located in each folder can be prohibitive in a large-scale simulation testing study.

The development version on github installs the executable automatically but it is disallowed for CRAN submissions. In any case, it may be useful to put the binary in your path so that you can manually run models during development and testing (i.e., open command window in folder of OM or EM run and run the binary to see SS output).

Unix (OS X and Linux)

To check if SS is in your path, open a Terminal window and type which ss and hit enter. If you get nothing returned, then SS is not in your path. The easiest way to fix this is to move the SS binary to a folder that"s already in your path. To find existing path folders type echo $PATH in the terminal and hit enter. Now move the SS binary to one of these folders. For example, in a Terminal window type:

sudo cp ~/Downloads/SS/usr/bin/

You will need to use sudo and enter your password after to have permission to move a file to a folder like /usr/bin/.

Also note that you may need to add executable permissions to the SS binary after downloading it. You can do that by switching to the folder where you placed the binary (cd /usr/bin/ if you followed the instructions above), and running the command

sudo chmod +x ss

Check that SS is now executable and in your path

which ss

If you followed the instructions above, you will see the following line returned


If you have previously modified your path to add a non-standard location for the SS binary, you may need to also tell R about the new path. The path that R sees may not include additional paths that you have added through a configuration file like .bash_profile. If needed, you can add to the path that R sees by including a line like this in your .Rprofile file. (This is an invisible file in your home directory.)



To check if SS is in your path for Windows 7 and 8: open a DOS prompt and type ss -? and hit enter. If you get a line like ss is not recognized... then SS is not in your path. To add the SS binary file to your path, follow these steps:

  1. Find the correct version of the ss.exe binary on your computer

  2. Record the folder location. E.g. C:/SS/

  3. Click on the start menu and type environment

  4. Choose Edit environment variables for your account under Control Panel

  5. Click on PATH if it exists, create it if does not exist

  6. Choose PATH and click edit

  7. In the Edit User Variable window add to the end of the Variable value section a semicolon and the SS folder location you recorded earlier. E.g., ;C:/SS. Do not overwrite what was previously in the PATH variable.

  8. Restart your computer

  9. Go back to the DOS prompt and try typing ss -? and hitting return again.


Try the ss3sim package in your browser

Any scripts or data that you put into this service are public.

ss3sim documentation built on Nov. 9, 2019, 1:06 a.m.