knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(hake)
Running the models for the hake assessment requires several steps. Note that ALL models (base, bridging, sensitivity, retrospective) must:
extra_mcmc
switched on so that a
mcmc/sso/Report_mce_****.sso
file is generated for every posterior.
This is set with the run_extra_mcmc
variable in the Bash
scripts.Bash
scripts are used to run Rscript -e "r code here"
for all the
model runs and creation of RDS files for those models. This way, there
is no overhead due to keeping the Rstudio IDE open. Using the scripts also
ensures that multicore parallelism can be used (see next section).
Rstudio
constantly refreshes its panels as variables are created, deleted,
and/or changed. In addition, the scripts pipe most messages into the null
device, so they are not printed out to the screen. Printing to screen is
an expensive (time-wise) operation. Our goal is to reduce the time that it
takes to run the models and create the RDS files to a minimum.
The Bash
scripts are all located in the bash-scripts
directory.
A single model can be run inside Rstudio
using a call to
adnuts::run_adnuts()
if you wish, but it will be faster to run a script for
the reasons above.
The adnuts
package implements parallelism using a multisession
plan,
so it can run in parallel in any context meaning it can be run on Windows,
Linux, or Mac, from within Rstudio
or from Rscript
and the parallelism
will always work.
The scripts in this package do not use the multisession
plan. It was too
much work (maybe impossible) to create a customized computing environment
for each and every case. Instead, parallelism is implemented using the
multicore
plan from the
future package.
This type of parallelism plan allows child processes to
be spawned directly from a parent process at any time, and continue without
any additional setup whatsoever, making its implementation trivial in user
code. It relies on the kernel-level fork()
command being implemented in the
operating system. Microsoft Windows does not contain such an implementation
and cannot fork()
child processes. These scripts will not run in parallel
on a Windows machine. They will work on Linux and Mac operating systems only.
They also will not work if run from within Rstudio (on any OS) because it has
another abstraction layer on top of the R environment which renders the
fork()
command unstable. The Windows subsystems for Linux (WSL and WSL2)
do NOT implement fork()
so trying to use them will not work despite them
appearing to be Linux implementations.
To run the Bash
scripts mentioned here in parallel mode, they must be run
on a Linux or Mac machine, outside of Rstudio.
If run from Windows, these scripts will still work but run in sequential
mode which is very, very slow.
The base model is a special case in that it has forecasting and retrospectives. In actuality, those things can be run for any model but it is described here for the base model as that is the usual case. The same procedure should be followed for any model you want retrospectives and/or forecasting for.
Before running any model, you need to make sure the directory is clean,
in terms of extraneous files left over from previously running the model.
You may get an error when running the model if you have not done this.
To clean the directory, use a script (on the server) called clean_ss3
.
Run it from within the directory you want to clean like this: \
clean_ss3
If you want to know where on the server this script resides, run this: \
which clean_ss3
and the script itself can be viewed quickly with this command: \
cat `which clean_ss3`
The clean script also has an argument to recursively remove the mcmc
directory and all of its contents. Make sure you're sure because there's no
recovering the files once you've used that argument: \
clean_ss3 mcmc
Run scripts from within the bash-scripts
directory in this order. In
some cases you need to wait for the previous script to finish. In others,
you can launch two steps concurrently (for example numbers 2 & 3 can be
done at the same time in two different Bash
sessions, as can 4 & 5):
run-base-model.sh
run-base-model.sh
to finishcreate-rds-base.sh
run-forecasts.sh
run-forecasts.sh
to finishcreate-rds-attach-forecasts.sh
run-retrospectives.sh
run-retrospectives.sh 1 2 3 4
run-forecasts.sh 1 2 3 4
to almost finishrun-retrospectives.sh 5 6 7
run-forecasts.sh 5 6 7
to almost finishrun-retrospectives.sh 8 9 10
create-rds-retrospectives.sh
create-rds-attach-retrospectives.sh
Open the bash-scripts/run-base-model.sh
file in an editor.
Make sure all path variables are correct. The path structure on the
Linux server with associated variables is as follows. Whichever computer
you are on, the path structure MUST be the same as this (from
$models_path
forward):
If on the Linux server, you won't need to change any of the path settings,
as it is set up for the server. On another computer, you would just
change the $project_path
to the parent directory of $model_path
.The
$year
is automatically generated. It is the current calendar year,
unless it is currently December, in which case it will be the current
year + 1.
Make sure the settings for running adnuts
are correct. They are:
bash
ss_exe="ss3_2024"
num_chains=16
num_samples=8000
num_warmup_samples=250
adapt_delta=0.95
run_extra_mcmc=TRUE
The num_chains
variable is how many CPUs the model will use to run.
For the other settings, see the adnuts
documentation. The values shown
are what we have used for the last few years (2021--2023).The
run_extra_mcmc
variable must be set to TRUE
.
There is a line in the script which pipes ALL output, including errors
to NULL
which means nothing is printed during the run. This makes the code
fast. If you want to allow messages to be printed (for debugging purposes),
you can comment this line out: \
```bash
/dev/null 2>&1; \
`` To comment it out, place a
#` before it.
Open a Bash
terminal and change directories so you are in the
bash-scripts
directory, and execute the run script as follows. Note that
you must have the leading dot and slash to run it if you are in the
directory in which the script resides.
bash
./run-base-model.sh
It will take about 1 hour 45 minutes to complete (on the server with 16
chains).
Once the model run has completed, create an RDS file for the base model,
which will not yet contain the forecasting and retrospectives. Those will
be added to the file later. To create the file, open the Bash
scripts
bash-scripts/create-rds-base.sh
and bash-scripts/generic-create-rds.sh
.
The first script has some variable settings and then calls the second one,
which also has some settings and the actual call to the R function that
creates the RDS file. After making sure they are correct, run it to create
base model RDS file in the same folder where the model resides. \
bash
./create-rds-base.sh
Verify that the RDS file is where it should be and check that it's time
stamp makes sense. On the server you can check the file information
like this: \
bash
ll /srv/hake/models/2023/01-version/01-base-models/01-base/01-base.rds
If you want to check on the progress of the model run, there are a few things you can look at.
Watch the chains
directories being updated. Each chain from 1 to
whatever you set num_chains
to will have its own directory created and
the model will be running inside each of these in parallel. If you go
into the mcmc
directory of the model about 15 minutes after starting,
you will see all the chains_**
directories. You can keep listing the
files in any one of them every 5-10 seconds or so to watch for changing
file sizes. This is reassuring that everything is proceeding as it
should. For example, to do this with the base model for 2023 on the
server in a Bash
terminal: \
bash
cd /srv/hake/models/2023/01-version/01-base-models/01-base/mcmc
ll chain_01
Watch the chains
directories being removed. Each chain directory is
removed by adnuts
after it has completed and its output has been
copied into the mcmc
directory. The chains finish at different times.
You can keep checking the mcmc
directory to see them gradually all
disappear. Once they are all gone, the model run is nearly done. The
only remaining step is the -mceval
step which creates a
mcmc/sso/Report_mce_****.sso
file for each posterior. To see the
chain directories disappearing over time, do this every so often: \
bash
cd /srv/hake/models/2023/01-version/01-base-models/01-base/mcmc
ll
Watch adaptation.csv
and unbounded.csv
getting larger as the chain
information updates it. For example, run the following every 5-10
seconds to see the file size getting larger: \
bash
cd /srv/hake/models/2023/01-version/01-base-models/01-base/mcmc
ll adaptation.csv
Watch the process viewer to see each chain working, and watch the
processes disappear when the chains complete. Open the htop
viewer
(Linux) or whatever task manager/process viewer you have access to.
This example is for htop
: \
bash
htop
Once in htop
, you can single out your own processes by pressing u
followed by typing in your username (or a portion of it). When your
username is lit up on the left side, press Enter
and this will
filter the processes so only yours are shown. You can count the ones
which represent the R scripts called from the Bash
scripts. It should
be obvious which ones they are, as they will be using 100% (or close)
of the CPU power. You can see each of them disappear from this view when
its associated chain finishes.
Open the bash-scripts/run-forecasts.sh
file in an editor.
Make sure all path variables are correct. They must be the same as what
is in bash-scripts/run-base-model.sh
.
Open a Bash
terminal and change directories so you are in the
bash-scripts
directory, and execute the run script: ./run-forecasts.sh
.
This will take about an hour on Linux or Mac.
The parallelism for the catch-levels is implemented such that
each scenario gets its own processor. So there will be three processes
running that look the same (when using the htop
process viewer in
Linux).
doc/forecast-descriptions.csv
.Attach the catch-levels and forecasting output to the base model RDS
file. Open the script bash-scripts/create-rds-attach-forecasts.sh
and
make sure all the path variables match the base model location. Run the
script. Multiprocessor forking is implemented for loading so that each
forecast year gets its own processor. This code will overwrite the base
model RDS file. \
bash
cd bash-scripts
./create-rds-attach-forecasts.sh
Open the bash-scripts/run-retrospectives.sh
file in an editor.
Make sure all path variables are correct. They must be the same as what
is in bash-scripts/run-base-model.sh
.
Open a Bash
terminal and change directories so you are in the
bash-scripts
directory, and execute the run script
./run-retrospectives.sh
with N arguments representing the numbers of years
subtracted from the data for each retrospective model. For example, to run
4 retrospective models, each having 1, 2, 3, and 4 years of data removed,
run the following command: \
bash
./run-retrospectives 1 2 3 4
Each of those retrospective models will use num_chains
processors, so if
we leave the script as it is with 16 processors per model, we would be
using 64 CPUs with this call. We should not run more than this on our
80-CPU server.
This will take about 1hour and 45 minutes to complete on the server.
To run the next three we wait until almost all the processes from the
previous call finished (by looking at htop
), and then run this command: \
bash
$ ./run-retrospectives 5 6 7
Once those are near completion, we run the same command with 8, 9,
and 10 to finish off the retrospective models.
Create individual RDS files for each retrospective. This is done because
it takes a long time to load them all at one time, only to have a small
error in one of them which means starting all over. This way, only one RDS
file would have to be rebuilt. \
bash
cd bash-scripts
./create-rds-files-retro.R
This script creates all 10 RDS file in parallel, using the multicore
forking mechanism and takes about 20 minutes.
Attach the retrospectives output to the base model RDS file. Open the
script bash-scripts/create-rds-attach-retrospectives.sh
and make sure all
the path variables match the base model location. Run the script.
Multiprocessor forking is implemented for loading so that each
forecast year gets its own processor. This code will overwrite the base
model RDS file. \
bash
./create-rds-attach-retrospectives.sh
To run a new test model, we will assume you are using the base model as
a starting point and want to make some modifications to it. To do this,
create a new directory in the models
directory. On the server, this needs
to be done on the /srv
drive as it has a huge amount of space. The drive
on which the user accounts reside has almost no space, so please do not run
models there.
Make a new directory at: \
/srv/hake/models/2024/05-test-models/XX-new-model-name
\
Note that the year in the path will change depending on the assessment
year.
Copy the input files from the base model to the new directory using the
Bash
script: \
copy-base-model-input-files.sh
\
You should first edit the scripts and make sure both the
base_model_path
and test_model_path
variables are properly set.
This script will copy the 5 files listed in the files
list from the
base model directory to the test model directory. You must create the
directory first by running mkdir your_test_model_dirname
.
Make a copy of the script run-base-model.sh
called run-test-model.sh
or whatever you want to call it. Edit that file, and change the following
variables and save: \
bash
type_path="05-test-models"
model_name="XX-your-test-model-name"
Change any of the following variables in the script as you need to: \
bash
ss_exe="ss3_2024"
num_chains=8
num_samples=8000
num_warmup_samples=250
adapt_delta=0.95
run_extra_mcmc=TRUE
If you change the num_samples
to 1000 and the adapt_delta
to 0.4 the
model will run much faster (for example for testing).
Enter the directory for the new model in the Bash
terminal and run a
simple MLE to make sure the model runs in that mode before starting
a full adnuts MCMC run: \
bash
ss3_2024
This should end elegantly, without error. If it runs, clean the directory
and then go ahead and run the script that runs the adnuts MCMC: \
bash
clean_ss3
./run-test-model.sh
(Or whatever you called the file earlier)
Open the two scripts bash-scripts/run-bridge-models.sh
and
bash-scripts/generic-run-models.sh
in an editor.
Make sure all bridge model directory names are correct (models
list in
bash-scripts/run-bridge-models.sh
).
Open a Bash
terminal and change directories so you are in the
bash-scripts
directory, and execute the script: \
bash
./run-bridge-models.sh
The models will be run in parallel. Keep in mind that $num_chains
times
the number of models in $models
is the number of CPUs required for this
script. For the server. it should not be more than 64, or 4 models at a time
with num_chains=16
. $num_chains
is defined in
bash-scripts/generic-run-models.sh
. If you need to run 2 or more sets of
bridge models to complete them all due to the CPU limitations, comment some
out in bash-scripts/run-bridge-models.sh
. For example, thew following
shows that the first three models will be run and the last three will not.
In Bash
, lists like this have a space between the elements, and the
backslash allows you to break a line up and continue it on the next line.
So the space is required before the backslash. \
bash
models=(01-updated-ss-exe \
02-add-new-catch \
03-add-new-weight-at-age)
# 04-add-survey-age-2-plus \
# 05-add-survey-age-1 \
# 06-add-fishery-ages)
Note that keep_index_fit_posts
is a boolean flag for whether to include
the survey index fit posteriors. These take up a lot of space and are only
necessary for one plot, which so far is only shown for the base model.
Create the RDS files for each bridging model with the
bash-scripts/create-rds-bridge.sh
. This script contains the same models
list
as in the script that runs the bridge models. Make sure the models
list
is in order and then run the script: \
bash
./create-rds-bridge.sh
To run the sensitivity models follow the same procedure as for the
bridging models. There are currently 4 scripts to do this because there
are 16 sensitivity models and it was tiring commenting and uncommenting
them all, so the 4 scripts are exactly the same except for the models
that are included. These scripts are to be run one at a time. They are
named as run-sensitivity-models-*.sh
with *
being 1 through 4.
Create the RDS files for each sensitivity model with the
bash-scripts/create-rds-sensitivities.sh
script. This is the same
process as for bridge models above.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.