In HaikunXu/FishFreqTree: IATTC's regression tree R package for analyzing size frequency data

Introduction

This R package FishFreqTree helps users to easily explore and quantitatively compare fishery definitions based on a distributional regression tree algorithm that is applied to age/length frequency data. The details regarding distributional regression tree algorithm can be found in @lennert-cody2010 and @lennert-cody2013. Maunder et al. 2022 provides an example that demonstrates how to use this package to make fishery definitions for stock assessments.

How to install

library(devtools)

install_github('HaikunXu/FishFreqTree',ref='main')

Input data

The input age/length frequency data should be a data frame including four columns named exactly as "lat", "lon", "year", and "quarter" and various length frequency columns corresponding to selected age/length bins. The columns "lat" and "lon" represent the latitudinal and longitudinal positions of grid centers, respectively. For those who don't consider quarter as a splitting dimension (e.g., your model has a time step of one year), please still add a column named "quarter" to the input data with value = 1. This regression tree package works with age/length frequency data so please make sure age/length frequency values sum to 1 across age/length bins. An example of the input data can be found [here]{.underline}.

Functions

Main functions

`run_regression_tree`: explore a user-specified fishery definition based on the regression tree algorithm

Users are required to specify the input data frame (LF), the first (fcol) and last (lcol) columns in the input data frame that have frequency data, the name of all age/length bins (bins) as a numeric vector, the number of splits (Nsplit; equals to the number of defined fisheries - 1), and a directory where results are saved (save_dir).

The function also provides some advanced options including manually building the regression tree (manual = TRUE) and specifying the minimal number of lat (lat.min), lon (lon.min), and year (year.min) allowed for a cell. Users can also turn on/off the year (year) and quarter (quarter) dimensions when building the regression tree.

This function provides a series of standardized outputs for users to understand the result:

split.csv: all candidate splits are compared and sorted across existing cells based on the percentage of total variance explained. The last column of this table (Rank) is used to specify the step-wise selection decision.
improvement-split.csv: cell-specific improvement metric for all candidate splits; values are sorted for every cell (highest values are preferred for the selection). This table provides supplementary information only and is NOT used to specify the step-wise selection decision.
Record.csv: summarize step-wise split information including split number, key, value, cell, and the percentage of total variance explained.
split(annual maps).png: spatial distribution of cells across quarters
split(quarterly maps).png: spatial distribution of cells by quarter
split(latlon).png: cell-specific improvement profiles against lat and lon
split(year).png: cell-specific improvement profiles against year
split(lf).png: comparison of cell-specific length frequency

The package provides a default fishery definition (run_regression_tree(..., manual = FALSE)) by selecting every split that corresponds to the highest percentage of variability explained (the first row of split.csv files). However, users can explore other definitions by using run_regression_tree(..., manual = TRUE, select = user_specified). The user-specified splits are numbered according to the rank in split.csv files). Please change one split at a time because the step-wise regression tree is hierarchical.

`loop_regression_tree`: compare differing fishery definitions according to the percentage of variance explained

Users are highly recommended to compare various fishery definitions, even for the same number of splits, because the definition is flexible and may need to be adjusted for practical reasons. Moreover, the tree is hierarchical and unstable, so comparing a variety of combinations with the default combination is highly valuable. In fact, the default one may not explain the highest percentage of variance in the input data.

Users are required to specify the input data frame (LF), the first (fcol) and last (lcol) columns in the input data frame that have frequency data, the name of all age/length bins (bins) as a numeric vector, the number of splits (Nsplit; equals to the number of defined fisheries - 1), a directory where results are saved (save_dir), and the selection matrix the user wants to explore (select_matrix).

The function also provides some advanced options including specifying the minimal number of lat (lat.min), lon (lon.min), and year (year.min) allowed for a cell. Users can also turn on/off the year (year) and quarter (quarter) dimensions when building the regression tree.

For example, there are three splits (Nsplit = 3) and you want to explore two competing splits for each of the three splits. First you need to generate the selection matrix by select_matrix <- expand.grid(split1 = 1:2, split2 = 1:2, split3 = 1:2). The number of combinations to be explored should be 2*2*2=8: dims(select_matrix).

The function provides to the screen the summary table from the loop function (also saved as loop.csv).

| select1 | select2 | select3 | \% var_explained | |---------|---------|---------|------------------| | 1 | 2 | 1 | 0.1642 | | 1 | 1 | 1 | 0.1617 | | 1 | 1 | 2 | 0.1556 | | 1 | 2 | 2 | 0.1499 | | 2 | 1 | 1 | 0.1453 | | 2 | 2 | 1 | 0.1402 | | 2 | 1 | 2 | 0.1394 | | 2 | 2 | 2 | 0.0953 |

`evaluate_regression_tree`: evaluate a user's pre-specified fishery definition according to the percentage of variance explained

Users can evaluate a pre-specified fishery definition using this function. Users are required to specify the input data frame (LF), the first (fcol) and last (lcol) columns in the input data frame that have frequency data, the name of the column where fishery definition is specified (Flagcol), the name of all age/length bins (bins) as a numeric vector.

This function provides a series of standardized outputs for users to understand the result:

split(annual maps).png: spatial distribution of cells across quarters
split(lf).png: comparison of cell-specific length frequency

Supporting functions

`make.lf.map`: make lat-lon gridded maps for length frequency

`make_meanl.map`: make spatial maps for mean length

`lf.aggregate`: aggregate length count data by length only or also by year and quarter

`lf.demean`: remove the mean of length frequency data

Package demonstration

Load the required packages:

# devtools::install_github('HaikunXu/RegressionTree',ref='main')
library(FishFreqTree)
require(tidyverse) # it is required for some supporting functions
# check the example LF data
head(LF)

Specify function inputs

fcol <- 5 # the first column with LF info
lcol <- 17 # the last column with LF info
bins <- seq(30,150,10) # length of bins as a numeric vector
Nsplit <- 3 # the number of splits (the number of cells - 1)
# the directory where results will be saved
save_dir <- "D:/OneDrive - IATTC/Git/FishFreqTree/manual/"

Plot the length frequency data as lat-lon grids and map of mean length

# plot lf data as maps
make.lf.map(LF, fcol, lcol, bins, save_dir)
# plot mean length as maps
make.meanl.map(LF, fcol, lcol, bins, save_dir)

Find the default 4-cell combination:

LF_Tree <- run_regression_tree(LF, fcol, lcol, bins, Nsplit, save_dir)

Check the default 4-cell combination results

head(LF_Tree$LF)
# a summary of the three splits
LF_Tree$Record
# Color-code the 4 fisheries in the map
make.split.map(LF_Tree$LF, Nsplit, save_dir)
# or text-code the 4 fisheries in the map (works better when quarterly splits exist)
make.split.map(LF_Tree$LF, Nsplit, save_dir, text = TRUE)

In addition to the default combination, you can also manually explore other 4-cell combinations. For example, in the figure above cell#2 only includes two 5*5 grids, so you want to explore some other 4-cell combinations. You can run the regression tree again with the second best 2th split:

LF_Tree2 <-
  run_regression_tree(LF,
                      fcol,
                      lcol,
                      bins,
                      Nsplit,
                      save_dir,
                      manual = TRUE,
                      select = c(1, 2, 1))
# a summary of the three splits
LF_Tree2$Record
# map the 4 cells
make.split.map(LF_Tree2$LF, Nsplit, save_dir)

This setting leads to a even higher proportion of variance explained (16.4% vs. 16.2%) than the default run. Also, the number of 5*5 grids within each cell is more reasonable. In fact, you can use the loop function to explore multiple 4-cell combinations. Those combinations are compared and sorted according to the proportion of variance explained.

# create a directory to save results from the loop function
loop_dir <- paste0(save_dir,"loop/")
dir.create(loop_dir)
# build the selection matrix
my_select_matrix <-
  data.matrix(expand.grid(
    split1 = 1:2,
    split2 = 1:2,
    split3 = 1:2
))

# check the selction matrix
my_select_matrix

LF_Tree_Loop <-
  loop_regression_tree(LF,
                       fcol,
                       lcol,
                       bins,
                       Nsplit,
                       save_dir = loop_dir,
                       select_matrix = my_select_matrix)

Users can also pre-specify a fishery definition and evaluate the performance of the pre-specified splits according to the percentage of variance explained:

# first specify the fishery definition using column Flag3
LF_Tree3 <- LF_Tree$LF
LF_Tree3$Flag3 <- ifelse(LF_Tree3$Flag3 > 2, LF_Tree3$Flag3, 1)
# check the fishery deifnition is specified as expected
make.split.map(LF_Tree3, 3, save_dir)
# evlaute the fishery definition
evaluate_regression_tree(LF_Tree3,fcol,lcol,"Flag3",bins,save_dir,folder_name="test")

References

HaikunXu/FishFreqTree documentation built on Jan. 25, 2025, 12:07 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

HaikunXu/FishFreqTree
IATTC's regression tree R package for analyzing size frequency data

In HaikunXu/FishFreqTree: IATTC's regression tree R package for analyzing size frequency data

Introduction

How to install

Input data

Functions

Main functions

`run_regression_tree`: explore a user-specified fishery definition based on the regression tree algorithm

`loop_regression_tree`: compare differing fishery definitions according to the percentage of variance explained

`evaluate_regression_tree`: evaluate a user's pre-specified fishery definition according to the percentage of variance explained

Supporting functions

`make.lf.map`: make lat-lon gridded maps for length frequency

`make_meanl.map`: make spatial maps for mean length

`lf.aggregate`: aggregate length count data by length only or also by year and quarter

`lf.demean`: remove the mean of length frequency data

Package demonstration

References

R Package Documentation

Browse R Packages

We want your feedback!

HaikunXu/FishFreqTree IATTC's regression tree R package for analyzing size frequency data

In HaikunXu/FishFreqTree: IATTC's regression tree R package for analyzing size frequency data

Introduction

How to install

Input data

Functions

Main functions

run_regression_tree: explore a user-specified fishery definition based on the regression tree algorithm

loop_regression_tree: compare differing fishery definitions according to the percentage of variance explained

evaluate_regression_tree: evaluate a user's pre-specified fishery definition according to the percentage of variance explained

Supporting functions

make.lf.map: make lat-lon gridded maps for length frequency

make_meanl.map: make spatial maps for mean length

lf.aggregate: aggregate length count data by length only or also by year and quarter

lf.demean: remove the mean of length frequency data

Package demonstration

References

R Package Documentation

Browse R Packages

We want your feedback!

HaikunXu/FishFreqTree
IATTC's regression tree R package for analyzing size frequency data

`run_regression_tree`: explore a user-specified fishery definition based on the regression tree algorithm

`loop_regression_tree`: compare differing fishery definitions according to the percentage of variance explained

`evaluate_regression_tree`: evaluate a user's pre-specified fishery definition according to the percentage of variance explained

`make.lf.map`: make lat-lon gridded maps for length frequency

`make_meanl.map`: make spatial maps for mean length

`lf.aggregate`: aggregate length count data by length only or also by year and quarter

`lf.demean`: remove the mean of length frequency data