In pbs-assess/gfsynopsis: Data Synopsis Reports for PBS Groundfish

library(knitr)
library(here)
opts_chunk$set(echo = FALSE, message = FALSE, warnings = FALSE)

\clearpage

Abstract {-}

Modern survey and fishery observation programs generate vast quantities of data. However, regulatory agencies often lack the capacity to translate that data into effective monitoring information. Here, we describe an approach of automated visualization of raw fisheries data and demonstrate it with a report on 113 groundfish species in Pacific Canadian waters. Our implementation consists of two pages per species that show standardized visualizations of temporal trends and spatial distributions of commercial catches and survey indices, along with analyses of age, size, maturity, and growth. The approach facilitates discussions on stock assessment and survey prioritization, increases transparency about data holdings, and makes data available for regular review by interested parties. We encourage other agencies to consider similar approaches for their own data.

Introduction {-}

Survey and fishery observation programs produce large quantities of biological and environmental data. However, translating those data into effective monitoring information that is useful for decision-making remains a challenge. Agencies that collect data often lack the resources to report on all facets of their data in a timely fashion. Typically, detailed stock assessments report on data for only a small subset of species. While such focused analyses are required for management of some stocks, broad monitoring programs across an array of species and data types are essential for detecting ecological surprises and "unknown unknowns" [@hilborn1987; @wintle2010] as well as understanding species trends in a timely fashion. As part of a movement toward open data [e.g., @obama2013; @canada2018], many raw datasets, including those collected by government agencies, are now made available. However, processing and visualizing those datasets requires domain-specific knowledge (e.g., about sampling protocols), advanced skills (e.g., data management, population modelling, computer programming), and time. Collectively, these factors often limit the ability for interested parties to derive meaningful insights.

Automated visualization and reporting is one solution to these problems. Data visualization can serve as much more than just an endpoint of data analysis---it can serve as a critical component of the scientific process [@fox2011]. Visualization can facilitate rapid understanding of data, engage stakeholders, and raise awareness of data gaps. Visualization can also serve as a tool for those with domain-specific knowledge to spot errors in data sets, or identify data gaps where surveys or other data collection could be planned to gather necessary data. Furthermore, compared to the structured nature of statistical models, the unconstrained nature of visualization helps detect the unexpected [@wickham2015]. The advancement of data-analysis and dynamic-report-generation packages for the statistical software R [@r2019], such as knitr [@xie2017], makes it increasingly practical to generate reproducible visualization reports of survey and fishery monitoring data.

A number of governmental agencies have developed visualization reports or websites featuring fisheries data. For example, @ices2019 maintains a website visualizing the data in their ICES Stock Assessment Database from 2014 onwards (http://standardgraphs.ices.dk). The site includes plots of catches, recruitment, fishing mortality, spawning stock biomass, and status with respect to reference points for stocks. The Australian government agency maintains the "Status of Australian Fish Stocks Reports", which have visualized recent catches, stock status, and reported distribution of 120 species since 2012 (https://www.fish.gov.au). In the United States, where species are assessed and reviewed by eight regional fisheries management councils within the National Marine Fisheries Service (NMFS), no publicly available common set of data visualizations exist. However, a variety of graphics are published in stock assessments, ecosystem indicators are visualized in Regional Ecosystem Status Reports (https://www.integratedecosystemassessment.noaa.gov), and graphical tools exist for specific purposes such as visualizing groundfish distributions in Alaska [@barbeaux2018]. In Canada, examples include @ricard2013---a report summarizing Fisheries and Oceans Canada (DFO) survey data for marine fish and invertebrates on the East Coast Scotian Shelf and Bay of Fundy.

Collectively, data visualization reports and websites, such as those mentioned above, are important syntheses of information and communication tools. However, these visualizations primarily report on stock status and assessment outputs [but see @ricard2013]. No agency, to our knowledge, has a one-stop location to view all available raw fisheries dependent and independent data for all major stocks in a common format that encourages rapid comparison amongst stocks. Furthermore, most fish stocks worldwide lack a formal assessment, but some quantity of raw data often exists. Even where assessments do exist, we contend there remains value in standardized visualizations of raw data across multiple stocks and visualizing data between assessment cycles. There also remains value in collecting visualizations, which would ordinarily be scattered within hundreds of pages of individual stock assessments, so that they can be viewed simultaneously and compared. Such an approach facilitates new research ideas---such as correlations that might exist between population trends or relationships between fishery trends before and after management changes---and addresses questions such as: What raw data are available for various stocks? Given the data available, which analyses could be used to assess various stocks? What data-gaps need to be filled to conduct formal assessments of new stocks, or improve on previous ones? Are there warning signs in the raw data that might trigger an in-depth analysis? What is the one place someone can go to see all the available fisheries dependent, fisheries independent, growth, and maturity data visualized for all stocks in a given region for any number of other uses (e.g., building ecosystem models)?

To address this gap, here we make the case for large-scale, consistent-format visualization of raw fisheries data. We demonstrate the concept through a report focussing on 113 groundfish species on the west coast of Canada [@anderson2019synopsis]. In this region, the commercial groundfish fleet is subject to 100% on-board and dockside monitoring, while numerous fisheries-independent trawl and longline surveys are conducted annually, producing large quantities of valuable data [@stanley2015]. At the same time, assessment scientists face limitations in personnel and time. As a result, most stocks are not regularly examined through stock assessment, leaving most monitoring data unreported. Although the Canadian government is engaged in an open-data initiative [@canada2018], some of DFO's commercial data holdings cannot be shared in raw format due to privacy laws. Furthermore, data that are shared are not easily digested without advanced data processing skills and extensive knowledge of the survey programs and commercial fisheries.

Our goals with the visualization report were manifold and included: (1) ensuring that the current state of knowledge on indices of abundance for all groundfish stocks are available for regular review by DFO scientists, managers, and all interested parties; (2) facilitating discussions related to stock assessment and survey-program prioritization; (3) increasing transparency about our data holdings; and (4) producing standardized tools, data-set derivation, and visualizations that are useful elsewhere (e.g., in stock assessments).

Methods {-}

We aimed to make the report automated and reproducible so that it could be efficiently updated on a regular basis. We ensured report production was transparent with regards to how the data were extracted from databases and treated prior to plotting, and developed the code and report in the open and under version control (https://github.com/pbs-assess/gfdata, https://github.com/pbs-assess/gfplot, https://github.com/pbs-assess/gfsynopsis). Often, raw data cannot be plotted in a meaningful way without first undergoing some data tidying [@wickham2014] and/or mathematical modelling, and for this we aimed to use consistent methods across species. To accomplish these aims, we developed a family of R packages that facilitated extraction of data from relational databases (package "gfdata"), tidying and arranging the necessary data for modelling or plotting, fitting statistical models, plotting the data and statistical models (package "gfplot"), and compiling the visualizations into a report format (package "gfsynopsis"). We wrote the packages in a modular format (Fig. 1) so that individual functions could be used for other purposes---e.g., so data could be extracted and biological models fit for stock assessments in a consistent, standardized way.

In designing the report, we made the following design considerations and incorporated them throughout construction of the report:

Simplify understanding of the collection of plots for each species by maintaining a consistent layout for all species (including empty plots to emphasize a lack of data for some species);
Enable rapid viewing of all data for an individual species at once by limiting the report to two pages per species;
Maximize information that could be presented in the limited space by focusing on data types relevant to the most species,
Organize data to facilitate clear and simple interpretation by grouping similarly focused visualizations together;
Present contextual information essential to interpretation by displaying metadata alongside visualizations where possible;
Enable simple differentiation among data sources by maintaining consistent colour meaning throughout (e.g., data from a given survey always uses the same colour) and use colourblind-proof colours where possible;
Leave species pages entirely visual by describing the visualizations in detail at the beginning of the report;
Minimize misinterpretation by highlighting caveats and uncertainty wherever possible (e.g., through shading to indicate periods of increased uncertainty in reported catches or indicating sample sizes associated with age and length frequencies); and
Avoid false interpretation by omitting visualizations where sample sizes were clearly too low to draw meaningful inference (e.g., omitting von Bertalanffy growth model fits if < 20 fish were sampled or omitting spatial density maps if < 5% of survey sets caught the species).

We focused our analysis on species of commercial, recreational, conservation, or First Nations interest, as well as those that are regularly caught in DFO surveys. We included population trends and biological data from both commercial fisheries and scientific surveys to give an overall snapshot of available information for each species. Of the fisheries-independent surveys conducted by DFO, we presented data from those that provided the greatest spatial and taxonomic coverage of the species. We chose to focus on aspects of the data that were most important to stock assessment, such as indices of abundance, catch and composition data, and growth and maturity analyses. We displayed both spatial and temporal aspects of the data.

We settled on presenting visualizations of: (1) temporal trends and spatial distributions of fisheries-independent survey data and commercial fisheries catch and catch per unit effort (CPUE); (2) temporal trends in length and age frequencies; and (3) biological quantities including length at age, weight at length, age at maturity, length at maturity, and maturity frequencies from fisheries-independent surveys and commercial data sources. We additionally presented a summary of counts of available biological data (ages, ageing structures such as otoliths, lengths, weights, and maturities) for all commercial and research samples. Throughout report development we solicited feedback from stakeholders and other potential users of the report on which data to include as well as on report structure and design. We also sought feedback from data experts for various species groups to ensure that data being captured were accurate, complete, and presented faithfully.

Results {-}

Our implementation included two pages of visualization for 113 species of British Columbia (BC) groundfish and sharks [@anderson2019synopsis, Figs 2--4]. Figures 2--4 are included here as examples to demonstrate our visualizations; see @anderson2019synopsis for full descriptions and interpretations of the data. The species covered in the report included: 34 rockfish species; 16 elasmobranch and one chimaera species; 15 flatfish species; nine gadiform species (cods); 16 sculpin and poacher species; three greenling/lingcod species; six eelpout, three prickleback, and seven other perciform species; and one species each of smelt, sablefish and snailfish. The visualizations for many data-limited species included mostly empty plots, which graphically emphasized the lack of data (Fig.\ 4, last row)---effectively turning unknown unknowns about a given species [@wintle2010] into known unknowns. For data-rich species, nearly all available data were visualized together on two pages for the first time. Overall, the approach has had many benefits with respect to transparency, assessment efficiency, stakeholder engagement, and monitoring (Table \@ref(tab:benefits)) and has been well-received by stakeholders, managers, and other fisheries scientists [@dfo2019synopsis].

Discussion {-}

We have described an automated large-scale visualization approach that we have implemented as a report presenting up-to-date population trends and biological data from commercial fisheries and research surveys. Additionally, the visualizations can be rapidly updated with new data in between report publications for specific purposes. Stock assessments are not available for most species included in our report, and are not available every year for most species that are assessed, which has limited the dissemination of these data in the past. Our approach facilitates the broad monitoring of groundfish populations---it ensures groundfish scientists, managers, and interested parties have access to consistent, up-to-date, and easily digestible visualizations of data holdings. The approach increases transparency, provides a mechanism for regular review of data by relevant parties, allows for timely identification of changes in trends, and facilitates discussions concerning stock assessment and survey-program prioritization.

We contend it is the combination of public availability, large-scale consistent-format visualizations, rapid data-to-document reproducibility, and a focus on raw data that makes our approach novel and useful. Undoubtedly, many agencies worldwide have internal tools (e.g., R packages) to extract data from databases and visualize it. While these tools are important and useful, they have limited utility for stakeholders and scientists outside of an agency. Even within an agency, having a single go-to report will increase speed and efficiency since these tools require time and expertise to use, and effort is duplicated when several people independently extract data on the same species. In terms of format, the individual plots that make up our two-page species summaries are based on standard visualizations often seen scattered throughout stock assessments. However, we contend this is a case where "the whole is greater than the sum of its parts". Having such a variety of plot types in close proximity [a high data-to-ink ratio; @tufte2001] enables comparison and a rapid understanding of any assessed or unassessed stock in a way that is not otherwise possible. As examples, in our case, up to ten survey indices can be compared amongst themselves and with commercial CPUE; length and age composition patterns are plotted for all surveys together; and spatial and temporal survey patterns can be considered simultaneously.

Internally, the approach we describe here provides many features and resulting benefits that would extend to any agency taking a similar approach (Table \@ref(tab:benefits)). Specifically within our region, as one example, the report has become an indispensable tool for developing operating models when implementing a management procedure approach to fisheries management [e.g., @butterworth1999] for data-limited groundfish stocks. The report gives analysts a quick overview of available data types and quantities and, therefore, what management procedures are possible for a given stock. Second, the approach has helped analysts identify numerous anomalies in our databases. For example, we noticed that 2009 was consistently a low index year in one of our longline surveys and discovered that the survey design was not followed precisely that year. This information had been lost to current stock assessment scientists. Third, the ease with which we can skim survey indices and composition data has enabled us to detect notable changes quickly. For instance, immediately after the 2019 survey data were entered, we regenerated the report and skimmed survey indices for any major changes. We noted a rapid decline in some Arrowtooth Flounder survey indices and early signs of recovery of Bocaccio rockfish in survey indices and age compositions. These types of issues have played into discussions of assessment prioritization and work planning.

It is important that the approach we describe is automated from data extraction to report generation because the product is most useful with up-to-date data and it would be an enormous undertaking to repeatedly recreate such a report from scratch. This is considerably easier with carefully organized and maintained relational databases. This report and its data-to-document workflow demonstrate the value of investing in the development and ongoing maintenance of organized databases for any agency collecting and housing large quantities of data [e.g., @michener2012]. The functions we created to extract groundfish data for the report are available within our organization for those requiring access to the databases, and facilitate internally reviewed, standardized, and consistent data extraction methods. Importantly, the functions allow us to work efficiently while living with the legacy of multiple historical databases without the need to create a single monolithic database.

Unlike detailed species-specific analyses conducted in stock assessments, we liken the mass visualization of raw fisheries data to window shopping. The approach gives a quick flavour of the available data for a given stock and hints at possible indices of relative population trends. However, the outputs are not a substitute for species-specific stock assessment [or graphical syntheses of stock assessment output; e.g., @ices2019]. For example, stock assessments may use different data sources that are most applicable for specific species, analyze data on a finer spatial basis, or make considerations for special cases that are not included in our report. Furthermore, relative biomass indices based on survey or commercial CPUE data in the report require careful, species-specific interpretation as they may not best represent abundance trends for some stocks. For example, biomass indices may be based on surveys with particularly low catchability for certain species, or commercial CPUE may be influenced by management changes.

We suggest that an approach such as we have presented would have broad utility across taxonomic groups, regions, and types of data. A similar approach could be useful nearly anywhere agencies monitor multiple ecosystem elements (e.g., populations, species, and/or environmental processes) with relevant data maintained in databases. The approach would be especially useful for agencies that lack the capacity to assess those elements in great depth at a frequency that meets management needs. However, even with regular detailed assessments, such an approach can provide a broad overview in a format consistent across all elements. We hope our approach can serve as a template for other regions and species groups---as one of many tools---to turn survey and fisheries data into effective broad monitoring.

Acknowledgements {-}

We are grateful for the ongoing collaboration between DFO, commercial fishers, First Nations, and non-governmental organizations, which has made possible the collection of the valuable data that underlies the report described in this paper. For invaluable input into the report's design we thank Robyn Forrest, Chris Grandin, Dana Haggarty, Rob Kronlund, Daniel Ricard (who suggested the window-shopping metaphor), Chris Rooper, and Greg Workman. We thank Norm Olsen, Maria Surry, and Malcolm Wyeth for maintaining the BC groundfish databases and for help with data extraction. We thank Robyn Forrest, Rob Kronlund, James Ianelli, an anonymous reviewer, and the editor (Kristen Anstead) for helpful comments on earlier versions of this manuscript.

References {-}

\clearpage

Tables {-}

\clearpage

library(magrittr)
dt <- readr::read_csv(here::here("paper", "benefits-table.csv")) %>%
  dplyr::filter(!is.na(Example)) %>%
  dplyr::mutate(Benefit = ifelse(is.na(Benefit), "", Benefit)) %>%
  dplyr::mutate(Example = ifelse(is.na(Example), "", Example)) %>%
  dplyr::mutate(Example = gsub("\\.$", "", Example)) %>%
  dplyr::mutate(Benefit = gsub("\\.$", "", Benefit))
knitr::kable(
  dt,
  col.names = c("Feature", "Example benefits"),
  booktabs = TRUE,
  linesep = "\\addlinespace",
  caption = "Features and examples of their benefits for large-scale consistent-format visualization of raw fisheries data."
) %>%
  kableExtra::column_spec(1, width = "6.0cm") %>%
  kableExtra::column_spec(2, width = "10cm")

\clearpage

Figure captions {-}

Figure 1: Data-to-document workflow for generating the data synopsis report. (A) The automated workflow from data extraction to report production. (B) Example modular functions built for this report. Shown are functions for extracting biological data from commercial and fishery-independent surveys, which are then passed to functions for tidying and arranging the data or fitting statistical models to the data prior to generating visualizations. The functions are modular so that a data-extraction function can be paired with multiple data-tidying functions, the data-extraction and data-tidying functions can be used for other purposes (e.g., in stock assessments), and the plotting functions can be used with different underlying data sources.

Figure 2: Example first page for Silvergray Rockfish \emph{Sebastes brevispinis} featuring metadata and temporal and spatial trends from fisheries-independent surveys and commercial catch and effort data in British Columbia, Canada. (Top row) Relative biomass indices from up to ten surveys, commercial catch in each geographical region (panels) and gear type (colour), and standardized (solid line) and unstandardized (dashed line) commercial bottom catch per unit effort (CPUE). (Bottom row) Modelled relative biomass in space from four synoptic trawl surveys (left), two hard bottom long line surveys (second from left), and the International Pacific Halibut Commission (IPHC) setline survey (third from left); commercial trawl (second from right) and hook and line (right; H \& L) CPUE.

Figure 3: Example second page for Silvergray Rockfish \emph{Sebastes brevispinis} featuring biological sample, maturity, and growth data in British Columbia, Canada. (Top left) Length frequencies by year and survey for females (coloured) and male (grey). (Top right) Length-age von-Bertalanffy and weight-length model fits. (Second row) Age frequencies by year and survey. (Third row) Age-at-maturity and length-at-maturity model fits and frequency of maturity stage by month. (Bottom row) Counts of available fish lengths, weights, maturities, ages, and ageing structures from all available survey and commercial samples.

Figure 4: A consistent layout across species enables rapid comprehension of data, comparison across species, and highlights missing data (blank panels). Some missing data are due to a lack of sampling or aging for a given species and others are because a given species is more susceptible to trawl or longline survey gear. Shown here from top to bottom are three species in decreasing order of data richness: Canary Rockfish \emph{Sebastes pinniger}, Petrale Sole \emph{Eopsetta jordani}, and Sand Sole \emph{Psettichthys melanostictus}.

pbs-assess/gfsynopsis documentation built on March 26, 2024, 7:30 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com