In einarhjorleifsson/datrasdoodle: Does not matter.

Haul data {#haul}

Preamble

It assumed that you have already downloaded, tidied and saved some DATRAS data on your computer (see #import).

It is recommended that every session in R is started by declaring/loading all the libraries needed to run the session:

library(tidyverse)
library(lubridate)
library(mapdata)
# devtools::install_github("einarhjorleifsson/gisland", dependencies = FALSE)
library(gisland)

Besides loading up the needed libraries it is recommended that the data used is imported upfront. Here we will import the tidy NS-IBTS downloaded and tidied in the Introduction, this time around we only need the haul data:

hh <- 
  read_rds("data/ns-ibts_hh.rds") %>% 
  # TODO: Move this upstream, into the tidy_hh function
  #       Should set code up so that it works separately on character and
  #       numerical variables
  mutate_all(funs(ifelse(.==-9, NA_real_, .))) %>% 
  mutate_all(funs(ifelse(.=="-9", NA_character_, .)))

Spatial overview

Like most fisheries data survey data are spatial in nature. So it is only natural to start to attempt to get a spatial overview of the survey coverage. As long as we do not need to do some heavy computational calculations (nice introduction to that is Geocomputation with R) the ggplot2 framework provides some helpful remedies (read: functions) when dealing with spatial plotting in R. Here we just jump into the deep end, but a little gentler introduction can e.g. be found Spatial plotting data with ggplot2.

Since in recent years at least both the start and end of the hauls are reported we can use the geom_segment-function to plot the haul track. The following scripts provides an overview of the data for one year by quarter and vessel:

# map of terrestrial area
xlim <- c(-4, 13)
ylim <- c(49, 62)
m <- map_data("worldHires", xlim = xlim, ylim = ylim)
# data from 2017 and only valid hauls
hh %>% 
  filter(year == 2017,
         haulval == "V") %>% 
  ggplot() +
  geom_polygon(data = m, aes(long, lat, group = group), fill = "grey") +
  geom_segment(aes(x = shootlong, y = shootlat, xend = haullong, yend = haullat,
                   colour = ship), lwd = 1) +
  coord_quickmap(xlim = xlim, ylim = ylim, expand = FALSE) +
  facet_wrap(~ quarter) +
  labs(x = NULL, y = NULL,
       title = "NS-IBTS 2017",
       subtitle = "Haul location by quarter and by vessel",
       colour = "Vessel")

Gridding data

One is often interested in getting an overview of the data summaries based on some grids (e.g. ICES statistical rectangles). Gridding is in effect the same thing as binning data when one generates a histogram except that it is two dimensional - and then the nomenclature used is often "rasterization". For spatial data it is both the longitude and the latitude that is binned (for further background see: On gridding spatial data).

For gridding we will use here a "homemade" function grade that resides in the gisland-package (TODO: make link where installing the packages is illustrated). And here we will simply grid by using the ICES rectangle resolution (0.5 x 1.0 degrees) and then count the number of stations in each rectangle. Once done one can use the geom_raster-function to generate a spatial visualization:

d <-
  hh %>% 
  filter(haulval == "V",
         year == 2017) %>% 
  mutate(lon = grade(shootlong, 1),
         lat = grade(shootlat, 0.5)) %>% 
  group_by(lon, lat, quarter) %>% 
  count()
d %>% 
  ggplot() +
  geom_raster(aes(lon, lat, fill = n)) +
  geom_polygon(data = m, aes(long, lat, group = group), fill = "grey") +
  coord_quickmap(xlim = xlim, ylim = ylim, expand = FALSE) +
  facet_wrap(~ quarter) +
  scale_fill_viridis_c(option = "B", direction = -1, breaks = c(1:5), guide = "legend") +
  labs(x = NULL, y = NULL, fill = "No. stations",
       title = "NS-IBTS 2017",
       subtitle = "Number of hauls by ICES rectangles")

Figure \@ref(fig:haul-plot2) is an example

Note, that in contrast with the previous script we now "called" the terrestrial layer after the data-layer. Hence the squares that partially cover the water are masked with the shorelines (try revering the layer call in the script above).

# NOT SHOWN
# If you are interested in making a little prettier plot try:
# A north see plot template
p_ns <- 
  ggplot() +
  theme_bw() +
  scale_x_continuous(name = NULL, minor_breaks = NULL, breaks = seq(-4, 13, by = 1)) +
  scale_y_continuous(name = NULL, breaks = seq(49, 65, by = 1)) +
  coord_quickmap(xlim = xlim, ylim = ylim, expand = FALSE)
p_ns
# Add the current data
p_ns +
  geom_raster(data = d, aes(lon, lat, fill = n)) +
  # we add terrestrial ara as an overlay, to mask partial ICES rectangles
  #    "over" the rectangles
  geom_polygon(data = m, aes(long, lat, group = group), fill = "grey") +
  scale_fill_viridis_c(option = "B", direction = -1, breaks = c(1:5), guide = "legend") +
  facet_wrap(~ quarter) +
  labs(colour = "No. stations",
       title = "NS-IBTS 2017")

One can easily use the above code to obtain similar information using other DATRAS surveys. Or for that matter all the surveys:

d <- 
  read_rds("data/hh_datras.rds") %>% 
  filter(haulval == "V",
         year == 2017) %>% 
  mutate(lon = grade(shootlong, 1),
         lat = grade(shootlat, 0.5))
d %>% 
  group_by(lon, lat) %>% 
  count() %>% 
  mutate(n = ifelse(n >= 10, "10+", paste0(" ", n))) %>% 
  ggplot() +
  geom_raster(aes(lon, lat, fill = n)) +
  geom_polygon(data = map_data("world"), aes(long, lat, group = group), fill = "grey") +
  scale_fill_viridis_d(option = "B", direction = -1) +
  coord_quickmap(xlim = range(d$lon), ylim = range(d$lat)) +
  labs(x = NULL, y = NULL, fill = "n",
       title = "DATRAS surveys",
       subtitle = "Number of hauls by ICES rectangles in 2017")

A little bit of caution here with the respect to the actual dimension of the ICES rectangles across large latitudinal range. TODO: Add the caution ...

Besides simple counting on the number of records one can also derive various other statistics based on rectangles. The following code shows how to create a map of temperatures:

d %>% 
  mutate(bottemp = ifelse(bottemp == -9, NA_real_, bottemp),
         surtemp = ifelse(surtemp == -9, NA_real_, surtemp)) %>% 
  group_by(lon, lat) %>% 
  summarise(m = mean(bottemp, na.rm = TRUE)) %>% 
  ggplot() +
  geom_raster(aes(lon, lat, fill = m)) +
  geom_polygon(data = map_data("world"), aes(long, lat, group = group), fill = "grey") +
  scale_fill_viridis_c(option = "B", direction = -1) +
  coord_quickmap(xlim = range(d$lon), ylim = range(d$lat)) +
  labs(x = NULL, y = NULL, fill = "n",
       title = "DATRAS surveys",
       subtitle = "Mean bottom temperature by ICES rectangles in 2017")

Note, no correction if made with respect to the survey season.

Irregular shapes

When we imported the data we also allocated each haul position to the ICES area system. Hence we can easily make the following plot:

p <-
  hh %>% 
  filter(year == 2017,
         haulval == "V",
         quarter == 1) %>% 
  ggplot() +
  geom_point(aes(haullong, haullat, colour = faoarea)) +
  geom_polygon(data = m, aes(long, lat, group = group), fill = "grey") +
  coord_quickmap(xlim = xlim, ylim = ylim) +
  labs(x = NULL, y = NULL, colour = "ICES area")
p

The ICES areas are part of the FAO global areas. Because of their irregularity they are generally structured as shapefiles that are of specific formats. Thankfully ggplot2 provides a way of handling such formats such that the user does not really know all the nitty gritty details that reside under the hood (TODO: make a link to where this is explained in more detail).

For the sake of full reproducibility the shapefiles have been made available at an ftp-site so that they be easily read into R. A convenient function that resides in the gisland package makes the process of downloading and importing a little less painful in terms of coding:

icesarea <- 
  gisland::read_sf_ftp("fao-areas") %>%
  mutate(reg = str_sub(name, 1, 2)) %>%
  # only ICES areas
  filter(reg == "27") %>% 
  # turn the sf-object into sp-object so it can be used by calling geom_path
  as("Spatial")

p +
  geom_path(data = icesarea, aes(long, lat, group = group))

Temporal overview

read_rds("data/hh_datras.rds") %>% 
  select(survey, quarter, year) %>% 
  distinct() %>% 
  mutate(active = TRUE) %>% 
  complete(survey, quarter, year) %>% 
  ggplot(aes(year, quarter)) +
  geom_tile(aes(fill = active), colour = "white", lwd = 0.5) +
  facet_grid(survey ~ .) +
  theme(legend.position = "none",panel.grid = element_blank()) +
  coord_cartesian(expand = FALSE)

hh %>% 
  filter(haulval == "V") %>% 
  group_by(year, quarter, gear) %>% 
  count() %>% 
  ggplot(aes(year, n)) +
  #theme_minimal() +
  geom_point() +
  #geom_line() +
  facet_grid(gear ~ quarter) +
  scale_colour_brewer(palette = "Set1")

hh %>% 
  filter(haulval == "V",
         gear == "GOV") %>% 
  group_by(year, quarter, country) %>% 
  count() %>% 
  ggplot(aes(year, n)) +
  #theme_minimal() +
  geom_point() +
  #geom_line() +
  facet_grid(country ~ quarter) +
  scale_colour_brewer(palette = "Set1")

hh %>% 
  filter(haulval == "V") %>% 
  mutate(quarter = paste("Quarter", quarter)) %>% 
  ggplot(aes(factor(year), hauldur)) +
  geom_boxplot(outlier.size = 0.25, fill = "pink", alpha = 0.2) +
  facet_grid(quarter ~ .) +
  scale_x_discrete(breaks = seq(1965, 2025, by = 5)) +
  scale_y_continuous(breaks = c(0, 30, 60, 90)) +
  expand_limits(y = 0) +
  labs(x = NULL, y = "Haul duration [minutes]",
       title = "NS-IBTS",
       subtitle = "Distribution of haulduration")

Gear metrics

NOTE: Stuff at the moment just data explorations. It is of interest to look at patterns in gear metrics because they affect the constant catchability assumption through time in stock assessment models via the link equation:

$$U_y = q B_y$$

Besides the stuff above, there are some interesting variables in the haul dataframe that relate to gear metrics. Lets read in all the haul data available:

hh <- read_rds("data/hh_datras.rds")

Does a longer warp-length result in wider doorspread

Lets just look at the "GOV"-gear:

d <-
  hh %>% 
  filter(gear == "GOV")
p <- 
  d %>% 
  ggplot(aes(warplngt, doorspread)) +
  geom_point(size = 0.5, alpha = 0.1) +
  scale_x_continuous(breaks = c(seq(0, 500, by = 100), seq(1000, 5000, by = 500))) 
p

The overall pattern indicates that as the warp-length is increased a certain plateau of the door-spread is reached. This is expected because the geometry of the net is going to limit the how much the doors can spread out. In general the maximum door-spread is around 100 meters but there is an separate cloud of points with a maximum around 125 meters. In addition at 100 meter warp-length there what looks like an anomalous distribution of door-spreads.

Lets split the data by surveys:

p +
  facet_wrap(~ survey)

Again in some of the surveys a clear dichotomy or bimodality is observed in the relationship of warp-length and door-spread. And the anomaly at 100 meter warp-length seems to be associated with the NS-IBTS. Without knowing much more about the reporting of the survey data, lets for the time being just delete the 100 meter warp-length data from the NS-IBTS:

d <- 
  d %>% 
  filter(!(warplngt == 100 & survey == "NS-IBTS"))

But what is the reason for the biomodality?

Lets just look at the most recent years of data:

d %>% 
  filter(year >= 2010) %>% 
  ggplot(aes(warplngt, doorspread, colour = country)) +
  geom_point(size = 0.5, alpha = 0.1) +
  scale_x_continuous(breaks = c(seq(0, 500, by = 250), seq(1000, 5000, by = 500))) +
  facet_wrap(~ country)

d %>% 
  filter(year >= 2010) %>% 
  mutate(r = warplngt / doorspread) %>% 
  filter(r < 20) %>% 
  ggplot(aes(warplngt, r, colour = country)) +
  geom_point(size = 0.5, alpha = 0.1) +
  scale_x_continuous(breaks = c(seq(0, 500, by = 250), seq(1000, 5000, by = 500))) +
  facet_wrap(~ country)

d <- 
  d %>% 
  mutate(r = warplngt/doorspread)
d %>% 
  drop_na(r) %>% 
  filter(r < 20) %>% 
  ggplot(aes(year, r, group = year)) +
  geom_violin(scale = "width") +
  facet_wrap(~ survey)
d %>% 
  filter(r < 20) %>% 
  ggplot(aes(depth, r, colour = factor(quarter))) +
  geom_point(size = 0.5) +
  facet_wrap(~ quarter)

hh %>% 
  filter(gear %in% c("BAK", "GOV", "PORB", "ROT", "TLV")) %>% 
  select(survey, gear, doorspread, wingspread) %>% 
  drop_na() %>% 
  # get rid of implausibles
  filter(wingspread <= 0.80 * doorspread) %>% 
  ggplot(aes(doorspread, wingspread)) +
  geom_point(size = .5, alpha = 0.1) +
  geom_smooth(se = FALSE) +
  scale_color_brewer(palette = "Set1") +
  facet_wrap(~ gear)

gov <- 
  hh %>% 
  filter(gear %in% c("GOV"),
         !is.na(wingspread),
         !is.na(doorspread)) %>% 
  # get rid of implausibles
  filter(wingspread <= 0.80 * doorspread) %>% 
  mutate(r = wingspread / doorspread)
gov %>% 
  filter(quarter %in% c(1,3),
         year %in% 2005:2017) %>% 
  ggplot(aes(factor(year), r)) +
  geom_violin(scale = "width") +
  geom_jitter(size = 1, alpha = 0.1) +
  #scale_colour_brewer(palette = "Set1") +
  labs(x = NULL,
       y = "Ratio of wingspread over doorspread") +
  facet_wrap(~ quarter)
gov %>% 
  ggplot(aes(depth, r)) +
  geom_point(size = 1, alpha = 0.1) +
  geom_smooth()

hh %>% 
  filter(gear %in% c("BAK", "GOV", "PORB", "ROT", "TLV")) %>% 
  # get rid of implausibles
  filter(warplngt > doorspread) %>% 
  ggplot(aes(wingspread, netopening, colour = gear)) +
  geom_point()

hh %>% 
  filter(gear %in% c("BAK", "GOV", "PORB", "ROT", "TLV")) %>% 
  select(wingspread, netopening, gear) %>% 
  drop_na() %>% 
  filter(netopening < wingspread) %>% 
  ggplot(aes(wingspread, netopening, colour = gear)) +
  geom_point(size = 1, alpha = 0.2) +
  geom_smooth(method = "lm")