In einarhjorleifsson/datrasdoodle: Does not matter.

ALK stuff

Introduction to the age data

Let's first look at the structure of the age dataframe:

glimpse(ca)

Lets generate a code for an age-length-key for a particular species and survey from the age data, ignoring sex and maturity:

Year <- 2014
Quarter <- 3
Latin <- "Gadus morhua"
hh2 <-
  hh %>% 
  filter(year == Year,
         quarter == Quarter)

ca2 <- 
  hh2 %>% 
  select(id) %>% 
  left_join(ca, by = "id") %>% 
  filter(latin == Latin) %>% 
  mutate(length = floor(length),
         age = ifelse(age > 9, 9, age)) %>% 
  group_by(age, length) %>% 
  summarise(n = sum(n)) %>% 
  ungroup() %>% 
  drop_na()

alk.empirical <-
  ca2 %>% 
  group_by(length) %>% 
  mutate(p = n / sum(n, na.rm = TRUE)) %>% 
  select(-n) %>% 
  drop_na()

alk.empirical %>% 
  ggplot(aes(length, p, colour = factor(age))) +
  geom_point() +
  geom_line(lwd = 0.3, linetype = 5) +
  scale_color_brewer(palette = "Set3") +
  labs(x = "Length",
       y = "Probability",
       colour = "Age",
       title = "Probability of age at length")

So for each length class we have observations (points) we have calculated the probability that a fish belongs to a particular age class. The sum of the probabilities within a length group is by definition always one. Lets take the 70 cm length class as an example:

alk.empirical %>% 
  filter(length == 70) %>% 
  knitr::kable()

Here we have 5 age-classes within the length class, the most numerous ones being age 3 (50%), then 4 (22%) and 5 (17%), while around 6% of the fish are of age 2 and 6.

The principle - empirical approach

Statistical modelling

In the above empirical approach there is a bit of a problem - missingness :-) The reader may have noticed the warning message given when plotting the "Length frequency by age":

## Warning: Removed 1 rows containing missing values (position_stack).

This message was given because varible n.age has one record with NA. It so happens that it is the first record so we can see it by:

glimpse(d)

This means that length class 5 cm was recorded in the length measurements (hl), but no fish of that length class was aged (ca) in the year (2014) and quarter (3) for the species (Gadus morhua) in question. Since in this particular case it was the smallest length class one could obviously assign it "manually" to the youngest age group (age 0) in the dataset. But "missingness" of age samples in other length classes are bound to occur, particularly if we start to use some kind of spatial stratification of the age-length-key. So we need some generic approach. In this section we will take take some statistical approach.

The mlogit model

....

Reference to @henningsen2011maxlik

Here, at least for the time being, we hide things in a little function:

alk.mlogit <-
  ca2 %>% 
  fit_alk(lengths = c(min(lfs$length):max(lfs$length)), model = "mlogit")

In the above step we start with the same dataframe (ca2) as used when generating the alk.empirical above and we end with an alk-dataframe that has the same structure as the alk.empirical generated above. The thing in the middle is a bit of a black box. Lets take a peek:

alk.mlogit %>% filter(length == min(length)) %>% glimpse()

So even in the smallest length class (5 cm) we have some small probabilities that the fish will belong to some older age classes. Although biologically not really possible, that is statistics :-) The beauty is though that know we have solved the problem of "missingness" in our data - all length classes will be assigned to having some probabilities of belonging to all of the age classes in the data. Before we proceed, lets take a visual peek of what we have just done:

ggplot() +
  geom_point(data = alk.empirical,
            aes(length, p, colour = factor(age))) +
  geom_line(data = alk.empirical,
            aes(length, p, colour = factor(age)),
            lwd = 0.3, linetype = 5) +
  geom_line(data = alk.mlogit,
            aes(length, p, colour = factor(age))) +
  scale_colour_brewer(palette = "Set3") +
  labs(x = "Length [cm]",
       y = "Probability",
       colour = "Age",
       title = "Probability of age by length",
       subtitle = "Empirical (points and dashed line) and mlogit fit (line)")

We can now use the modeled alk's to calculate the abundance in each haul by age:

by.haul.age <-
  by.haul %>% 
  left_join(alk.mlogit, by = "length") %>% 
  # NOTE: check why this is "needed"
  drop_na() %>% 
  group_by(id, age) %>% 
  summarise(n = sum(p * n)) %>% 
  ungroup() %>% 
  left_join(hh2 %>% 
              select(id, shootlong, shootlat),
            by = "id")

But what if we wanted to generate an age-length-key for multiple years?

hh3 <-
  hh %>% 
  filter(year %in% 2014:2017,
         quarter == 3)
ca3 <-
  hh3 %>% 
  select(id, year) %>% 
  left_join(ca) %>% 
  filter(latin == Latin) %>% 
  mutate(length = floor(length),
         age = ifelse(age > 9, 9, age)) %>% 
  group_by(year, age, length) %>% 
  summarise(n = sum(n)) %>% 
  ungroup() %>% 
  drop_na() %>% 
  group_by(year) %>% 
  nest() %>% 
  # Have no idea why is should use the ".$data" rather than just "data"
  mutate(alk = purrr::map(.$data, fit_alk))
ca3

# testing
identical(ca3$alk[[1]], ca2 %>% fit_alk())

Or for that matter multiple years and multiple strata within a year??

hh4 <-
  hh %>% 
  filter(year %in% 2014:2017,
         quarter == 3,
         nsarea %in% 1:10) %>% 
  mutate(nsarea = ifelse(nsarea <= 5, 1, 2))

ca4 <-
  hh4 %>% 
  select(id, year, nsarea) %>% 
  left_join(ca) %>% 
  filter(latin == Latin) %>% 
  mutate(length = floor(length),
         age = ifelse(age > 9, 9, age)) %>% 
  group_by(year, nsarea, age, length) %>% 
  summarise(n = sum(n)) %>% 
  ungroup() %>% 
  drop_na() %>% 
  group_by(year, nsarea) %>% 
  nest() 

ca4 %>% 
  mutate(alk = purrr::map(.$data, fit_alk))
ca4 %>% 
  filter(year %in% c(2015, 2017)) %>% 
  mutate(alk = purrr::map(.$data, fit_alk))

... is the error related to the data & model specifications or to the nesting approach?? Check with:

ca4$data[[1]] %>% fit_alk()

i.e. it is within the data & model specifications.

Some other statistical model

... check @berg2012spatial

library(mgcv)

hh <- read_rds("data/ns-ibts_hh.rds")
hl <- read_rds("data/ns-ibts_hl.rds")
ca <- read_rds("data/ns-ibts_ca.rds")
Year <- 2014
Quarter <- 3
Latin <- "Gadus morhua"
hh2 <-
  hh %>%
  filter(year == Year,
         quarter == Quarter)

ca2 <-
  hh2 %>%
  select(id) %>%
  left_join(ca, by = "id") %>%
  filter(latin == Latin) %>%
  mutate(length = floor(length),
         age = ifelse(age > 9, 9, age)) %>%
  group_by(id, age, length) %>%
  summarise(n = sum(n)) %>%
  ungroup() %>%
  drop_na()

alk <-
  fitALKXX(ca2, minAge = 1, maxAge = 4) %>%
  fit_alk10()
glimpse(alk)

https://colinfay.me/tidyeval-1

The engineer's approach

...

Age based indices

Issues

NOTE: Need to check interpretation of the n-variable (original name CANoAtLngt) in ca-data. The DATRAS pages refer to this variable as meaning: "Amount of fish at the given category (per haul, species, length class, sex, maturity, age)."

NOTE: In some cases the haul numbers are "NA". Example:

ca %>% 
  filter(id == "2004_1_THA2_GOV_NA") %>% 
  glimpse()
# but
hh %>% 
  filter(id == "2004_1_THA2_GOV_NA") %>% 
  glimpse()

So we have some sort of an orphan in the ca data

einarhjorleifsson/datrasdoodle documentation built on May 27, 2019, 2:09 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

einarhjorleifsson/datrasdoodle
Does not matter.

In einarhjorleifsson/datrasdoodle: Does not matter.

ALK stuff

Introduction to the age data

The principle - empirical approach

Statistical modelling

The mlogit model

Some other statistical model

The engineer's approach

Age based indices

Issues

R Package Documentation

Browse R Packages

We want your feedback!

einarhjorleifsson/datrasdoodle Does not matter.

In einarhjorleifsson/datrasdoodle: Does not matter.

ALK stuff

Introduction to the age data

The principle - empirical approach

Statistical modelling

The mlogit model

Some other statistical model

The engineer's approach

Age based indices

Issues

R Package Documentation

Browse R Packages

We want your feedback!

einarhjorleifsson/datrasdoodle
Does not matter.