knitr::opts_chunk$set(echo = TRUE)
library(lubridate)

The purpose of this document is to provide an initial overview of the suggested changes for analyzing the health and survival output from the right whale health model. These changes were suggested by Jim in response to our two initial analyses. In the first one, we enumerated health anomaly (deviation of individual health from population average health) during the entanglement window for each of the different injury clasess, in order of increasing severity:

  1. Severe with gear
  2. Moderate with gear
  3. Severe without gear
  4. Minor with gear
  5. Moderate without gear
  6. Minor without gear

(n.b. that we did this for both reproductively active females and non-reproductively active females.)

To test if health in each of these different classes was different, we conducted a glmm(), which indicated significant differences. Jim's issue with this approach was that we are treating the posterior as data, and thereby conducting statistics on statistics. He suggests using and summarizing the posterior directly, and not as data. In each sub-section below I will outline the approach with some code and initial results. In each case, we will rely on different posterior distributions -- for health and for survival.

Health During Entanglement

We have posterior estimates of health for each animal - stored as moments (mean, sd) in two different matrices - each of which with dimension: rows == number of animals, and columns == number of months in the analysis. Whereas before we intersected this information with the entanglement window to extract and summarize health during the window, the proposal here is to draw monthly health information for the entangled animal for the duration of the window. This would give us one sample of the time series of health. We would draw from the posterior of health many times, say 1,000, in order to build up the estimate of health and uncertainty around that health during the entanglement period.

However, we need to answer the inferential question of whether/how the distribution of these health values differ from others. To do that, we would sample 1,000 health draws from a reference animal (note how we define that is an important question to be resolved). One example of a reference animal could be an animal that was:

  1. similar in age to the entangled animal
  2. same gender as the entangled animal
  3. alive
  4. unentangled

Using these two distributions of health, we would then construct and compare the distribution of differences between the entangled and unentangled whale. This would mean that for each animal in each class, we would have a distribution of differences in health of size 1,000. We would repeat this process for all animals, and then be able to quantify the difference between the entangled animals and the unimpacted animals.

We start by choosing one entangled animal:

load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/tangRepro.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/tangleOut.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/healthmean.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/healthsd.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/ID.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/firstSight.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/lastSight.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/myName.rda")
load("E:/rob/Documents/research/projects/rightWhaleEntanglement/src/data/gender.rda")
tangRepro[tangRepro$EGNo == 1014, ][2, ]

We can look at that health trace:

anID <- 1014
idx <- which(ID == anID)
genAn <- gender[idx]
plot(healthmean[idx, firstSight[idx]:lastSight[idx]], type = 'l')

Impacted Animal

The algorithm would look something like this. We start by extracting one entanglement event from this animal, and gathering the start and stop times, and then we use the two health moments to sample from the posterior. In particular, we'll make use of the swindmonyr and ewindmonyr time indices to pull out the moments. I'll do this from the second entanglement highlighted above.

startWinDate <- year(tangRepro[tangRepro$EGNo == anID, ][2, 'StartDateWindow'])
startW <- match(tangRepro[tangRepro$EGNo == anID, ][2, 'swindmonyr'], myName)
endW <- match(tangRepro[tangRepro$EGNo == anID, ][2, 'ewindmonyr'], myName)
hmomMean <- healthmean[idx, startW:endW]
hmomSD <- healthsd[idx, startW:endW]

What do the moments look like?

hmomMean
hmomSD

Ok, we'll want to sample from this to build up the health posterior of the impacted animal.

ndraw <- 1000
himpMat <- matrix(NA, ndraw, ncol = length(hmomMean))

for (i in seq_along(1:ndraw)){
  himpMat[i, ] <- rnorm(4, hmomMean, hmomSD)
}

head(himpMat)

Reference Animal

Ok, with that built for the entangled animal, we want to sample a reference animal. Here I'll choose based on the criteria noted above. 1014 is a Female, and her age class is Adult. We need to find an unentangled adult female during the same window (r myName[startW], r myName[endW]). This part of the algorithm is going to take some work and thought. The first set of possible animals to remove would be those that are entangled in the same time frame. We'll use the tangleOut data frame to remove these. To do this I'll remove any animals that were entangled in the same year.

load(file = 'data-raw/eg_203_ng_50000_BIG_25000_BIG_25000.rdata')
candidates <- tangleOut[year(tangleOut$StartDateWindow) != startWinDate, 'EGNo']
candidates <- candidates[gender[match(candidates, ID)] == 'F'] # candidate entangled females 
ssub <- sights[sights$AgeClassCode == 'A' & sights$GenderCode == 'F' & sights$SightingYear == startWinDate, ]
ocand <- unique(ssub$SightingEGNo)
fincand <- ocand[ocand %in% candidates]
fincand

With that list of possible animals, then we can choose one at random, and assemble its health.

cID <- sample(fincand, 1)
cidx <- which(ID == cID)

hmomMean <- healthmean[cidx, startW:endW]
hmomSD <- healthsd[cidx, startW:endW]
ndraw <- 1:1000
hrefMat <- matrix(NA, max(ndraw), ncol = length(hmomMean))

for (i in seq_along(ndraw)){
  hrefMat[i, ] <- rnorm(length(hmomMean), hmomMean, hmomSD)
}

Distribution of Differences

Once we've sampled from the posterior for each animal, then we have to examine the differences.

hdiff <- himpMat - hrefMat
hdiffsum <- rowMeans(hdiff)
hist(hdiffsum, breaks = 30, main = 'Difference Between Impacted and Healthy Animal', xlab = 'Health Units')

In the above plot, 100% of the sampled cases were below zero--indicating a significant effect of this entanglement on health.

That's the basic idea for one event, but we'd need to repeat this for every event and reproductive category and then make the summary plots for each one, i.e. a plot of health differences between severly entangled whales carrying gear and reference whales, between severly entangled whales without gear and reference whales, etc.

Which is the Best Reference Animal

One of the issues the above analysis raises, is how do we know that this is a good animal to compare against? What if we chose a different animal, that for whatever reason led to a skewed result. I'll try to investigate that by comparing the health of this impacted animals against a range of candidates to see what the results look like. Se we can repeat the above calculation, but now do it for all candidate animals, not just this one reference.

hdiffList <- vector('list', length(fincand))
ndraw <- 1:1000

for (j in seq_along(fincand)) {
  cID <- fincand[j]
  cidx <- which(ID == cID)

  hmomMean <- healthmean[cidx, startW:endW]
  if(any(is.na(hmomMean))) next
  hmomSD <- healthsd[cidx, startW:endW]
  hrefMat <- matrix(NA, max(ndraw), ncol = length(hmomMean))

  for (i in seq_along(ndraw)){
    hrefMat[i, ] <- rnorm(length(hmomMean), hmomMean, hmomSD)
  }

  hdiffList[[j]] <- data.frame(refAn = cID, meanDiff = rowMeans(himpMat - hrefMat))
}

We need to create a data frame to make it a bit easier to plot.

hdiffList2 = hdiffList[-which(sapply(hdiffList, is.null))]
hdiffdf <- do.call(rbind.data.frame, hdiffList2)

With that built up, let's plot it.

library(ggplot2)
pall <- ggplot(hdiffdf, aes(meanDiff))+
  geom_density()+
  labs(title = 'All Animals Averaged Together')
pall

p <- ggplot(hdiffdf, aes(meanDiff, group = refAn))+
  geom_density()+
  labs(title = 'All Animals Plotted Separately')
p

Impacts of Entanglement on Survival

Ok, with survival the idea is similar -- we take an impacted animal and look at its survival curve in relationship to a randomly sampled reference individual. There are several different ways to do this. We'll start here by looking at probability of survival for the impacted animal during the entanglement event and compare that to a randomly selected animal. (Here, I'll use a different impacted animal from above because 1014 is an observed dead animal - hence no uncertainty in death.) We then build up the distribution of differences in survival and plot them. Again, as before, the selection of the reference case bears careful consideration. One first set of candidates could be the animals that were never entangled, but there could be year/animal mismatches here once we start sampling more. By this I mean with so few animals that bear no scars, we could end up with year and month combinations where we have an impacted animal, but no reference animal.

Let's look at the posterior, which provides an estimated month the animal died in, and then we can use that to summarize survival.

anID <- 1130
idx <- which(ID == anID)
deathyr[idx, 360:375] # this snippet shows the candidate months of death
round(deathyr[idx, 360:375] / ng, 4)
svec <- 1 - cumsum(deathyr[idx, ] / ng) 
svec[1:firstSight[idx]] <- 0
plot(svec)

We need to sample from this posterior 1,000 times to build up the distribution of survival probabilities.

survMat <- matrix(NA, nrow = 1000, ncol = nt)

for (i in 1:nrow(survMat)){
  psamp <- deathyr[idx, ] / sum(deathyr[idx, ], na.rm = TRUE)
  deaths <- which(rmultinom(1, 1, psamp) == 1)  
  survMat[i, firstSight[idx]:(deaths - 1)] <- 1
}


robschick/tangled documentation built on May 9, 2022, 4:07 p.m.