Prepare the Data to Examine Unimpacted Females

We decided that in order to compare the health of impacted reproductively active females we'd want to have a reference category of unimpacted but reproductively active females. So that means females that have had a calf, but who have never been entangled. Turns out this isn't a large #(!) but it gives us a useful reference class. To get that data assembled, we run this function and return its output as 4 different list elements:

  1. healthnew - this is a data frame containing finite values of health for animals that comprise the unimpacted criteria
  2. nmon - the number of months with finite health values in healthnew
  3. nmomThold - the number of months in healthnew with health values below the specified health threshold
  4. pThold - the percentage of months below the specified health threshold, i.e. (nmomThold / nmon) * 100

These elements will be used when we assemble the data for plotting.

knitr::opts_chunk$set(warning=FALSE, message=FALSE)
library(tangled)
library(magrittr)
library(dplyr)
library(ggplot2)
thold <- 67
tmp <- prephThreshDataUnImp(healthmean, firstSight, lastSight, thold)

Prepare the Data for the Impacted Females

With the unimpacted data prepared, we need to assemble the data that determines the amount of time (in months) individual entangled whales are spending below specific health thresholds. We do this with the prephThreshDataRepro.R function. This function returns a list with two elements: 1) rfmthold a 6 by 6 data frame summarising the months below the health threshold, and 2) dfout the same information, but not summarised, i.e. retained at the level of the individual event. We'll use rfmthold and dfout for statistical analysis.

tmp2 <- prephThreshDataRepro(healthmean, thold)
tmp2$rfmthold

2022-02-02

New plot of the values with uncertainty. First we set up the data.

# unimpacted
# df_plot_unimp <- quantile(tmp$pthold_post, probs =  c(.25, .5, .75))
df_plot_unimp <- data.frame(gearInj = 0, x = c(4.641350,  8.989394, 23.536114), q = c(.25, .5, .75))
# impacted
df_plot <- tmp2$dfout %>% group_by(gearInj) %>% 
  summarize(x = quantile(pctmonthold, c(.25, .5, .75)), q = c(.25, .5, .75))

gname <- c('Severe - gear', 'Moderate - gear', 'Severe - no gear', 
             'Minor - gear', 'Moderate - no gear', 'Minor - no gear')
df_plot_all <- bind_rows(df_plot, df_plot_unimp)
df <- data.frame(gearInj = unique(df_plot_all$gearInj), 
                 gearName = c(gname, 'Unimpacted'),
                 lower = df_plot_all$x[df_plot_all$q == 0.25],
                 median = df_plot_all$x[df_plot_all$q == 0.5],
                 upper = df_plot_all$x[df_plot_all$q == 0.75])
dfb <- prepHealthThresholdPlotData(tmp, tmp2$rfmthold)
dfb$lower <- df$lower
dfb$median <- df$median
dfb$upper <- df$upper

And then make the plot

bsize = 12
p <- ggplot(df, aes(x = gearName, y = median))+
    geom_pointrange(aes(ymin = lower, ymax = upper))+
    theme_bw(base_size = bsize)+
    labs(y = paste('% of Months w/ Health < ', thold, sep = ''), x = 'Entanglement Category')+
    scale_x_discrete(limits = c('Unimpacted', 'Minor - no gear', 'Minor - gear', 'Moderate - no gear',
                                'Moderate - gear', 'Severe - no gear', 'Severe - gear'),
                     labels = c('Unimpacted', 'Minor\nNo Gear', 'Minor\nGear',
                                'Moderate\nNo Gear', 'Moderate\nGear',
                                'Severe\nNo Gear', 'Severe\nGear'))+
     scale_y_continuous(breaks = c(0, 25, 50, 75), labels = c('0%', '25%', '50%', '75%'))+
  theme(legend.position="none")
    # labs(fill = 'Reproductive\nFemales')+
    # scale_fill_brewer(type = 'qual', palette = 'Paired', direction = -1)+
    # theme(legend.position = c(0.125, 0.875))
  p  

UPDATE - 6/14/2016 You had asked about the numbers of animals below the threshold as a function of category because you were surprised about the number of minors with all of their months below 67. I've summarised the data a bit to look at that a bit further.

So what I do in this next chunk of code is to take the percentages of months below the threshold, and assign them into 6 different bins with the cut() function. Note that this is just to show the summary in a bit coarser way. Then I normalise within entanglement types to show the percentage of events by category that are below the threshold. Then I plot them.

dfmat <- tmp2$dfout
dfmat$intLev <- cut(dfmat$pctmonthold, 6)
round(table(dfmat$gearInj, dfmat$intLev) / rowSums(table(dfmat$gearInj, dfmat$intLev)), 3)
tabmat <- as.data.frame(table(dfmat$gearInj, dfmat$intLev) / rowSums(table(dfmat$gearInj, dfmat$intLev)))

ggplot(tabmat, aes(Var2, Var1) )+
  geom_tile(aes(fill = Freq))

What that shows is that while indeed there are more events below the threshold as the severity increases, you still do have some minor events where the animals are in bad health ~19.7%.

Assemble and Plot the Data

Ok, back to the previous analysis. Now we can put these two data sources together and make the plot.

dfb <- prepHealthThresholdPlotData(tmp, tmp2$rfmthold)
dfb

Let's make this a table for the manuscript as well:

knitr::kable(dfb)
p <- plotHealthThreshold(dfb, bsize = 12)
p
ggsave(plot = p, filename = 'healthThold.pdf', path = '~/Documents/research/manuscripts/KnowltonEtAl_Entanglement/images/', device = 'pdf', width = 9, height = 6, units = 'in')
pweb <- plotHealthThreshold(dfb, bsize = 16)
ggsave(plot = p, filename = 'healthThold.png', path = '~/Documents/research/manuscripts/KnowltonEtAl_Entanglement/images/', device = 'png', dpi = 300, width = 9, height = 6, units = 'in')

Statistical Analysis of the differences

Ok, a quick Kruskal Wallis test indicates there's no difference in the percentages:

kruskal.test(dfb$pctthold, factor(dfb$gearInj), data = dfb)

However, I don't thnk this is really the right way to test this. A more appropriate way is to have a row for each entanglement event. There's a clear mass at 0 here, which is problematic.

hist(tmp2$dfout$pctmonthold)

Update - 6/14/2016 - I'll use the arcsine transformation instead

So I can transform the entanglement data, and then perform a glm on that. I'll make Minor - No Gear the reference class first

asinTransform <- function(p) { asin(sqrt(p)) }

tmp2$dfout$pctTrans <- asinTransform(tmp2$dfout$monthold / tmp2$dfout$nmonths)
tmp2$dfout$gearInj <- relevel(factor(tmp2$dfout$gearInj), ref = 6)
ft1 <- lm(pctTrans ~ factor(gearInj), data = tmp2$dfout)
summary(ft1)

Which suggests that severe with gear (class 1) is significantly worse. Of course, this doesn't add include the unimpacted animals, but I'm really not sure how to do that. We don't want to add just the summary 6% value, because that would be mixing data types - a summary over all unimpacted animals together with a data frame of individual entanglement events.

To do that, I modified the function prepThreshDataUnImp to return a vector of percentage of months below the threshold. This is tmp$pTholdInd which I can fold into tmp2$dfout. I'll do that here:

unDat <- data.frame(egno = noquote(names(tmp$nmonInd)),
                    nmonths = tmp$nmonInd,
                    monthold = tmp$nmonTholdInd,
                    pctmonthold = tmp$pTholdInd,
                    gearInj = 0)
unDat <- unDat[!is.na(unDat$pctmonthold), ]
unDat$gearInj <- factor(unDat$gearInj)
levels(unDat$gearInj) <- c(levels(unDat$gearInj), levels(tmp2$dfout$gearInj))
unDat$pctTrans <- asinTransform(unDat$monthold / unDat$nmonths)
dnew <- rbind(tmp2$dfout, unDat)

Now we can try a glm with all the data

dnew$gearInj <- relevel(factor(dnew$gearInj), ref = '0')
summary(fit3 <- lm(pctTrans ~ factor(gearInj), data = dnew))

This corroborates the above test - severe with gear is significantly different from the Unimpacted. None of the others seem significant.



robschick/tangled documentation built on May 9, 2022, 4:07 p.m.