In jmlondon/hsmm_seasons: Estimating Seasonal Behavior States from Biologging Sensor Data

# set global chunk options
library(knitr)
options(width = 80)
opts_chunk$set(dev = "png",
               dev.args = list(type = "cairo",antialias = "none"),
               dpi = 192,warning = FALSE, message = FALSE)

library(captioner)
fig_nums <- captioner()
table_nums <- captioner(prefix = "Table")

library(kotzeb0912)
library(sp)
library(rgdal)
library(trip)
library(crawl)
library(argosfilter)
library(parallel)
library(dplyr)
library(lubridate)
library(ggplot2)
library(ggthemes)
library(scales)
library(pander)

Introduction

The seasonal timing of key, annual life history events is an important component of many species' ecology and evolutionary history. Highly migratory species time their arrival at locations to coincide with favorable environmental conditions or peaks in prey availability. Non-migratory species in highly seasonal zones have adapted to time important events such as reproduction and rearing of young with more favorable seasons. Both of these strategies are present in the Arctic ecosystem. As the climate warms and seasonal timing changes, how well Arctic animals adjust their phenology will be an important response. Unfortunately, many Arctic species are widley dispersed and exist in inaccessible or remote habitats. Observing Arctic species with the regularity and precisions required to establish phenological patterns is a significant challenge.

Arctic seal species (e.g., bearded seal, ringed seal, ribbon seal) are classified as ice-associated or ice-obligate in recognition of their reliance on sea-ice for key annual events such as pupping, weaning, and molting. Sea-ice habitat availability for these species is highly seasonal and each species has adapted their annual cycles and behaviors to best optimize their use of this habitat. Ringed seals rely on fast ice and surface snow accumulation to construct lairs where pups are born and sheltered from the weather and predators. In years when snow melt begins early, lairs are subject to collapse and pups are more susceptible to predation and exposed to the elements. Ribbon seals are mostly pelagic throughout the year but are closely associated with the marginal sea-ice zone during pupping, breeding, weaning, and molting. All of these critical events are compressed into just a few months that coincide with maximum sea-ice extent in the Bering Sea and the initial melt period. Bearded seals are more closely associated with sea-ice throughout the year, but their use of sea-ice as a haul-out platform changes. Many bearded seals in the North Pacific will move through the Bering Strait into the Bering Sea as sea ice advances in the late fall and then back up into the Chukchi and Beaufort seas as it retreats in early summer.

Seasonal periods important to these seals (and other marine species) often do not align well with typical labels (i.e., spring, summer, winter, fall). The timing of key life history events is well documented only for species found in accessible habitats or breeding areas. Long term studies have documented such critical parameters as the timing of migration, arrival timing of breeders, peak pupping or hatching, and molting. Our knowledge of seasonal timing for species widely dispersed in inaccessible or remote habitats, however, is poor.

The deployment of bio-logging telemetry devices has been key tool for understanding the ecology and behavior of animals. And, has been an especially critical tool for studies of wide ranging species in remote locations. For marine mammals, and especially pinnipeds, bio-loggers are typically recording behavioral parameters (e.g., dive behavior, haul-out behavior, foraging events) that are expected to vary across seasons and life history events. The critical life history events of these seals are all associated with increased haul-out behavior and changes in dive behavior. Additionally, migratory-like movements are also key indications of seasonal states. By examining these behavioral parameters and information related to animal movement in a single, multivariate framework, the number of seasons and the timing of change can be estimated.

We propose using a multivariate hidden semi-Markov model (mhsmm) as an approach to identifying key seasonal periods and the timing of those seasons from bio-logging data. The hidden Markov model (hmm) is used frequently in the study of animal movement. In most cases, the hmm is used to identify states such as foraging, resting, or transit from characteristic movement (step-length and turning angle). In a few cases, additional behavior data has been incorporated to futher improve the model inference. While a few have proposed and implemented the use hidden semi-Markov models as better approaches for understanding wildlife behavior, they have not been previously implemented as a means for estimating seasonal level states. Hidden Markov models require a geometrically distributed sojourn time in a given state. Hidden semi-Markov models allow an arbitrary sojourn distribution --- the duration an animal spends in a state can depend on the time it has already spent in that state. O’Connell et al (2010) applied Hidden semi-Markov models to the estrus detection in dairy cows and developed the mhsmm library for R to support similar analyses.

insert background and more discussion on hmm, hsmm, and mhsmm

data("kotzeb0912_locs")
deployments <- kotzeb0912_locs %>% 
  dplyr::filter(instr == "Mk10") %>% 
  dplyr::group_by(deployid) %>% 
  dplyr::summarise(start_date = min(date_time),
                   end_date = max(date_time),
                   duration = max(date_time) - min(date_time))

Methods and Analysis

Between 2009 and 2012 seven adult bearded seals were captured and released with bio-logging devices that provide measures of movement, dive behavior, and hourly wet-dry proportions. The tags were deployed in June and July with deployment duration ranging from r format(min(deployments$duration),digits=2) to r format(max(deployments$duration), digits=2) days (median = r format(median(deployments$duration),digits=2) days). Data from these deployments are available as an R data package and we will use these data to demonstrate the application of multivariate hidden semi-Markov models to identify seasonal states from real world telemetry data.

The kotzeb0912 R package contains telemetry data from 14 deployments on seven adult bearded seals. Each seal was deployed with two tags --- one adhered to the hair on the head of the animal and another attached to the flipper. The flipper attached tags were removed from the dataset for this analysis. All of the telemetry data was transmitted via the Argos satellite system and processed through the Wildlife Computers Data Portal. The location data consist of coordinates estimated via Argos along with the associated error ellipses as well as locations derived from the FastLoc GPS sensor on three of the deployments. Error for the FastLoc GPS coordinates was presumed to be a fixed value of 50 meters. Behavior data of interest for this analysis includes histograms of maximum dive depths within a 6 hour period and the percentage of each hour the tag was dry.

Modeling Movement and Aligning Data

A key parameter in the evaluation of seasonal states will be movement --- specifically, the displacement in x and y coordinates at each time step. Location data from bio-logging devices are provided at irregular time steps and with varying error associated with each location estimate. The crawl R package was employed to model seal movement and estimate locations every 6 hours throughout the deployment. Prior to modeling the movement with crawl, the argosfilter package provided a course speed filter (vmax=3.5 m/s) to eliminate obvious outlier locations. Predicted locations are specified from the posterior mean and variance of the model track and coincide with the mid-point of the 6-hour dive behavior histograms. The lag difference in x and y coordinates provides a measure of northing and easting displacement at each time step. These values will form the basis for our xy-displacement multi-variate normal parameter in the model.

The haul-out behavior timelines are provided as hourly percent-dry values and are grouped into 6-hour blocks that are centered on our 6-hourly predictions. Dive behavior data are transmitted as dive histograms that represent the distribution of dives across predetermined depth bins for a given 6-hour period. To simplify things, the number of dives across all bins less than 10 meters are summed. The expectation is this will accurately represent the number of foraging dives over a 6 hour period. Both of these behavioral data streams are subjet to periods of missing data. Since these deployments occurred over multiple years, the number of days since 1 April was used as a standard time scale.

data("kotzeb0912_depths")
data("kotzeb0912_timelines")

kotzeb0912_locs <- kotzeb0912_locs %>% 
  dplyr::select(deployid,date_time = unique_posix,instr,
                quality,latitude,longitude,error_radius,
                error_semimajor_axis,
                error_semiminor_axis,
                error_ellipse_orientation) %>% 
  dplyr::filter(instr == "Mk10")  

kotzeb0912_depths <- kotzeb0912_depths %>% 
  dplyr::filter(deployid %in% kotzeb0912_locs$deployid) %>% 
  dplyr::rename(date_time = datadatetime)

kotzeb0912_timelines <- kotzeb0912_timelines %>% 
  dplyr::filter(deployid %in% kotzeb0912_locs$deployid) %>% 
  dplyr::rename(date_time = datadatetime)

locs <- kotzeb0912_locs
depths <- kotzeb0912_depths
drytimes <- kotzeb0912_timelines

locs <- locs %>% 
  dplyr::mutate(species = ifelse(grepl("^EB.",deployid),'Bearded seal',NA)) %>% 
  dplyr::filter(!is.na(species)) %>% arrange(deployid,date_time)

depths <- depths %>% 
  dplyr::mutate(species = ifelse(grepl("^EB.",deployid),'Bearded seal',NA)) %>% 
  dplyr::filter(!is.na(species)) %>% arrange(deployid,date_time)

drytimes <- drytimes %>% 
  dplyr::mutate(species = ifelse(grepl("^EB.",deployid),'Bearded seal',NA)) %>% 
  dplyr::filter(!is.na(species)) %>% arrange(deployid,date_time)

# speedfilter using the paralell package for multi-core speed
cfilter <- mclapply(split(locs,locs$deployid),function(x) sdafilter(
  lat = x$latitude, lon = x$longitude, dtime = x$date_time,
  lc = x$quality, ang = -1,vmax = 3.5),mc.preschedule = F,mc.cores = 3)
cfilter <- do.call("c",cfilter)
cfilter <- as.vector(cfilter)
locs$filtered <- cfilter

locs_data <- dplyr::filter(locs,filtered == "not") %>% 
  dplyr::arrange(deployid,date_time) %>% 
  as.data.frame(.)

data("kotzeb0912_gps")

kotzeb0912_gps <- kotzeb0912_gps %>% 
  dplyr::mutate(species = "Bearded seal") %>% 
  dplyr::select(deployid,species,date_time,latitude,longitude) %>% 
  dplyr::filter(!is.na(latitude)) %>% 
  dplyr::mutate(error_semimajor_axis = 50,
                error_semiminor_axis = 50, 
                error_ellipse_orientation = 1) 

locs_data <- locs_data %>% dplyr::bind_rows(kotzeb0912_gps) %>% 
  dplyr::arrange(deployid,date_time)

locs_data_ellipse <- dplyr::filter(locs_data,!is.na(error_semimajor_axis)) %>% 
  dplyr::arrange(deployid,date_time)

data <- as.data.frame(locs_data_ellipse)
coordinates(data) = ~longitude+latitude
proj4string(data) = CRS("+proj=longlat +datum=WGS84")

data = spTransform(data, CRS("+init=epsg:3571"))

## Loop over PTT and fit CTCRW models
ids = unique(data@data$deployid)

library(doParallel)
n.cores <- detectCores()
registerDoParallel(cores = 3)

model_fits_ellipse <-
  foreach(i = 1:length(ids)) %dopar% {
  id_data = subset(data,deployid == ids[i])
  diag_data = model.matrix(
      ~ error_semimajor_axis + error_semiminor_axis + error_ellipse_orientation,
      id_data@data
    )[,-1]

    id_data@data = cbind(id_data@data, 
                         argosDiag2Cov(
                           diag_data[,1], 
                           diag_data[,2], 
                           diag_data[,3]))

    init = list(a = c(coordinates(id_data)[1,1],0,
                      coordinates(id_data)[1,2],0),
                P = diag(c(5000 ^ 2,10 * 3600 ^ 2, 
                           5000 ^ 2, 10 * 3600 ^ 2)))

  fit <- crwMLE(
      mov.model =  ~ 1,
      err.model = list(
        x =  ~ ln.sd.x - 1, 
        y =  ~ ln.sd.y - 1, 
        rho =  ~ error.corr
      ),
      data = id_data,
      Time.name = "date_time",
      initial.state = init,
      fixPar = c(1,1,NA,NA),
      theta = c(log(10), 3),
      initialSANN = list(maxit = 2500),
      control = list(REPORT = 10, trace = 1)
    )
    fit
  }

names(model_fits_ellipse) <- ids

predData_ellipse <- foreach(i = 1:length(model_fits_ellipse)) %dopar% {
  if (!inherits(model_fits_ellipse[[i]],"try-error")) {
  model_fits_ellipse[[i]]$data$date_time <- lubridate::with_tz(
    model_fits_ellipse[[i]]$data$date_time,"UTC")
  predTimes <- seq(
    lubridate::ceiling_date(min(model_fits_ellipse[[i]]$data$date_time),"day"),
    lubridate::floor_date(max(model_fits_ellipse[[i]]$data$date_time),"day"),
    "6 hours")
  tmp = crwPredict(model_fits_ellipse[[i]], predTime = predTimes)
  } else return(NA)
}

predData_ellipse <- Filter(function(x) inherits(x,"crwPredict"),predData_ellipse)

predData_ellipse <- dplyr::bind_rows(predData_ellipse) %>% 
  dplyr::filter(locType == "p")

predData <- predData_ellipse

predData$predTimes <- intToPOSIX(predData$TimeNum)

drytimes <- drytimes %>% 
  dplyr::filter(deployid %in% predData$deployid)

ids <- unique(drytimes$deployid)
timelineData <- foreach(i = 1:length(ids)) %dopar% {
  t_dat <- drytimes %>% 
    dplyr::filter(deployid == ids[i]) %>% 
    dplyr::arrange(deployid,date_time)
  t_seq <- data.frame(date_time = seq(
    lubridate::floor_date(min(t_dat$date_time),"day"),
    lubridate::floor_date(max(t_dat$date_time),"day"),
    "1 hour"),deployid = ids[i])
  t_dat <- merge(t_dat,t_seq,all = TRUE)
  rep_len <- ceiling((nrow(t_dat) - 3)/6)
  t_group <- c(rep(1,3),rep(2:rep_len,each = 6))
  t_group <- data.frame(group = t_group[1:nrow(t_dat)])
  t_dat <- cbind(t_dat,t_group) %>% filter(!is.na(group))
}

timelineData <- dplyr::bind_rows(timelineData) %>% 
  dplyr::group_by(deployid,group) %>% 
  dplyr::summarise(percent_dry = mean(percent_dry,na.rm = TRUE),
            date_time = ceiling_date(median(date_time),"hour")) %>% 
  dplyr::select(deployid,date_time,percent_dry) %>% 
  dplyr::filter(!is.nan(percent_dry))

predData <- dplyr::left_join(predData,
                              timelineData,
                              by = c("predTimes" = "date_time", 
                                     "deployid")) %>%
  tbl_df() %>%
  dplyr::select(deployid,species,predTimes,mu.x,mu.y,percent_dry) %>% 
  dplyr::arrange(deployid, predTimes) %>% 
  dplyr::rename(date_time = predTimes)

diveData <- depths %>% 
  dplyr::filter(bin != "bin1") %>% 
  dplyr::mutate(date_time = date_time + hours(3)) %>% 
  dplyr::group_by(deployid,date_time) %>% 
  dplyr::summarise(num_dives = sum(num_dives,na.rm = TRUE)) %>% 
  ungroup()

predData <- dplyr::left_join(predData,diveData) %>% 
  dplyr::arrange(deployid,date_time)

predData <- predData %>% 
  dplyr::group_by(deployid) %>% 
  dplyr::mutate(x_disp = order_by(date_time, mu.x - lag(mu.x)), 
         y_disp = order_by(date_time, mu.y - lag(mu.y))
  ) %>% 
  dplyr::filter(!is.na(x_disp) | !is.na(y_disp)) %>% 
  ungroup()

anglefun <- function(xx,yy,bearing = TRUE, as.deg = FALSE){
  ## calculates the compass bearing of the line between two points
  ## xx and yy are the differences in x and y coordinates between two points
  ## Options:
  ## bearing = FALSE returns +/- pi instead of 0:2*pi
  ## as.deg = TRUE returns degrees instead of radians
  c = 1
  if (as.deg) {
    c = 180/pi
  }

  b <- sign(xx)
  b[b == 0] <- 1  #corrects for the fact that sign(0) == 0
  tempangle = b*(yy < 0)*pi + atan(xx/yy)
  if (bearing) {
    #return a compass bearing 0 to 2pi
    #if bearing==FALSE then a heading (+/- pi) is returned
    tempangle[tempangle < 0] <- tempangle[tempangle < 0] + 2*pi
  }
  return(tempangle*c)
}

predData <- predData %>% rowwise() %>% 
  dplyr::mutate(bearing = anglefun(x_disp,y_disp,as.deg = TRUE)) %>% 
  ungroup()

id_apr1 <- predData %>% 
  dplyr::group_by(deployid) %>% 
  dplyr::summarise(start_yr = year(min(date_time))) %>% 
  dplyr::mutate(apr1 = lubridate::ymd(paste(start_yr,"04-01",sep = "-"))) %>% 
  dplyr::select(-start_yr) %>% ungroup()

predData <- predData %>% 
  dplyr::left_join(id_apr1) %>% 
  dplyr::mutate(days_since_apr1 = as.numeric(difftime(date_time,apr1,units = "days")))

predData <- predData %>% 
  dplyr::mutate(ho_binary = ifelse(percent_dry >= 33.3,1,-1),
         y_disp = ifelse(abs(y_disp) > 100000,NA,y_disp),
         x_disp = ifelse(abs(x_disp) > 100000,NA,x_disp),
         num_dives = ifelse(num_dives > 150,NA,num_dives)) 

load('pep_age.rda')

predData <- predData %>% 
  dplyr::left_join(pep_age)

save(predData,file = 'predData.rda')

#data(alaska_dcw)
#data(russia_dcw)
#data(canada_dcw)

# map_data <- readOGR('map-data/',
#                     'ne_10m_admin_1_states_provinces')
# 
# ak<-spTransform(subset(map_data,name=='Alaska'),CRS("+init=epsg:3571"))
# rus<-spTransform(subset(map_data,admin=='Russia'),
#                  CRS("+init=epsg:3571"))
# can<-spTransform(subset(map_data,admin=='Canada'),CRS("+init=epsg:3571"))
# 
# 
# ak<-ggplot2::fortify(ak)
# rus<-ggplot2::fortify(rus)
# can<-ggplot2::fortify(can)

library(maptools)

# we don't want to store this map data in git, but also want to make sure # Travis can get it
if (!file.exists("map-data/gshhg-bin-2.3.5")) {
  tmp <- tempfile(fileext = ".zip")
  download.file("ftp://ftp.soest.hawaii.edu/gshhg/gshhg-bin-2.3.5.zip", tmp)
  unzip(tmp, exdir = "map-data/gshhg-bin-2.3.5")
  unlink(tmp)
}

xlims <- c(180-50,180+50)
ylims <- c(40,85)

npac.h <- "map-data/gshhg-bin-2.3.5/gshhs_h.b"
npac <- Rgshhs(npac.h, xlim=xlims, ylim=ylims, 
               level=1, checkPolygons = TRUE, shift=TRUE)

npac <- spTransform(npac$SP,CRS("+init=epsg:3571"))
npac.fort <- fortify(npac)

xlim<-range(predData$mu.x)
ylim<-range(predData$mu.y)

p <- ggplot() + 
  geom_polygon(data=npac.fort,
               aes(x=long,y=lat,group=group,id=id),
               fill="grey60") +
  geom_point(data=predData,aes(x=mu.x,y=mu.y, color=deployid),
             size=0.65,alpha=0.3) +
  coord_fixed(xlim=xlim*1.1, ylim=ylim) + 
  facet_wrap("deployid",ncol=7) +
  theme_fivethirtyeight() + scale_color_ptol(guide=FALSE) +
  theme(axis.text = element_blank(),
        strip.background = element_blank(),
       strip.text.x = element_blank()) +
  ggtitle(paste0("Predicted Movements of 7 Adult Bearded Seals\n",
                 "in the Bering and Chukchi Seas (2009 - 2013)"))
p
#ggsave("pres_data/fig-1.png",height=5,width=8,units="in")

This series of five plots shows the raw, 'observed' parameter values that will go into the model: haul-out status, number of dives, y-displacement, and x-displacement.

fig_nums(name = "plot-1",
         caption = "observed haul-out status parameter values from bio-logging devices deployed on seven adult bearded seals")
fig_nums(name = "plot-2",
         caption = "observed number of dives below 10 meters reported from bio-logging devices deployed on seven adult bearded seals")
fig_nums(name = "plot-3",
         caption = "estimated y-displacement per six hours based on movements of seven adult bearded seals")
fig_nums(name = "plot-4",
         caption = "estimated x-displacement per six hours based on movements of seven adult bearded seals")
fig_nums(name = "plot-5",
         caption = "comparison of AIC values from the mhsmm model fit for each deployment and varying numbers of states (3:8)")
fig_nums(name = "plot-6",
         caption = "comparison of AIC values from the mhsmm model fit summed across deployments for varying numbers of states (3:8)")

p1 <- ggplot(predData, aes(x = days_since_apr1, y = ho_binary)) + 
  geom_point(alpha = 0.2,size = 3,aes(colour = factor(ho_binary))) + 
  facet_grid(deployid ~ .) + 
  ggtitle("haul-out status (bearded seal)") +
  theme_fivethirtyeight() + scale_colour_ptol()
p1

r fig_nums("plot-1")

p2 <- ggplot(predData, aes(x = days_since_apr1, y = num_dives)) + 
  geom_point(alpha = 0.2) + facet_grid(deployid ~ .) + 
  ggtitle("number of dives below 10m (bearded seal)") +
  theme_fivethirtyeight() + scale_colour_ptol()
p2

r fig_nums("plot-2")

predData %>% dplyr::mutate(direction = ifelse(
  y_disp >= 0, "north","south")) %>% 

ggplot(aes(x = days_since_apr1, y = y_disp/1000)) +
  geom_bar(aes(color = factor(direction)),alpha = 0.2,stat = "identity") +
  facet_grid(deployid ~ .) + 
  ggtitle("y displacement (bearded seal)") +
  theme_fivethirtyeight() + scale_colour_ptol()

r fig_nums("plot-3")

predData %>% dplyr::mutate(direction = ifelse(
  y_disp >= 0, "west","east")) %>% 

ggplot(aes(x = days_since_apr1, y = x_disp/1000)) +
  geom_bar(aes(color = factor(direction)),alpha = 0.2,stat = "identity") +
  facet_grid(deployid ~ .) + 
  ggtitle("x displacement (bearded seal)") +
  theme_fivethirtyeight() + scale_colour_ptol()

r fig_nums("plot-4")

Multivariate Hidden semi-Markov Model

The R package mhsmm provides a framework for multi-variate hidden semi-Markov models. The package provides support for a few selected emmission distributions (Poisson, normal, and multi-variate normal) and is extentable to additional distributions. For dive behavior (i.e., number of dives below 10m), a Poisson emmission distribution was used and a multi-variate normal was used for the combined x-y displacement. A Bernouli distribution was used for haul-out status. Custom M-step functions were written to extend the package in support of the Bernouli as well as provide support for missing values in the Poisson and multi-variate normal.

library(mhsmm)

dd <- predData

dd$ind_dry = ifelse(dd$percent_dry > 100/3, 1, 0)
dd$y_disp_km = dd$y_disp/1000
dd$x_disp_km = dd$x_disp/1000

### MHSMM functions
source("mhsmm_functions.R")

In order to mimic the seasonal progression, we fixed the transition matrix with all superdiagonal entries set to 1 and the remaining entries set to 0. The transition matrix was not be re-estimated and the initial state was set to state 1. Initial values for the other parameters were generally set to a reasonable value and repeated across all states (table 1). A shifted Poisson distribution was chosen for the initial sojourn values and the lambda value was set to the number of time steps divided by the number of states. This initiates the model with a constant sojourn value distributed evenly across the deployment. The shift parameter was set to 0 for all states.

n_states = 5

init_vals <- data.frame(
  state = c(1:n_states),
  haul_out = rep(0.1,n_states),
  n_dives = rep(30, n_states),
  xy_disp = rep("0,0",n_states),
  S_type = rep("poisson",n_states),
  S_lambda = rep(paste0("nrows/",n_states),
                               n_states),
  S_shift = rep(0,n_states),
  stringsAsFactors = FALSE
)

mshmm_telem_fit <- function(dd, J) {
  nrows <- nrow(dd)
  # Initial vals
  init0 = c(1, rep(0, J - 1))
  B0 = list(
  p = rep(0.1,J),
  lambda = rep(30, J),
  mu = rep(list(c(0, 0)), J),
  sigma = rep(list(20 * diag(2)), J)
  )
  P0 <- matrix(0, ncol = J, nrow = J)
  n <- 2
  n1 <- n:ncol(P0)
  P0[seq(n1), n1] <- diag(ncol(P0) - n + 1)
  P0
  # S0 = list(shape = rep(12, J),
  # scale = rep(6, J),
  # type = "gamma")
  S0 = list(
    lambda = c(rep(nrows/(J),J)),
    shift = rep(0,J),
    type = "poisson"
  )

  start_val = hsmmspec(
  init = init0,
  transition = P0,
  parms.emission = B0,
  sojourn = S0,
  dens.emission = dtelem.hsmm.mv,
  mstep = mstep.telem.mv
  )

  # Fit model
  data = list(x = with(dd,
  as.matrix(dd[, c("ind_dry",
  "num_dives",
  "x_disp_km",
  "y_disp_km")])),
  N = table(dd$deployid))
  #M <- 600
  fit = hsmmfit(data,
  start_val,
  mstep = mstep.telem.mv,
  lock.transition = TRUE)

  pO <- 7
  pH <- 2
  npar <- (pO + pH) * J

  loglik_last <- tail(fit$loglik, n = 1)

  fit_AIC = -2 * loglik_last + 2 * npar

  mhsmm_telem_fit <- list(
    id = dd$deployid[1],
    data = data,
    fit = fit,
    nstates = J,
    npar = npar,
    loglik = fit$loglik,
    aic = fit_AIC
  )
  return(mhsmm_telem_fit)
}

fit_list <- list()
for (id in unique(dd$deployid)) {
    this_data <- dd %>% dplyr::filter(deployid == id) %>% 
    dplyr::arrange(deployid, date_time)
    message(id)
  for (J in 3:8) {
    fit_list[[paste("n_states",J,sep = "_")]][[id]] <- mshmm_telem_fit(dd = this_data,J)
    message(J)
    message(tail(fit_list[[paste("n_states",J,sep = "_")]][[id]]$loglik,
                 n = 5))
  }
}

Model Selection

The key, unknown parameter in this analysis is the number of seasonal states. In an ideal situation, we would have some a priori knowledge regarding number of seasonal states and our modeling effort would focus on the timing and duration of those known number of states. For some, well-studied species, this might be possible. Even for less studied species like bearded seals, we know their seasonal cycle includes a period for pupping and breeding in approximately April/May, followed by a molt period that can extend into July. The remainder of the year is less understood and, in the case of this study, the bio-logger deployments mostly cover this less-known period.

We used AIC as a possible indicator of an optimal number of seasonal states. The number of states was varied from three to eight and AIC was evaluated for each individual deployment as well as a sum across all deployments.

extract_AIC_vals <- function(fit_list) {
  fit_data <- vector("list")
  for (i in 1:length(fit_list)) {
    for (id in names(fit_list[[i]])) {
      fit_data[[paste0(i + 2,id)]] <- dplyr::data_frame(
      id = id,
      states = i + 2,
      aic = fit_list[[i]][[id]]$aic
      )
    }
  }
  fit_data <- dplyr::bind_rows(fit_data)
  return(fit_data)
}

extract_sum_AIC <- function(fit_list) {
  sum_aics <- vector("numeric",length = length(fit_list))
  for (i in 1:length(fit_list)) {
    s <- 0
    for (id in names(fit_list[[i]])) {
      if (length(fit_list[[i]][[id]]$aic) == 0) {
        message("No fit for ",id," ",i + 2," states")
        next
      }
      s <- s + fit_list[[i]][[id]]$aic
    }
    sum_aics[i] <- s
  }
  return(sum_aics)
}

# extract_sum_AIC(fit_list)
aic_vals <- extract_AIC_vals(fit_list)
aic_vals$states <- as.factor(aic_vals$states)
aic_vals$id <- as.factor(aic_vals$id)
# aic_vals <- aic_vals %>% mutate(states = as.factor(states),
#                                 id = as.factor(id))

ggplot(data = aic_vals,aes(y=aic,x=states,colour=id)) + 
  geom_point() + theme_fivethirtyeight() + scale_color_ptol() +
  ggtitle("AIC Values for Each Deployment Across 3:8 Seasonal States")

r fig_nums("plot-5")

aic_vals %>% 
  group_by(states) %>% summarize(sum_aic = sum(aic)) %>% 
ggplot(aes(y=sum_aic,x=states)) + 
  geom_point() + theme_fivethirtyeight() + scale_color_ptol() +
  ggtitle("Summed AIC Values Across 3:8 Seasonal States")

r fig_nums("plot-6")

Relying on the summed AIC values alone would suggest eight (or more) seasonal states. However, with that many states it becomes difficult to assign ecological and behavioral meaning to those states. There are also valid concerns regarding the applicability of AIC as a model selection tool for hidden-Markov models as well as a tendency towards over-fitting.

In the end, a combination of ecological intuition, some knowledge of bearded seal behavior and ecology, and the pattern in AIC values led us to settle on 5 seasonal states. This seems reasonable given our understanding of bearded seal ecology and exploration of additional states resulted in less descrimination when examined against the underlying data.

Model Fitting and State Prediction

pander(init_vals)

fit_predict = list()
for (id in unique(dd$deployid)) {
  this_fit <- fit_list[[paste0("n_states_",n_states)]][[id]]$fit
  this_data <- fit_list[[paste0("n_states_",n_states)]][[id]]$data
  fit_predict[[id]] <- data_frame(
    state = predict(this_fit,this_data)$s
  )
}
fit_predict <- dplyr::bind_rows(fit_predict)
fit_predict <- dplyr::bind_cols(dd,fit_predict) %>% 
  dplyr::mutate(state = as.factor(state))

Adult Bearded Seals

We can plot the state assignments for each bearded seal

library(ggplot2)
ggplot(data = fit_predict,aes(x=days_since_apr1,y = "")) + 
  geom_tile(aes(fill = state)) + 
  facet_wrap(~deployid, ncol = 1) + 
   theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("Seasonal State Assignments for 7 Adult Bearded Seals") +
  theme_fivethirtyeight() + scale_fill_ptol() +
  scale_x_continuous(breaks=c(91,197,306),
                     labels=c("01 July","15 Oct","01 Feb")) +
  theme(legend.position = "none")
#ggsave("pres_data/fig-7.png",width=8,height=6,units="in")

We can also combine the state assignments across our tagged seals by taking the majority state for each time step. The color transparency is set to the proportion of animals represented by that majority state. Note that, some states may not be present in the combined graph.

combo <- fit_predict %>% 
  dplyr::group_by(days_since_apr1,state) %>% 
  dplyr::summarise(counter = n()) %>% 
  dplyr::group_by(days_since_apr1) %>% 
  dplyr::filter(counter == max(counter))

p <- ggplot(data = combo,aes(x = days_since_apr1,y = " ")) + 
  geom_tile(aes(fill = state,alpha = counter/7)) +
  scale_alpha(guide = 'none') + theme_fivethirtyeight() +
   theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank()) +
  ggtitle("Combined Seasonal State Assignments") + scale_fill_ptol(limits = levels(fit_predict$state),drop = TRUE) +
  scale_x_continuous(breaks=c(91,197,306),
                     labels=c("01 July","15 Oct","01 Feb")) +
  theme(legend.position = "none")
p
p <- p + theme(plot.title = element_blank())
#ggsave("pres_data/fig-8.png",width=8,height=2,units="in")

Now that we have some seasonal states assigned, we can examine the distribution of various behaviors across those different states.

ggplot(data = fit_predict %>% filter(percent_dry > 3),aes(x = percent_dry)) +
  geom_density(aes(fill = state),size = 0) + facet_wrap(~state) +
  ggtitle("Distribution of Percent Dry Across Seasonal States") +
  theme_fivethirtyeight() + scale_fill_ptol() + theme(legend.position="none")
#ggsave("pres_data/fig-9.png",width=8,height=4,units="in")

ggplot(data = fit_predict %>% filter(num_dives < 150),aes(x = num_dives)) +
  geom_density(aes(fill = state),size = 0) + facet_wrap(~state) +
  ggtitle("Distribution of Number of Dives Across Seasonal States") +
  theme_fivethirtyeight() + scale_fill_ptol() + theme(legend.position="none")
#ggsave("pres_data/fig-10.png",width=8,height=4,units="in")

ggplot(data = fit_predict) +
  geom_histogram(aes(x = bearing,y = ..density..,fill = state),binwidth = 12) +
  coord_polar() + 
  scale_x_continuous(limits = c(0,360),breaks = seq(0,180, by = 45)) +
  facet_wrap(~state,ncol = 5) +
  ggtitle(paste0("Distribution of Movement and\n",
                "Compass Bearing Across Seasonal States")) + 
  theme_fivethirtyeight() + scale_fill_ptol() + theme(legend.position="none",
      axis.text.y=element_blank())
#ggsave("pres_data/fig-11.png",width=8,height=4,units="in")

xlim<-range(fit_predict$mu.x)
ylim<-range(fit_predict$mu.y)

p <- ggplot() + 
  geom_polygon(data=npac.fort,
               aes(x=long,y=lat,group=group,id=id),
               fill="grey60") +
  geom_point(data=fit_predict,aes(x=mu.x,y=mu.y,colour=state),size=1.75,alpha=0.4) +
  coord_fixed(xlim=xlim*1.1, ylim=ylim) + 
  facet_wrap(~state,nrow=1) +
  theme_fivethirtyeight() + theme(axis.text.y=element_text(angle=90)) +
  ggtitle("Predicted Movements of Bearded Seals\n by Assigned Seasonal State")  + scale_color_ptol() +
  theme(legend.position = "none", axis.text= element_blank())
p
#ggsave("pres_data/fig-12.png",width=8,height=4,units="in")