In ctmm-initiative/ctmm-webapp: Web App For ctmm

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6, 
  fig.height=6
)

Introduction

This document demonstrates the usage of package functions in the analysis of animal tracking data. As a complete analysis consists of a series of sequential steps, it is more informative to show how the package's functions fit into this workflow than giving separate code examples in function help files.

First we load the libraries and prepare the data.

Note When importing your own data, ctmm::as.telemetry by default will return single telemetry object with single animal, and a list of telemetry objects with multiple animals. To make code consistent we always work with a list of telemetry objects. You can use drop = FALSE parameter to make sure it's a proper list.

library(ctmm)
library(ctmmweb)
library(magrittr)

data(buffalo)

# for importing data, use drop = FALSE to make sure it's a proper list.
# data <- as_telemetry(data_file, drop = FALSE)

# take a 100 point sample from each animal to speed up model fitting etc
data_sample <- pick(buffalo, 100)

# somehow a line like this can avoid the `Assertion failure at kmp_runtime.cpp(6480): __kmp_thread_pool == __null.` error in knitting the vignette.
# this is not needed anymore with R 3.5
# par_lapply(1:2, sqrt)

Basic data structure

To plot the locations of multiple animals simultaneously with ggplot2, we need to merge all location data into a single data.frame. merge_tele will merge ctmm telemetry objects/lists into a list of two data.tables: data for animal location, info for some summaries of the data.

data.table has much better performance than data.frame though uses a different syntax. You can still use data.frame syntax on data.table if you are not familiar with it.

# Error will occur if single telemetry object was provided. See ?as_tele_list and ?report for details.
collected_data <- collect(buffalo)
# a list of locations data.table/data.frame and information table
loc_data <- collected_data$data
info <- collected_data$info

In loc_data:

identity are animal names
id are animal names as a factor. We will need this when we want to maintain the level information.

knitr::kable(head(loc_data))

You can calculate distance and speed outliers in the data via:

# To save memory, data.table modify reference by default. The incoming data.table was modified and distance_center column is added after calculation. Note the telemetry list is needed for error information
assign_distance(loc_data, buffalo)
knitr::kable(head(loc_data))
# always calculate distance first then calculate speed outlier with distance columns added
assign_speed(loc_data, buffalo)
knitr::kable(head(loc_data))
# Or you can use pipe operator
loc_data %>% assign_distance(buffalo) %>% assign_speed(buffalo)
knitr::kable(head(loc_data))

info summarize some basic information on data.

knitr::kable(info)

Selecting a subset

In the app we can select a subset of the full data by slecting rows in the data summary table.

To perform the same selection via package functions, we can select animal names from the identity column or numbers from the id factor column in loc_data using either data.table or data.frame syntax.

We suggest to always select a subset from full data instead of creating separate data objects for individuals, because the subset will carry the id column which still holds all animal names as levels so that a consistent color mapping can be maintained (otherwise ggplot2 will always draw the first animal in same color).

# select by identity column
loc_data_sub1 <- loc_data[identity %in% c("Gabs", "Queen")]
# select by id factor column value
loc_data[as.numeric(id) %in% c(1, 3)]

Visualization

You can reproduce most of the plots in the Visualization page of the webapp with package functions, for example:

# plot animal locations
plot_loc(loc_data)
# plot a subset only. Note the color mapping is consistent because loc_data_sub1 id column hold all animal names in levels.
plot_loc(loc_data_sub1)
# with subset and full data set both provided, subset will be drawn with full data as background. 
plot_loc(loc_data_sub1, loc_data)
# location in facet
plot_loc_facet(loc_data_sub1)
# sampling time
plot_time(loc_data_sub1)
# take the ggplot2 object to further customize it
plot_loc(loc_data_sub1, loc_data) +
  ggplot2::ggtitle("Locations of Buffalos") +
  # override the default left alignment of title and make it bigger
  ctmmweb:::CENTER_TITLE
# export plot
g <- plot_loc(loc_data_sub1, loc_data)
# use this to save as file if needed
if (interactive()) {
  ggplot2::ggsave("test.png", g)
}

Variogram

We can plot both empirical variograms alone and empirical variograms with fitted theoretical semi-variance functions superimposed on them.

For example, to plot empirical variograms from telemetry data, one would do:

vario_list <- lapply(data_sample, ctmm::variogram)
# names of vario_list are needed for figure titles
names(vario_list) <- names(data_sample)
plot_vario(vario_list)
# sometimes the default figure settings doesn't work in some systems, you can clear the plot device then use a smaller title size
# dev.off()
# plot_vario(vario_list, cex = 0.55)

Similarly, we can compare initial movement model parameter guesses to the data via variograms:

guess_list <- lapply(data_sample,
                     function(tele) 
                       ctmm::ctmm.guess(tele, interactive = FALSE))
plot_vario(vario_list, guess_list)

Model summary table

We can try different models on multiple animals in parallel.

Note there could be some known errors with data.table, R 3.4 and parallel operations. You can try to restart R session if met with same error.

# try multiple models in parallel with ctmm.select. parallel mode can be turned off with parallel = FALSE
model_try_res <- par_try_models(data_sample)
# `model_try_res` holds a list of items named by animal names. Each item hold the attempted models for that animal as a sub list, named by model type.
print(str(model_try_res[1:3], max.level = 2))
# a data.table of models information summary
model_summary <- summary_tried_models(model_try_res)
# you can also open it with RStudio's data.frame viewer
knitr::kable(model_summary)

There could be multiple models attempted for each animal. In the webapp you can select a subset of models then check their variograms, home ranges and occurrences. You can also select a subset of fitted models via in a script as:

To make selecting a subset easier, we first convert this nested list structure to a flat list:

# the nested structure of model fit result
names(model_try_res)
names(model_try_res[[1]])
# convert to a flat list
model_list <- flatten_models(model_try_res)
names(model_list)

Then we can find the model names in the model summary table by model_no or model_type. The code here uses data.table syntax, but you can also use the table as data.frame if you want.

# select subset in model summary table by model_no
knitr::kable(model_summary[model_no %in% c(1, 3, 10, 11, 12, 13)])
# select by model type
knitr::kable(model_summary[model_type == "OU anisotropic"])
# select first(best) model for each animal using the smallest AICc value
knitr::kable(model_summary[dAICc == 0])
# The expression is enclosed with () to enable automatical printing of result. Both model_name and identity are selected in same filter. We need the model_name to filter the model list, and the animal name to filter the variograms
(names_sub2 <- model_summary[(model_no %in% c(1, 3, 10, 11, 12, 13)),
                             .(model_name, identity)])

Once you have selected the models in summary table, you can filter the actual models list with model names. Note this is a different subset from loc_data_sub1 above.

# filter model list by model names to get subset of model list.
model_list_sub2 <- model_list[names_sub2$model_name]

Now we can plot variograms with the fitted SVFs of the selected models. Note the vario_list_sub2 needs to match with model_list_sub2 in length and animal name, so they are based on same data.

# get corresponding variograms by animal names.
vario_list_sub2 <- vario_list[names_sub2$identity]
# specify a different color for model
plot_vario(vario_list_sub2, model_list_sub2, model_color = "purple")

Home range

We can estimate home range with the telemetry data and the fitted models. Note parallel mode is not used because we want to calculate all animals together to put them in same grid.

# calculate home range with ctmm::akde. 
tele_list_sub2 <- data_sample[names_sub2$identity]
hrange_list_sub2 <- akde(tele_list_sub2, CTMM = model_list_sub2)
# name by model name
names(hrange_list_sub2) <- names(model_list_sub2)
# summary of each home range. There is no summary table function here because we borrowed the model table in app to make the home range summay table. To reproduce that in functions need model table/model_try_res and the selection as parameters, which will be quite awkward. If there is a strong request from users, a summary table function can be added.
lapply(hrange_list_sub2, summary)
# plot home range
plot_ud(hrange_list_sub2)
# plot home range with location overlay
plot_ud(hrange_list_sub2, tele_list = tele_list_sub2)
# plot with different level.UD values
plot_ud(hrange_list_sub2, level_vec = c(0.50, 0.95), tele_list = tele_list_sub2)

Batch script for home range

Home range is one of the most used feature in ctmm/ctmmweb. Sometimes you have lots of individuals and don't need to fine tune each model, then you can calculate home range in batch mode. ctmmweb package provided many functions to make this process as easy as possible. Check the function documents for more details.

The script can also be used to create a reproducible example if you met any error in modeling/home range parts.

library(ctmm)
library(ctmmweb)
library(data.table)

# this read from a .rds from saved progress file. You can also import your data directly with as.telemetry
tele_list <- readRDS("/Users/xhdong/Downloads/Saved_2019-11-15_11-37-41_models/input_telemetry.rds")
# one line to try models. The result is a nested list of CTMM models
system.time(
  model_try_res <- par_try_models(tele_list)
)
save.image("test.RData")
# summary models to find the best
model_summary <- summary_tried_models(model_try_res)
# select the best model name for each individual, which is the first one with 0 AICc
best_model_names <- model_summary[, .(best_model_name = model_name[1]), 
                                  by = "identity"]$best_model_name
# get a flat list of model object
model_list <- flatten_models(model_try_res)
# get the best model objects
best_models <- model_list[best_model_names]
# calculate home range in same grid, with optimal weighting on
hrange_list_same_grid <- akde(tele_list, best_models, 
                              weights = TRUE)
# calculate home range separately, with optimal weighting on
hrange_list <- par_hrange_each(tele_list, best_models,
                               rep.int(list(TRUE), length(tele_list)))

Occurrence

Occurrence distribution can be calculated from the telemetry data and the fitted models in parallel.

# calculate occurrence in parallel. parallel can be disabled with fallback = TRUE
occur_list_sub2 <- par_occur(tele_list_sub2, model_list_sub2)
# plot occurrence. Note tele_list is not needed here because the location overlay usually interfere with occurrence plot.
plot_ud(occur_list_sub2, level_vec = c(0.50, 0.95))

Maps

We can then build interactive online maps of the data and fitted Home Range and Occurrence distributions. For example, one can display the location data on a map via:

# loc_data_sub1 is used for visualization plot. all the model selections above are based on a different subset. We need to get corresponding loc_data subset for model selections
loc_data_sub2 <- loc_data[identity %in% names(tele_list_sub2)]
point_map(loc_data_sub2)
point_map(loc_data)

The map can be saved as a self contained html file.

# save map to single html file. file can only be a filename, not a relative path
if(interactive()) {
  htmlwidgets::saveWidget(point_map(loc_data_sub2), file = "point_map.html")
}

Displaying home range estimates requires more input. See their help documents for detailed explanations.

# A map with home range estimates need: list of home range UD object, vector of level.UD in ctmm::plot.telemetry, a vector of color names for each home range estimate.
range_map(hrange_list_sub2, 0.95, rainbow(length(hrange_list_sub2)))
# To overlay home range estimates with animal locations, the corresponding animal location data is needed.
point_range_map(loc_data_sub2, hrange_list_sub2, 0.95,
                rainbow(length(hrange_list_sub2)))