knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The goal of the bustR (Bottom phase identification for Underwater Serially Tracked animals) package is to provide a group of useful functions to work with time-depth record data (TDR data) for marine mammals that exhibit diving behaviour, and allow for a user to define the bottom phase of a set of dives. The main feature of this package is an interactive Shiny app that allows users to manually classify the bottom phase of their dives (which inherently defines the descent and ascent phases as well).
There are several other R packages which provide a plethora of dive manipulation functions, such as diveMove which is available on CRAN and tagtools which is available on Github. However, the purpose of bustR is not to extensively manipulate and/or analyze diving data, rather it is a tool that provides the user the ability to classify dive phase definitions.
This vignette will provide a walkthrough of how to use the bustR package, specifically the Shiny App. The App will produce a dataset that can then be used in the modelling stage to create a model for the bottom phase of dives.
General flow of how this package runs:
Before we begin, we need to load the bustR package, as well as the tidyverse
(v. 1.3.0) package for ease of plotting.
devtools::load_all("~/Desktop/bustR") library(tidyverse)
The first step is to read in the TDR data. This step must be done by the user themselves as there are no built in functions in this package to read in the diving data. The TDR data should be read in as a dataframe, preferably with a function such as vroom::vroom()
or readr::read_csv()
. Moreover, it is expected that this data is already cleaned and ready to be analyzed. For example, the time-depth data should already be smoothed, if necessary.
The TDR dataframe must:
Have 3 distinct columns, named: $\texttt{time}$, $\texttt{depth}$, and $\texttt{dive_num}$.
The $\texttt{time}$ variable must be a distinct number for every row in your dataframe, and it should identify the timestamp correctly. $\texttt{time}$ should be a strictly increasing number as well. When in doubt, it may be easier for this value to just be the numbers $1, ..., N$, where $N$ is the total number of rows in this dataframe.
The $\texttt{depth}$ variable is a measure of the depth of the animal at each specific time frame. $\texttt{depth}$ should be negative.
Finally, the $\texttt{dive_num}$ variable should be a non-negative integer representing which dive the depth data corresponds to. Given an entire TDR sequence, the data can be split into individual dives, depending on several parameters such as a minimum depth requirement, and a surface level requirement. For the parts of the sequence which are not dives, please leave $\texttt{dive_num}$ as a zero (0).
Be cleaned and prepared. If the data have not been smoothed, for example, then the bottom phase classification will not be straightforward as there will be a lot of noise in the data. Consider cleaning and preparing the dataframe before doing any bottom phase classifications.
For example, consider the sample dataframe provided in this package, which can be accessed as follows:
# Access the sample diving data. whale <- system.file("extdata", "whale_example.csv", package = "bustR") whale_data <- vroom::vroom(whale)
From this dataframe (or tibble), we can look at the structure to see what is required:
head(whale_data)
And we can also take a look at the summary of these three columns:
summary(whale_data)
We can also see the total number of dives which exist in this dataset:
sort(unique(whale_data$dive_num))
From here, we see 9 unique dives. Notice that we also have zeros, which indicate that the whale is at the surface and is not currently engaged in a dive. It should be noted that dives are only considered true dives if they exceed 3 meters in depth. As such, any dives which are not below 3 meters and not constituted as dives. Plotting this TDR data, and making the surface portions in red, and labelling the true dives, we get the following:
ggplot(whale_data, aes(x = time, y = depth, color = factor(dive_num))) + geom_path(aes(group = 1))
From the above plot, all the lines in red are considered surface level behaviour and are not considered dives. The remaining dives are labelled from 1 to 9 as shown.
The next step is to run the ShinyApp and label the bottom phase of the dives. This is a critical aspect of this R package. To run this app, we call the function runBottomPhaseApp()
. You must provide several arguments to this function.
entire_record
is the full TDR data that was previously read in.
file_path
(string) is the path to a folder or location on your computer where you want your data to be saved. Note that this folder must already exist for this to work and there must be nothing in this folder except possibly previously saved ShinyApp data from an earlier session (more on that later).
init_num
(positive integer) is how many dives you want to view during this session. The default is init_num = 10
. This value cannot be larger than the total number of dives in the dataset.
rand_select
(boolean) specifies if you want the dives to be presented to you in a random order. If this parameter is FALSE
, then you will see the dives starting from the beginning of your dataset.
continuing
(boolean) is a special parameter that allows you to "pick up where you left off" so to speak. If this is the first time you are running the app, then you should set continuing = FALSE
(this is the default). However, suppose you labelled 50 dives yesterday, and you want to label more today. Notice that you don't want to accidentally label those previous 50 dives again since that would be a waste of time. In this case, you should set continuing = TRUE
. This will avoid the previously saved data in that directory. Notice: if you accidentally set continuing = FALSE
, but data already exists in that directory, you will get a warning message telling you your previous data will be overwritten. Similarly, if continuing = TRUE
, but no data exist in the folder, you will get a warning letting you know.
custom_dives
(vector of positive integers) is a vector of integers of the dives you want to label, which must be numbers that exist in the TDR dive_num
column.
launch_browser
(boolean) will launch the ShinyApp in your default browser if launch_browser = TRUE
.
As an example, to run the Shiny App, we use the runBottomPhaseApp
as follows:
bustR::runBottomPhaseApp(entire_record = whale_data, file_path = "~/Desktop/my_whale_data", init_num = 5, rand_select = FALSE, continuing = FALSE, launch_browser = TRUE)
You should try this function out and play around with the parameters using this toy dataset to get a feel for how things work.
After submitting information for each dive, the information is saved in the file_path
directory where each dive is an individual csv file, named appropriately. You can close the app at any point, and all previously labelled dives will be saved. Once you are done with the Shiny App, you now have a folder containing the bottom phase information for each dive in your dataset. You can now use this information to build models for the bottom phase of the dive. More extensive modelling can be found in my Master's thesis, or in my paper (as soon as it is published).
At this point, you will have a directory on your computer containing a bunch of csv files with your data stored in them. To access this data, we will use the results()
function. This function requires two parameters: f
, the full_data dataframe that we read in, and outdir
which is the path to where all those csv files are located.
Moreover, we can call summary
on the output of the results
function to produce several sumamry statistics about this data. This function is run as follows:
r <- results(whale_data, "~/Desktop/my_whale_data") summary(r)
And from this results function, we can also look at the bottom phase data that was provided by the user. This is going to be the dataframe that we are most interested in since we can use this in the modelling stage.
u <- r$bottom_phase_data u
A simple bottom phase definition that is used in the literature is "percent of maximum depth" wherein the bottom phase is defined as the time between the first and last occurrence when the animal reaches X% of its maximum depth. Normally, X is 70% or 80%. A simple extension to this method is to estimate X from the data. The function percent_max_depth
does exactly this. Using the output from the Shiny App (from the results function), we can easily estimate this percentage threshold.
percent_max_depth
takes two parameters: full_data
, which is the complete dataset that we read in earlier; and responses
which is the dataframe that we get from the results
function. We implement this function as follows:
percent_max_depth(whale_data, u)
We get output as follows:
# > [1] "The percentage threshold (of max depth) for the start: 68.1%" # > [2] "The percentage threshold (of max depth) for the end: 65%"
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.