source("setup.R")

This section will get you started with transforming and summarizing your data from raw read data to a variety of different data types which can be used to answer different questions.

Here we assume that you know how to load/import your data and that, if needed, you've cleaned up problematic animal_ids with the various housekeeping functions.

Note that for several examples of summarizing, we'll be using functions from the tidyverse including the magrittr pipe %>%, which essentially 'pipes' the output of the first line to the next line, see this intro to the pipe for more details.

library(tidyverse)
library(lubridate)

Functions

In all cases you'll want to use the visits() function first, but after that it depends on what kinda of data you want to end up with.

  1. visits() - Visits to a logger
  2. move() - Movements between loggers
  3. presence() - Presence around a logger
  4. disp() - Displacement from a logger
  5. activity() - Activity patterns
  6. daily() - Daily activity patterns
  7. Conversion functions - Functions to convert data for use by other packages
  8. Working with multiple groups - Applying transformations to different data groups simultaneously

Or click on the function name below to skip directly to the function tutorial:

DiagrammeR::grViz("
digraph order {

graph [overlap = true, compound = true]

node[shape = Mrecord]
edge[arrowhead = vee]

visits[label = 'visits()', URL = '#visits', tooltip = 'Individual logger visits']
disp[label = 'disp()', URL = '#displacements', tooltip = 'Displacements from the logger by another animal']
move[label = 'move()', URL = '#movements', tooltip = 'Movements between loggers']
presence[label = 'presence()', URL = '#presence', tooltip = 'presence bouts at loggers']
act[label = 'activity()', URL = '#acitivty', tooltip = 'Activity scored (active/inactive)']
daily[label = 'daily()', URL = '#dailyactivity', tooltip = 'Daily activity patterns']

visits-> {disp, move, presence}
presence -> act
act -> daily

map1[label = 'map_ggmap()', URL = '#mapggmap', tooltip = 'Visualize with a static map']
map2[label = 'map_leaflet()', URL = '#mapleaflet', tooltip = 'Visualize with an interactive map']

{presence; move} -> {map1; map2}

}
", width = 500)

visits()

This function is designed to turn 'raw' data into 'visits' data.

This is always the first function you should use. Raw read data contains an individual line of data for each read event made by the RFID logger.

For example, if the logger was set to record ids every ~2 seconds, an individual sitting on the logger for 8 seconds followed by a 12 second gap before returning would look something like this:

head(finches_lg)

The visits() function turns this raw data into something more practical:

v <- visits(finches_lg)
head(v)

We can see that the first series of raw reads have been combined into one visit, starting at 2015-10-01 17:38:52 and ending at 2015-10-01 17:39:00.

We can also look directly at visit data for a particular individual by pulling out it's animal_id:

head(finches_lg[finches_lg$animal_id == "06200004F8", ])
head(v[v$animal_id == "06200004F8", ])

Alternatively, you can use filter(v, animal_id == "06200004F8") from the dplyr package or subset(v, animal_id == "06200004F8") which is built into R base.

Visits are defined by:

Data returned

The data returned by the visit function contains both new and old data:

Note that several of these new values could also have been obtained by hand:

length(unique(v$animal_id))
length(unique(v$logger_id))

species, sex, lon and lat are all columns originally in the finches_lg data set, they were automatically passed through the visits() function. Columns of extra data that are associated with either a date, an animal_id, or a logger_id will always be passed through unless pass = FALSE:

visits(finches_lg, pass = FALSE)

bw

Depending on the interval between reads in your RFID logger, you may have to adjust the value of bw which specifies how many seconds between reads separates two visits. The default is 3 seconds, but if your loggers scan at a slower interval, you will need to adjust the bw argument:

visits(finches_lg, bw = 15)

Impossible/missing visits

You should also be aware of 'impossible visits'. These may occur if the internal clocks on your RFIDs are not in sync, or if your loggers are very close to each other. An impossible visit is when an individual animal is detected at two different loggers within 2 seconds (the default) of each other. Unless your loggers are REALLY close to each other, this is highly unlikely.

Because both missing values (NAs) and impossible visits likely indicate a greater underlying problem, you have to specifically tell the function to ignore them (or omit them, in the case of NAs).

v <- visits(finches_lg, allow_imp = TRUE, na_rm = TRUE)

You can define the minimum number of seconds needed to travel between loggers to specify which visits should be considered impossible. For example, if we say that it must take animals at least 3 minutes to travel between loggers, any visit made to a different logger within 3 minutes would be defined as 'impossible':

visits(finches_lg, bw_imp = 180)

Summarizing

Visit data is the starting block of all other transformations, but you can also summarize and analyze visit data in itself.

For example, if we want to create daily summaries of the number of visits made and loggers used per animal_id, we could use the dplyr package (part of the tidyverse family of packages).

Currently we have data on the total number of loggers in the data set, but what if we wanted to know how many loggers each individual used?

library(dplyr)
visit_summary <- v %>%
  group_by(animal_id) %>%
  summarize(n_visits = length(start),
            n_loggers = length(unique(logger_id)),
            n_days = length(unique(date)),
            mean_visit_length = mean(as.numeric(difftime(end, start, units = "sec"))))
visit_summary

A Side Note
This bit of code uses a slightly advanced technique called 'pipes': %>% These allow you to pass the output from one line directly down to the next line where it becomes the first (and hidden) argument. In this case, the first line specifies our visits data frame v, this then becomes the input for group_by(). The output of group_by() then becomes the input for summarize(). This makes your coding more efficient but may take a bit of getting used to.

You can find an in-depth look at dplyr, pipes and related packages here: http://r4ds.had.co.nz/transform.html

For reference, you can achieve the same effect without pipes:

library(dplyr)
visit_summary <- group_by(v, animal_id)
visit_summary <- summarize(visit_summary,
               n_visits = length(start),
               n_loggers = length(unique(logger_id)),
               n_days = length(unique(date)),
               mean_visit_length = mean(as.numeric(difftime(end, start, units = "sec"))))
visit_summary

The result is a data frame with one row for each animal_id, giving us information on how many loggers that animal has used, how many visits they have made, over how many days, and what the average visit length was.

If we wanted to plot this data, the package ggplot2 is a great option:

library(ggplot2)
ggplot(data = visit_summary, aes(x = animal_id, y = mean_visit_length)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotate/shift id labels
  geom_bar(stat = "identity") + # Create bars
  labs(x = "Animal ID", y = "Average visit length (s)") # Make labels pretty

Back to top

move()

This function is designed to turn 'Visit' data into 'Movement' data.

Movements are defined as trips between loggers (i.e. a event when a visit from one animal_id occurs at a different logger from the previous visit by that particular animal_id).

v <- visits(finches_lg, bw = 5)
m <- move(v)
head(m)

Data returned

This function returns a lot of information for each movement event. Note, in particular, that each movement event results in two rows of data: leaving and arriving

animal_n, logger_n, species, sex, lon, and lat are extra columns passed through (see visits data for more information).

Include all

In this case, four animal_id have been omitted because they never moved between loggers. While the id is retained in the levels of the factor, animal_id, it is not present in the data frame:

unique(m$animal_id)         # The animal_ids currently in the movement data
length(unique(m$animal_id)) # The number of animal_ids currently in the data
length(levels(m$animal_id)) # The number of levels of animal_id

However, if you wanted to retain all animals in the data set, regardless of whether they moved or not (for example, if you wanted to look at the likelihood of moving between loggers), you can specify that:

m <- move(v, all = TRUE)

unique(m$animal_id)         # The animal_ids currently in the movement data
length(unique(m$animal_id)) # The number of animal_ids currently in the data
length(levels(m$animal_id)) # The number of levels of animal_id

Here we've told R that an additional parameter for move() is all = TRUE. animals which never visit more than one logger will have NA's in the data frame:

m[is.na(m$direction), ]

Summarizing

Once again we can summarize movement data using the dplyr package.

Are certain paths used more often?

move_summary <- m %>%
  group_by(move_path, logger_id) %>%
  summarize(n = length(move_path))
move_summary

The NAs reflect the four individuals that did not move.

Note that because there are two rows per movement event (leaving and arriving), we group by logger_id to summarize individually by logger (resulting in duplicate ns per move_path. This will account for the 2 rows and keep logger ids in the final data frame (required for mapping later on).

However, if you only wanted move_path and n, you could simply omit logger_id and remove duplicates:

move_summary %>% 
  select(-logger_id) %>%
  unique()
ggplot(data = move_summary, aes(x = move_path, y = n)) +
  geom_bar(stat = "identity")

Are certain directions on particular paths used more often? (i.e. do birds go back and forth or do they circle around?)

move_summary <- m %>%
  group_by(logger_id, move_path, move_dir) %>%
  summarize(n = length(move_path))
head(move_summary)
ggplot(data = move_summary, aes(x = move_path, y = n, fill = move_dir)) +
  geom_bar(stat = "identity", position = "dodge", colour = "black")

Is there a relationship between use of a path and the average 'strength'?

move_summary <- m %>%
  group_by(logger_id, move_path) %>%
  summarize(n = length(move_path),
            strength = mean(strength, na.rm = TRUE))
move_summary
ggplot(data = move_summary, aes(x = strength, y = n)) +
  geom_point() +
  stat_smooth(method = "lm") # add a regression line w/ standard error

Do some animals move more than others?

move_summary <- m %>%
  group_by(animal_id) %>%
  summarize(n = length(move_path[!is.na(move_path)])) # Number of movements omitting NAs
move_summary

Note that we have some zeros because we made sure to include all individuals in the analysis (all = TRUE, above) and because we specified summarize() to count only non-NA paths (move_path[!is.na(move_path)]).

ggplot(data = move_summary, aes(x = animal_id, y = n)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotate and shift id labels
  geom_bar(stat = "identity") 

Back to top

presence()

This function is designed to turn 'Visit' data into 'Presence' data.

Presence reflects blocks of time in which an animal visited a given logger regularly. Essentially, presence is defined as a series of visits, and can overlap with presence of other animals (this differs from visits, as normally one individual's visits cannot overlap with another). Presence near a logger ends when the individual moves to another logger or when a particular amount of time has passed without a visit (the time between visits, bw, 15 min by default).

v <- visits(finches_lg, bw = 5)
p <- presence(v)
head(p)

Data returned

animal_n, logger_n, species, sex, lon, and lat are extra columns passed through (see visits data for more information).

bw

The user of the function may define what the minimum amount of time between visits should be. Ground truthing could help establish the amount of time an individual spends near a logger without making a logger visit and could be used to determine this cutoff time.

Note that the units for bw when defining visits is in seconds, while the units for bw when defining presence is in minutes. This reflects the different scales of the two functions.

v <- visits(finches_lg, bw = 5)
p <- presence(v, bw = 20)  # Within 20 min
head(p)
v <- visits(finches_lg, bw = 5)
p <- presence(v, bw = 90)  # Within 1.5 h
head(p)

The bw parameter can also be omitted entirely:

v <- visits(finches_lg, bw = 5)
p <- presence(v, bw = NULL)
head(p)

When bw = NULL only a move to a different logger will initiate a new presence bout. Therefore this identifies how much time passes before a new logger is visited. The end time reflects the time of the final visit to that logger before moving to a new logger, but if an individual were to visit logger A all day on day 1 and disappear on day 2 and then make one visit to logger A on day 3 before moving immediately to logger B, that first bout of 'presence' would be considered two days long, whereas in reality, the individual did not make any visits on one of those days.

Summarizing

What's the average time present for each individual?

v <- visits(finches_lg, bw = 5)
p <- presence(v)

presence_summary <- p %>%
  group_by(animal_id) %>%
  summarize(n = length(start),          ## how many rows = number of events
            mean_length = mean(length)) ## mean length of events

presence_summary

Is there a relationship between the number and length of presence bouts?

ggplot(data = presence_summary, aes(x = n, y = mean_length)) +
  geom_point()

Do animals spend more time around certain loggers?

presence_summary <- p %>%
  group_by(logger_id) %>%
  summarize(n = length(start),
            mean_length = mean(length),
            se_length = sd(length)/sqrt(n))

presence_summary

ggplot(data = presence_summary, aes(x = logger_id, y = mean_length, fill = logger_id)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = mean_length - se_length, 
                    ymax = mean_length + se_length), width = 0.25)

Note that there is only one bout of presence around logger 2400, hence the lack of errorbars.


Back to top

disp()

This function is designed to turn 'visit' data into 'displacements' data.

Displacements are events when one animal leaves the logger and is replaced by another animal within a certain, defined, time span.

In theory, we assume this is due to the arrival of a more dominant animal. Displacements events might therefore lead to information on relative dominance of different individuals.

Disclaimer!
It is important to ground-truth this assumption because, depending on species, displacements may not necessarily represent dominance. For example, in the case of black-capped chickadees, if the dominant male is using the logger, subsequent users of the logger may actually arrive and depart in descending order of rank. In this case, apparent 'displacements' may not be displacements at all.

v <- visits(finches_lg, bw = 5)
d <- disp(v)
names(d)

Data returned

This function returns a list with three different types of displacement information: displacements, summaries, and interactions.

head(d$displacements)

The displacements item is a data frame containing information on each displacement event

animal_n, logger_n, species, sex, lon, and lat are extra columns passed through (see visits data for more information).

head(d$summaries)

The summaries item is a data frame summarizing the displacement events for each individual

head(d$interactions)

The interactions item is a data frame summarizing each potential interaction event:

No events

If there are no displacement events in your data set, the function will give you a message and will return an empty list. In this case, the data set is so small, there are no displacement events:

v <- visits(finches, bw = 5)
disp(v)

bw

The disp() function also contains a bw argument. In this case it reflects the maximum interval between the first animal leaving and the second animal arriving for an interaction to be considered a displacement:

v <- visits(finches_lg, bw = 5)
head(disp(v)$summaries)
head(disp(v, bw = 10)$summaries)
head(disp(v, bw = 15)$summaries)

However, an interval of 15s is pretty long and probably doesn't constitute a real displacement event.

Summarizing

Is one logger more contentious than others?

v <- visits(finches_lg, bw = 5)
d <- disp(v)

disp_summary <- d$displacements %>%
  group_by(logger_id, role) %>%
  count() %>%
  filter(role == "displacer")

ggplot(data = disp_summary, aes(x = logger_id, y = n)) +
  geom_bar(stat = "identity")

What if we control for overall logger use?

disp_summary <- d$displacements %>%
  group_by(logger_id, role) %>%
  count() %>%
  filter(role == "displacer")

presence_summary <- p %>%
  group_by(logger_id) %>%
  summarize(amount = sum(as.numeric(length)))

full_summary <- full_join(disp_summary, presence_summary, by = "logger_id") %>%
  mutate(n = replace(n, is.na(n), 0),  # First replace the missing values of n with 0
         n_per_hour = n / amount * 60) # Calculate the number of interactions as an hourly rate

full_summary

ggplot(data = full_summary, aes(x = logger_id, y = n)) +
  geom_bar(stat = "identity")
d$summaries

ggplot(data = d$summaries, aes(x = animal_id, y = p_win)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + #rotate and shift id labels
  geom_bar(stat = "identity") +
  ylim(c(0, 1)) +
  labs(x = "Animal Id", y = "Proportion of 'wins'")

Back to top

activity()

This function scores individuals as either active or inactive based on their presence data. While simply looking at the presence bouts is a good proxy for active/inactive, scoring intervals as active/inactive allows for easier plotting as well as for averaging over days to create daily activity patterns (see daily() function).

v <- visits(finches_lg, bw = 5)
p <- presence(v)
a <- activity(p, res = 5)
head(a)

Data returned

Several messages along with a data set containing:

animal_n, logger_n, species, sex are extra columns passed through (see visits data for more information).

The messages reflect information regarding the quality of the activity data:

res

The default function calculates activity for each animal at a resolution of 15min. Meaning that every 15min an individual is scored as active or inactive. However, if individuals make short and infrequency visits to a logger, periods of activity may be missed if the active bouts are smaller than the resolution. Resolution can be adjusted through the res argument (note the change in time, and the change in messages).

head(activity(p, res = 1))

by_logger

By default, the activity() function sums all activity across loggers. However, if you wish to calculate separate activity patterns for different loggers, you can set by_logger = TRUE. Note the logger_id and that there are now 4 observations per time block for each individual:

head(activity(p, by_logger = TRUE))

sun

By default sunrise and sunset times are calculated if the data contains latitude and longitude information. You can turn this off with sun = FALSE.

keep_all

By default, the activity() function skips individuals with less than 24hrs of data, you can force it to retain all individuals by specifying keep_all = TRUE.

missing

Currently this functionality is not implemented. However, eventually we plan to allow users to specify times of known missing data, so that activity can be specified as 'active', 'inactive', or 'unknown'.

Summarizing

Activity patterns can be plotted in ggplot2. Let's look at a specific time period:

library(dplyr)
library(ggplot2)
library(lubridate)

# Filter by date and animal ID, create a "day" column
sept <- a %>%
  filter(date >= as_date("2015-09-21"),
         date < as_date("2015-09-28"),
         animal_id == "06200004F8")

# Use facet_wrap to look at individual days, use drop = TRUE to omit days without activity
ggplot(data = sept, aes(x = time, y = activity)) +
  facet_wrap(~ date, scales = "free_x", ncol = 1, drop = TRUE) +
  geom_area()

There's a clear daily pattern, but it's a bit hard to tell without sunrise and sunset:

sun <- unique(select(sept, rise, set, date))

ggplot(data = sept, aes(x = time, y = activity)) +
  facet_wrap(~ date, scales = "free_x", ncol = 1, drop = TRUE) +
  geom_rect(mapping = aes(xmax = rise), xmin = -Inf, ymin = -Inf, ymax = +Inf, fill = "grey") +
  geom_rect(mapping = aes(xmin = set), xmax = +Inf, ymin = -Inf, ymax = +Inf, fill = "grey") +
  geom_area()

Back to top

daily()

daily() converts activity patterns created by the activity() function to averaged daily activity patterns.

v <- visits(finches_lg, bw = 5)
p <- presence(v)
a <- activity(p, res = 5)
da <- daily(a)

Data returned

A data set containing:

animal_n, logger_n, species, sex are extra columns passed through (see visits data for more information).

If missing data are indicated in activity(), then the proportion of unknown blocks will be greater than 0 (Note that this is not currently available).

Summarizing

Daily activity patterns can be plotted by hiding the date portion of the time stamp.

ggplot(data = da, aes(x = time, y = p_active)) +
  geom_rect(aes(xmax = rise), xmin = -Inf, ymin = -Inf, ymax = +Inf, fill = "grey") +
  geom_rect(aes(xmin = set), xmax = +Inf, ymin = -Inf, ymax = +Inf, fill = "grey") +
  geom_area() +
  facet_wrap(~ animal_id) +
  scale_x_datetime(date_labels = "%H:%M:%S", date_breaks = "6 hours")

Note also that this script repeats the drawing of the sunrise/sunset rectangles, making it difficult to use alpha values. To avoid this, create a reduced data frame with only sunrise/sunset data:

sun <- unique(da[, c("rise", "set", "animal_id")])

ggplot(data = da) +
  geom_rect(data = sun, aes(xmax = rise), xmin = -Inf, ymin = -Inf, ymax = +Inf, fill = "black", alpha = 0.1) +
  geom_rect(data = sun, aes(xmin = set), xmax = +Inf, ymin = -Inf, ymax = +Inf, fill = "black", alpha = 0.1) +
  geom_area(aes(x = time, y = p_active)) +
  facet_wrap(~ animal_id) +
  scale_x_datetime(date_labels = "%H:%M:%S", date_breaks = "6 hours")

Conversion functions

Conversion functions permit you to format raw or transformed data for use in other packages.

convert_anidom()

Converts the output of disp() for use by the package aniDom's functions: elo_scores() and estimate_uncertainty_by_splitting()

d <- disp(visits(finches_lg), bw = 5)
i <- convert_anidom(d)
head(i)
library(aniDom)

# Calculate elo_scores
s <- elo_scores(winners = i$winner, losers = i$loser)
s

# Estimate repeatability
estimate_uncertainty_by_repeatability(winners = i$winner, losers = i$loser)
estimate_uncertainty_by_splitting(winners = i$winner, losers = i$loser, 
                                  randomise = TRUE)

convert_dominance()

Converts the output of disp() for use by the package Dominance's functions: ADI() and Sociogram()

d <- disp(visits(finches_lg), bw = 5)
i <- convert_dominance(d)

library(Dominance)

# Calculate the Average Dominance Index
ADI(data_sheet = i$data_sheet, items = i$items, actions = i$actions, bytes = i$bytes)
# Construct social network graphs
Sociogram(data_sheet = i$data_sheet, items = i$items, actions = i$actions, bits = i$bytes)

convert_perc()

Converts the output of disp() for use by the package Perc

d <- disp(visits(finches_lg), bw = 5)
i <- convert_perc(d)
head(i)
library(Perc)

# Calculate ranks (adapted from Perc examples)
conflict_mat <- as.conflictmat(i, weighted = TRUE)
perm <- conductance(conflict_mat, 2)
simRankOrder(perm$p.hat, num = 10, kmax = 1000)

convert_asnipe()

Converts raw RFID data into a format for easy use by either the gmmevents() or the get_associations_points_tw() functions included in the asnipe package for calculating group membership.

For gmmevents():

a <- convert_asnipe(finches, fun = "gmmevents")
head(a)
library(asnipe)
gmmevents(a$time, a$identity, a$location)

For get_associations_points_bw():

a <- convert_asnipe(finches, fun = "get_associations_points_tw")
head(a)
get_associations_points_tw(a)

convert_activity()

Converts raw RFID data into a format for easy use by the fitact() function included in the activity package for modelling activity levels and daily activity patterns.

i <- convert_activity(finches_lg)
library(activity)

# Calculate daily activity pattern for a single individual
a <- fitact(i[[1]], sample = "none")
plot(a)

# Calculate daily activity pattern for all individuals
a <- lapply(i, fitact, sample = "none")
plot(a[[3]])
plot(a[["06200004F8"]])

Back to top

Multiple groups

Sometimes you may want to perform the same data transformations multiple times, perhaps for different sites, time periods, or experiments. You could repeat yourself, but the easiest thing to do is to use the do() function from the dplyr package or the map() frunction from the purrr package.

These function allows you to apply functions to groups in your data frame.

In this example we have two experiments, so we want to calculate the transformations separately for each experiment:

library(dplyr)

v_exp <- chickadees %>%
  group_by(experiment) %>%
  do(visits(.))  # Use "." to define where the piped data should go 

# Note that the data set retains it's grouping so we do not have to 
# specify the grouping again until we change it:
p_exp <- v_exp %>%
  do(presence(.)) 

m_exp <- v_exp %>%
  do(move(.))

# Summarizing
p_avg <- p_exp %>%
  group_by(experiment, logger_id) %>%
  summarize(amount = sum(length)/animal_n[1])  #This is using the animal_n specific to the experiment

m_avg <- m_exp %>%
  group_by(experiment, move_path, logger_id, lat, lon) %>%
  summarize(path_use = length(move_path))

summary(p_avg)
summary(m_avg)

These data are now ready for analysis and visualizations, note that they can be applied as is to visualizing functions in the next section.

In future versions of dplyr it is likely the do() function will be deprecated, so let's consider the map() function from the purrr package. This is also an interesting method in that all data sets can be kept together in a single object (note that each line in mutate runs sequentially, so we can create the column visits and then use it in the next line).

library(dplyr)
library(purrr)

exp <- chickadees %>%
  group_by(experiment) %>%
  nest(.key = "raw") %>%
  mutate(visits = map(raw, ~ visits(.x)),
         presence = map(visits, ~ presence(.x)),
         move = map(presence, ~ move(.x)))

exp

Summarizing

p_avg <- exp %>%
  unnest(presence) %>%
  group_by(experiment, logger_id) %>%
  summarize(amount = sum(length)/animal_n[1])  
  # This is using the animal_n specific to the experiment

m_avg <- exp %>%
  unnest(move) %>%
  group_by(experiment, move_path, logger_id, lat, lon) %>%
  summarize(path_use = length(move_path))

summary(p_avg)
summary(m_avg)

Back to top
Go back to home | Go back to housekeeping | Continue with visualizations



animalnexus/feedr documentation built on Feb. 2, 2023, 1:12 a.m.