# knitr options
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  echo = TRUE,
  comment = "#>"
)

library(knitr)
library(kableExtra)
library(here)

Introduction

For more detailed explanations of the following topics, we encourage users to read through these vignettes:

This vignette shows how to use the PITcleanr package to wrangle PIT tag data to either summarize detections or prepare the data for further analysis. PITcleanr can help import complete tag histories from PTAGIS, build a configuration file to help assign each detection to a "node", and compress those detections into a smaller file. It contains functions to determine which detection locations are upstream or downstream of each other, build a parent-child relationship table of locations, and assign directionality of movement between each detection site. For analyses that focus on one-way directional movement (e.g., straightforward CJS models), PITcleanr can help determine which detections fail to meet that one-way movement assumption and should be examined more closely, and which detections can be kept.

Installation


Once PITcleanr is successfully installed, it can be loaded into the R session. In this vignette, we will also use many functions from the tidyverse group of packages, so load those as well:

library(PITcleanr)
library(dplyr)
library(readr)
library(stringr)
library(purrr)
library(tidyr)
library(ggplot2)
library(lubridate)

Note that many of the various queries in PITcleanr require a connection to the internet.

Querying Detection Data

From PTAGIS


Quality Control

PITcleanr can help perform some basic quality control on the PTAGIS detections. The qcTagHistory() function returns a list including a vector of tags listed as "DISOWN" by PTAGIS, another vector of tags listed as "ORPHAN", and a data frame of release batches.

An "ORPHAN" tag indicates that PTAGIS has received a detection either at an interrogation site or through a recapture or recovery, but marking information has not yet been submitted for that tag. Once marking data is submitted, that tag will no longer appear as an "ORPHAN" tag.

"DISOWN" records occur when a previously loaded mark record is changed to a recapture or recovery event. Now that tag no longer has a valid mark record. Once new marking data is submitted, that tag will no longer appear as a "DISOWN"ed tag. For more information, see the PTAGIS FAQ website.

The release batches data frame can be used by the compress() function in PITcleanr to determine whether to use the event time or the event release time as reported by PTAGIS for a particular site/year/species combination. However, to extract that information, the user will have needed to include the attributes "Mark Species", "Event Release Site Code" and "Event Release Date Time" in their PTAGIS query.

Here, we'll run the qcTagHistory() function, again on the file path (ptagis_file) of the PTAGIS complete tag history query output, and save the results to an object qc_detections.

qc_detections = qcTagHistory(ptagis_file)

# view qc_detections
qc_detections

To extract a single element from the qc_detections, the user can simply use the basic extraction operator $ e.g., qc_detections$orphan_tags.

Non-PTAGIS data


Compressing Detection Data

In this vignette, we will use an example dataset of Spring Chinook Salmon tagged as adults at Tumwater dam in 2018. PITcleanr includes the complete tag history downloaded from PTAGIS, the configuration file and the parent-child table for this particular analysis. However, if the user needs to consult how to query or construct any of those, please read the appropriate vignette.

# the complete tag history file from PTAGIS
ptagis_file <- system.file("extdata", 
                           "TUM_chnk_cth_2018.csv", 
                           package = "PITcleanr",
                           mustWork = TRUE)

# read the complete tag history file into R, and filter detections before a certain date
cth_df <- readCTH(ptagis_file,
                  file_type = "PTAGIS") |> 
  filter(event_date_time_value >= lubridate::ymd(20180301))

# the pre-built configuration file
configuration <- system.file("extdata", 
                             "TUM_configuration.csv", 
                             package = "PITcleanr",
                             mustWork = TRUE) |> 
  read_csv(show_col_types = F)

# the pre-built parent-child table
parent_child <- system.file("extdata", 
                            "TUM_parent_child.csv", 
                            package = "PITcleanr",
                            mustWork = TRUE) |> 
  read_csv(show_col_types = F)

The first step is to compress the raw detection data. This maps each detection on an individual antenna to a node, defined in the configuration file, and combines detections on the same node to a single row.

comp_obs <- compress(cth_df,
                     configuration = configuration)

In our Tumwater Dam Spring Chinook Salmon example, the compress() function has scaled down the raw data from PTAGIS from r prettyNum(nrow(cth_df), big.mark = ",") rows of data to a slightly more manageable r prettyNum(nrow(comp_obs), big.mark = ",") rows. At the same time, the sum of the n_dets column in comp_obs is r prettyNum(sum(comp_obs$n_dets), big.mark = ","), showing that every detection from the raw data has been accounted for. For much of the remaining vignette, we will explore other functionality of PITcleanr and as it relates to this compressed data i.e., the comp_obs object.

Parent-Child Table

A parent-child table describes the relationship between detection sites, showing possible movement paths of the tagged fish. Tags must move from a parent site to a child site (although they may not necessarily be detected at either). For much more information on constructing parent-child tables, please see the parent-child vignette.

Graphically, these relationships can be displayed using the plotNodes() function.

plotNodes(parent_child)

The initial parent-child table only contains entire sites. However, in our configuration file we have defined the nodes to be distinct arrays (e.g. rows of antennas), and most sites in this example contain multiple arrays (and therefore two nodes). We can expand the parent-child table by using the configuration file, and the addParentChildNodes() function. The graph of nodes expands.

Note that in PITcleanr's default settings, nodes that end in _D are generally the downstream array, while nodes that end in _U are the upstream array. For more information about that, see the configuration files vignette.

parent_child_nodes <- addParentChildNodes(parent_child = parent_child,
                                          configuration = configuration)

plotNodes(parent_child_nodes)

Movement Paths & Node Order

PITcleanr provides a function to determine the detection locations a tag would pass between a starting point (e.g, a tagging or release location) and an ending point (e.g., each subsequent detection point), based on the parent-child table. We refer to these detection locations as a movement path. They are equivalent to following one of the paths in the figures above (produced by plotNodes()).

The function buildPaths() will provide the movement path leading to each node in a parent-child table. It provides a row for each node, and then provides the path that an individual would have to take to get to that given node. The build_paths() function could be performed on either the parent_child or parent_child_nodes objects.

buildPaths(parent_child)

# to set direction to downstream, for example
# buildPaths(parent_child,
#            direction = "d")

The buildNodeOrder() provides both the movement path and the node order of each node (starting from the root site and counting upwards past each node). Each of these functions currently has an argument, direction that provides paths and node orders based on upstream movement (the default) or downstream movement. For downstream movement, each node may appear in the resulting tibble multiple times, as there may be multiple ways to reach that node, from different starting points. Again, buildNodeOrder() could be performed on either parent_child or parent_child_nodes.

buildNodeOrder(parent_child)

Add Direction

With the parent-child relationships established, PITcleanr can assign a direction between each node where a given tag was detected.

comp_dir <- addDirection(compress_obs = comp_obs,
                         parent_child = parent_child_nodes,
                         direction = "u")

The comp_dir data.frame contains all the same columns as the comp_obs, with an additional column called direction. However, comp_dir may have fewer rows, because any node in the comp_obs data.frame that is not contained in the parent-child table will be dropped. The various direction values indicate the following:

In our example, since we are focused on upstream-migrating fish, forward indicates upstream movement and backward indicates downstream movement. Fish with an unknown direction somewhere in their detections have not taken a straightforward path along the parent-child graph.

Filtering Detections

For some types of analyses, such as Cormack-Jolly-Seber (CJS) models, one of the assumptions is that a tag/individual undergoes only one-way travel (i.e., travel is either all upstream or all downstream). To meet this assumption, individual detections sometime need to be discarded. For example, an adult salmon undergoing an upstream spawning migration may move up a mainstem migration corridor (e.g., the Snake River), dip into a tributary (e.g., Selway River in the Clearwater), and then move back downstream and up another tributary (e.g., Salmon River) to their spawning location. In this case, any detections that occurred in the Clearwater River would need to be discarded if we believe the fish was destined for a location in the Salmon River. In other cases, more straightforward summaries of directional movements may be desired.

PITcleanr contains a function, filterDetections(), to help determine which tags/individuals fail to meet the one-way travel assumption and need to be examined further. filterDetections() first runs addDirection(), and then adds two columns, auto_keep_obs and user_keep_obs. These are meant to indicate whether each row should be kept (i.e. marked TRUE) or deleted (i.e. marked FALSE). For tags that do meet the one-way travel assumption, both auto_keep_obs and user_keep_obs columns will be filled. If a fish moves back and forth along the same path, PITcleanr will indicate that only the last detection at each node should be kept.

comp_filter <- filterDetections(compress_obs = comp_obs,
                                parent_child = parent_child_nodes,
                                max_obs_date = "20180930")

If a tag fails to to meet that assumption (e.g. at least one direction is unknown), the user_keep_obs column will be NA for all observations of that tag. The auto_keep_obs column is PITcleanr's best guess as to which detections should be kept. This guess is based on the assumption that the last and furthest forward-moving detection should be kept, and all detections along that movement path should be kept, while dropping others. Below is an example of tag that fails to meet the one-way travel assumption, and which observations PITcleanr suggests should be kept (keep the detection on CHL_D and drop the one on WTL_U).

comp_filter |>
  select(tag_code:min_det, direction,
         ends_with("keep_obs")) |> 
  filter(is.na(user_keep_obs)) |> 
  filter(tag_code == tag_code[1]) |> 
  kable() |> 
  kable_styling()

The user can then save the output from filterDetections(), and fill in all the blanks in the user_keep_obs column. The auto_keep_obs column is merely a suggestion from PITcleanr, the user should use their own knowledge of the landscape, where the detection sites are, the run timing and general spatial patterns of that population to make final decisions. Once all the blank user_keep_obs cells have been filled in, that output can be read back into R and filtered for when user_keep_obs == TRUE, and the analysis can proceed from there.

It should be noted that not every analysis requires a one-way travel assumption, so this step may not be necessary depending on the user's goals.

Construct Capture Histories

Often, various mark-recapture models require a capture history of 1's and 0's as inputs. The compressed observations (whether they've been through filterDections() or not, depending on the user's analysis) can be converted into that kind of capture history in a fairly straightforward manner. One key will be to ensure the nodes or sites are put in correct order for the user.

The function defineCapHistCols() determines the order of nodes to be used in a capture history matrix. At a minimum, it utilizes the parent-child table and the configuration file. If river kilometer is provided as part of the configuration file, in a column called rkm, the user may set use_rkm = T and it will organize nodes by rkm mask. Otherwise, it defaults to sorting by the paths defined in buildNodeOrder().

If the user would like to keep individual columns for each node, they can set drop_nodes = F. The default (drop_nodes = T) will delete all the columns associated with each node, and consolidate all them into one column called cap_hist that is a string of ones and zeros indicating the capture history. Many mark-recapture models, such as those used in program MARK, use those types of capture histories as inputs. The user can then also attach any relevant covariates (e.g. marking location, size at tagging, etc.) based on the tag code.

# translate PIT tag observations into capture histories, one per tag
buildCapHist(comp_filter,
             parent_child = parent_child,
             configuration = configuration,
             use_rkm = T)

To determine which position in the cap_hist string corresponds to which node, the user can run the defineCapHistCols() function (which is called internally within buildCapHist()).

# to find out the node associated with each column
col_nodes <- defineCapHistCols(parent_child = parent_child,
                               configuration = configuration,
                               use_rkm = T)
col_nodes


KevinSee/PITcleanr documentation built on Feb. 27, 2024, 11:03 p.m.