# knitr options
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  echo = TRUE,
  comment = "#>"
)
library(knitr)
library(kableExtra)
library(here)
For more detailed explanations of the following topics, we encourage users to read through these vignettes:
This vignette shows how to use the PITcleanr
package to wrangle PIT tag data to either summarize detections or prepare the data for further analysis. PITcleanr
can help import complete tag histories from PTAGIS, build a configuration file to help assign each detection to a "node", and compress those detections into a smaller file. It contains functions to determine which detection locations are upstream or downstream of each other, build a parent-child relationship table of locations, and assign directionality of movement between each detection site. For analyses that focus on one-way directional movement (e.g., straightforward CJS models), PITcleanr
can help determine which detections fail to meet that one-way movement assumption and should be examined more closely, and which detections can be kept.
Once PITcleanr
is successfully installed, it can be loaded into the R session. In this vignette, we will also use many functions from the tidyverse
group of packages, so load those as well:
library(PITcleanr)
library(dplyr)
library(readr)
library(stringr)
library(purrr)
library(tidyr)
library(ggplot2)
library(lubridate)
Note that many of the various queries in PITcleanr
require a connection to the internet.
PITcleanr
can help perform some basic quality control on the PTAGIS detections. The qcTagHistory()
function returns a list including a vector of tags listed as "DISOWN" by PTAGIS, another vector of tags listed as "ORPHAN", and a data frame of release batches.
An "ORPHAN" tag indicates that PTAGIS has received a detection either at an interrogation site or through a recapture or recovery, but marking information has not yet been submitted for that tag. Once marking data is submitted, that tag will no longer appear as an "ORPHAN" tag.
"DISOWN" records occur when a previously loaded mark record is changed to a recapture or recovery event. Now that tag no longer has a valid mark record. Once new marking data is submitted, that tag will no longer appear as a "DISOWN"ed tag. For more information, see the PTAGIS FAQ website.
The release batches data frame can be used by the compress()
function in PITcleanr
to determine whether to use the event time or the event release time as reported by PTAGIS for a particular site/year/species combination. However, to extract that information, the user will have needed to include the attributes "Mark Species", "Event Release Site Code" and "Event Release Date Time" in their PTAGIS query.
Here, we'll run the qcTagHistory()
function, again on the file path (ptagis_file
) of the PTAGIS complete tag history query output, and save the results to an object qc_detections
.
qc_detections <- qcTagHistory(ptagis_file)
# view qc_detections
qc_detections
To extract a single element from qc_detections, the user can use the basic extraction operator $, e.g., qc_detections$orphan_tags.
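As a minimal sketch of that extraction, the toy list below stands in for the output of qcTagHistory(). Only the element name orphan_tags comes from the text above; the other element names and all of the tag codes are made up for illustration.

```r
# Toy stand-in for the list returned by qcTagHistory().
# Element names other than orphan_tags are hypothetical, as are the tag codes.
qc_example <- list(
  disown_tags = c("TAG0001"),
  orphan_tags = c("TAG0002", "TAG0003"),
  release_batches = data.frame(site = "TUM", year = 2018)
)

# list the available elements, then pull one out by name
names(qc_example)
orphans <- qc_example$orphan_tags

# `$` and `[[` return the same object
same <- identical(orphans, qc_example[["orphan_tags"]])
```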
In this vignette, we will use an example dataset of Spring Chinook Salmon tagged as adults at Tumwater dam in 2018. PITcleanr
includes the complete tag history downloaded from PTAGIS, the configuration file and the parent-child table for this particular analysis. However, if the user needs to consult how to query or construct any of those, please read the appropriate vignette.
# the complete tag history file from PTAGIS
ptagis_file <- system.file("extdata",
                           "TUM_chnk_cth_2018.csv",
                           package = "PITcleanr",
                           mustWork = TRUE)
# read the complete tag history file into R, and drop detections before a certain date
cth_df <- readCTH(ptagis_file,
                  file_type = "PTAGIS") |>
  filter(event_date_time_value >= lubridate::ymd(20180301))
# the pre-built configuration file
configuration <- system.file("extdata",
                             "TUM_configuration.csv",
                             package = "PITcleanr",
                             mustWork = TRUE) |>
  read_csv(show_col_types = FALSE)
# the pre-built parent-child table
parent_child <- system.file("extdata",
                            "TUM_parent_child.csv",
                            package = "PITcleanr",
                            mustWork = TRUE) |>
  read_csv(show_col_types = FALSE)
The first step is to compress the raw detection data. This maps each detection on an individual antenna to a node, defined in the configuration
file, and combines detections on the same node to a single row.
comp_obs <- compress(cth_df, configuration = configuration)
In our Tumwater Dam Spring Chinook Salmon example, the compress()
function has scaled down the raw data from PTAGIS from r prettyNum(nrow(cth_df), big.mark = ",")
rows of data to a slightly more manageable r prettyNum(nrow(comp_obs), big.mark = ",")
rows. At the same time, the sum of the n_dets
column in comp_obs
is r prettyNum(sum(comp_obs$n_dets), big.mark = ",")
, showing that every detection from the raw data has been accounted for. For much of the remaining vignette, we will explore other functionality of PITcleanr as it relates to this compressed data, i.e., the comp_obs object.
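To make the idea of compression concrete, here is a base R sketch on made-up detections. It is not how compress() is implemented; it only illustrates collapsing consecutive detections of a tag on the same node into one row while keeping a count, so that the counts still sum to the number of raw rows. The max_det column name and all data values are assumptions.

```r
# Toy detections: one row per antenna detection, already mapped to a node.
# Tag codes, nodes and times are made up for illustration.
dets <- data.frame(
  tag_code = c("A", "A", "A", "A", "B", "B"),
  node     = c("TUM", "TUM", "ICL_D", "ICL_D", "TUM", "TUM"),
  time     = as.POSIXct("2018-06-01 00:00:00", tz = "UTC") +
    c(0, 60, 3600, 3660, 0, 30)
)
dets <- dets[order(dets$tag_code, dets$time), ]

# a new "slot" starts whenever the tag or the node changes
new_slot <- c(TRUE,
              dets$tag_code[-1] != dets$tag_code[-nrow(dets)] |
                dets$node[-1] != dets$node[-nrow(dets)])
dets$slot <- cumsum(new_slot)

# collapse each slot to one row: first/last detection time and detection count
toy_comp <- do.call(rbind, lapply(split(dets, dets$slot), function(d) {
  data.frame(tag_code = d$tag_code[1],
             node     = d$node[1],
             min_det  = min(d$time),
             max_det  = max(d$time),
             n_dets   = nrow(d))
}))
toy_comp
```

Summing toy_comp$n_dets and comparing it with nrow(dets) is the same accounting check described above for comp_obs and cth_df.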
A parent-child table describes the relationship between detection sites, showing possible movement paths of the tagged fish. Tags must move from a parent site to a child site (although they may not necessarily be detected at either). For much more information on constructing parent-child tables, please see the parent-child vignette.
Graphically, these relationships can be displayed using the plotNodes()
function.
plotNodes(parent_child)
The initial parent-child table only contains entire sites. However, in our configuration file we have defined the nodes to be distinct arrays (e.g. rows of antennas), and most sites in this example contain multiple arrays (and therefore two nodes). We can expand the parent-child table by using the configuration file, and the addParentChildNodes()
function. The graph of nodes expands.
Note that in PITcleanr
's default settings, nodes that end in _D
are generally the downstream array, while nodes that end in _U
are the upstream array. For more information about that, see the configuration files vignette.
parent_child_nodes <- addParentChildNodes(parent_child = parent_child,
                                          configuration = configuration)
plotNodes(parent_child_nodes)
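The expansion from sites to nodes can be sketched in base R on a toy table. This is an illustration of the idea only, not the logic inside addParentChildNodes(): it assumes that a site with two arrays is split into a _D node followed by a _U node, and that the upstream-most node of a parent connects to the downstream-most node of its child. The site list and the choice of which sites have two arrays are made up.

```r
# Toy site-level parent-child table (illustrative sites only)
pc <- data.frame(parent = c("TUM", "ICL"),
                 child  = c("ICL", "LNF"))

# assumption: these sites have two arrays; TUM is treated as a single node
two_array_sites <- c("ICL", "LNF")

# first node a fish moving upstream meets at a site, and the last one it leaves
first_node <- function(s) if (s %in% two_array_sites) paste0(s, "_D") else s
last_node  <- function(s) if (s %in% two_array_sites) paste0(s, "_U") else s

# links between sites: last node of the parent -> first node of the child
between <- data.frame(
  parent = vapply(pc$parent, last_node, character(1), USE.NAMES = FALSE),
  child  = vapply(pc$child, first_node, character(1), USE.NAMES = FALSE)
)

# links within a two-array site: downstream array -> upstream array
within <- data.frame(parent = paste0(two_array_sites, "_D"),
                     child  = paste0(two_array_sites, "_U"))

pc_nodes_toy <- rbind(between, within)
pc_nodes_toy
```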
PITcleanr
provides a function to determine the detection locations a tag would pass between a starting point (e.g., a tagging or release location) and an ending point (e.g., each subsequent detection point), based on the parent-child table. We refer to these detection locations as a movement path. They are equivalent to following one of the paths in the figures above (produced by plotNodes()
).
The function buildPaths()
returns the movement path leading to each node in a parent-child table: one row per node, with the path that an individual would have to take to reach that node. The buildPaths()
function could be performed on either the parent_child
or parent_child_nodes
objects.
buildPaths(parent_child)
# to set direction to downstream, for example
# buildPaths(parent_child,
#            direction = "d")
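For intuition about what a movement path is, the following base R sketch derives one path per node from a toy parent-child table by walking from each node back to the root. It is a simplified illustration for the upstream case, not the implementation of buildPaths(); the helper path_to(), the output column names and the site codes are assumptions.

```r
# Toy parent-child table (site codes are illustrative only)
pc <- data.frame(
  parent = c("TUM", "TUM", "ICL", "ICL"),
  child  = c("ICL", "CHL", "LNF", "ICM")
)

# hypothetical helper: walk from a node back to the root,
# collecting the sites passed along the way
path_to <- function(node, pc) {
  path <- node
  while (node %in% pc$child) {
    node <- pc$parent[match(node, pc$child)]
    path <- c(node, path)
  }
  paste(path, collapse = " ")
}

# one row per node, with the path leading to it
nodes <- unique(c(pc$parent, pc$child))
toy_paths <- data.frame(
  end_loc = nodes,
  path    = vapply(nodes, path_to, character(1), pc = pc),
  row.names = NULL
)
toy_paths
```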
The buildNodeOrder() function provides both the movement path and the node order of each node (starting from the root site and counting upwards past each node). Both buildPaths() and buildNodeOrder() currently have an argument, direction, that provides paths and node orders based on upstream movement (the default) or downstream movement. For downstream movement, each node may appear in the resulting tibble multiple times, as there may be multiple ways to reach that node from different starting points. Again, buildNodeOrder()
could be performed on either parent_child
or parent_child_nodes
.
buildNodeOrder(parent_child)
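Node order can be thought of as the number of nodes on the path from the root to a given node. The base R sketch below counts this on a toy table; it assumes the root itself is counted as 1 and is not taken from the source of buildNodeOrder(). The helper name and site codes are made up.

```r
# Toy parent-child table (illustrative only)
pc <- data.frame(parent = c("TUM", "TUM", "ICL"),
                 child  = c("ICL", "CHL", "LNF"))

# hypothetical helper: count nodes from the root (assumed order 1) to `node`
node_order_of <- function(node, pc) {
  ord <- 1L
  while (node %in% pc$child) {
    node <- pc$parent[match(node, pc$child)]
    ord <- ord + 1L
  }
  ord
}

nodes <- unique(c(pc$parent, pc$child))
toy_order <- data.frame(
  node       = nodes,
  node_order = vapply(nodes, node_order_of, integer(1), pc = pc),
  row.names  = NULL
)
toy_order
```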
With the parent-child relationships established, PITcleanr
can assign a direction between each node where a given tag was detected.
comp_dir <- addDirection(compress_obs = comp_obs,
                         parent_child = parent_child_nodes,
                         direction = "u")
The comp_dir
data.frame contains all the same columns as the comp_obs
, with an additional column called direction
. However, comp_dir
may have fewer rows, because any node in the comp_obs
data.frame that is not contained in the parent-child table will be dropped. The various direction
values indicate the following:
start: The initial detection at the "root site". Every tag should have this row.
forward: Based on the current and previous detections, the tag has moved forward along a parent-child path (moving from parent to child).
backward: Based on the current and previous detections, the tag has moved backwards along a parent-child path (moving from child to parent).
no movement: Indicates the previous and current detections were from the same node. Often this occurs when the event_type_name differs (e.g., an "Observation" on an interrogation antenna and a "Recapture" from a weir that has been assigned the same node as the antenna).
unknown: Indicates the tag has been detected on a different path of the parent-child graph. For instance, a fish moves into one tributary and is detected there, but then falls back and moves up into a different tributary and is detected there.
In our example, since we are focused on upstream-migrating fish, forward indicates upstream movement and backward indicates downstream movement. Fish with an unknown direction somewhere in their detections have not taken a straightforward path along the parent-child graph.
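The direction labels above can be illustrated with a small base R sketch that compares each detection with the previous one using root-to-node paths. This is a simplified reading of the definitions, not the code behind addDirection(); the helper classify_move(), the hand-written paths and the sequence of detections are assumptions.

```r
# Paths from the root to each node, written out by hand for a toy network
paths <- list(
  TUM   = c("TUM"),
  ICL_D = c("TUM", "ICL_D"),
  ICL_U = c("TUM", "ICL_D", "ICL_U"),
  CHL_D = c("TUM", "CHL_D")
)

# hypothetical helper: label the move from the previous node to the current one
classify_move <- function(prev, curr, paths) {
  if (is.na(prev)) return("start")
  if (prev == curr) return("no movement")
  if (prev %in% paths[[curr]]) return("forward")   # prev lies on the way to curr
  if (curr %in% paths[[prev]]) return("backward")  # curr lies on the way to prev
  "unknown"                                        # nodes sit on different branches
}

# one tag's compressed detections, in time order (made up)
obs  <- c("TUM", "ICL_D", "ICL_U", "ICL_D", "CHL_D")
prev <- c(NA, obs[-length(obs)])

toy_dir <- mapply(classify_move, prev, obs,
                  MoreArgs = list(paths = paths),
                  USE.NAMES = FALSE)
data.frame(node = obs, direction = toy_dir)
```

In this toy sequence the last move, from ICL_D to CHL_D, is labelled unknown because neither node lies on the path to the other.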
For some types of analyses, such as Cormack-Jolly-Seber (CJS) models, one of the assumptions is that a tag/individual undergoes only one-way travel (i.e., travel is either all upstream or all downstream). To meet this assumption, individual detections sometimes need to be discarded. For example, an adult salmon undergoing an upstream spawning migration may move up a mainstem migration corridor (e.g., the Snake River), dip into a tributary (e.g., Selway River in the Clearwater), and then move back downstream and up another tributary (e.g., Salmon River) to its spawning location. In this case, any detections that occurred in the Clearwater River would need to be discarded if we believe the fish was destined for a location in the Salmon River. In other cases, more straightforward summaries of directional movements may be desired.
PITcleanr
contains a function, filterDetections()
, to help determine which tags/individuals fail to meet the one-way travel assumption and need to be examined further. filterDetections()
first runs addDirection()
, and then adds two columns, auto_keep_obs
and user_keep_obs
. These are meant to indicate whether each row should be kept (i.e. marked TRUE
) or deleted (i.e. marked FALSE
). For tags that do meet the one-way travel assumption, both auto_keep_obs
and user_keep_obs
columns will be filled. If a fish moves back and forth along the same path, PITcleanr
will indicate that only the last detection at each node should be kept.
comp_filter <- filterDetections(compress_obs = comp_obs,
                                parent_child = parent_child_nodes,
                                max_obs_date = "20180930")
If a tag fails to meet that assumption (e.g., at least one direction is unknown
), the user_keep_obs
column will be NA
for all observations of that tag. The auto_keep_obs
column is PITcleanr
's best guess as to which detections should be kept. This guess is based on the assumption that the last and furthest forward-moving detection should be kept, and all detections along that movement path should be kept, while dropping others. Below is an example of a tag that fails to meet the one-way travel assumption, and which observations PITcleanr
suggests should be kept (keep the detection on CHL_D and drop the one on WTL_U).
comp_filter |>
  select(tag_code:min_det, direction, ends_with("keep_obs")) |>
  filter(is.na(user_keep_obs)) |>
  filter(tag_code == tag_code[1]) |>
  kable() |>
  kable_styling()
The user can then save the output from filterDetections()
, and fill in all the blanks in the user_keep_obs
column. The auto_keep_obs
column is merely a suggestion from PITcleanr
; the user should use their own knowledge of the landscape, where the detection sites are, the run timing and general spatial patterns of that population to make final decisions. Once all the blank user_keep_obs
cells have been filled in, that output can be read back into R and filtered for when user_keep_obs == TRUE
, and the analysis can proceed from there.
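The save, edit, read back and filter workflow might look like the following base R sketch. The data frame is a made-up stand-in for the output of filterDetections(), the file is a temporary CSV, and the manual editing step is simulated by accepting the auto_keep_obs suggestion wherever user_keep_obs is blank; in practice the user would make those calls by hand.

```r
# Toy stand-in for filterDetections() output (values are made up)
toy_filter <- data.frame(
  tag_code      = c("A", "A", "B", "B", "B"),
  node          = c("TUM", "ICL_D", "TUM", "WTL_U", "CHL_D"),
  auto_keep_obs = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  user_keep_obs = c(TRUE, TRUE, NA, NA, NA)
)

# save the output so the blanks can be filled in outside of R
out_file <- tempfile(fileext = ".csv")
write.csv(toy_filter, out_file, row.names = FALSE)

# simulate the manual edit: here we simply accept the automatic suggestion
edited <- read.csv(out_file)
blank  <- is.na(edited$user_keep_obs)
edited$user_keep_obs[blank] <- edited$auto_keep_obs[blank]
write.csv(edited, out_file, row.names = FALSE)

# read the completed file back in and keep only the rows marked TRUE
final_obs <- subset(read.csv(out_file), user_keep_obs)
final_obs
```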
It should be noted that not every analysis requires a one-way travel assumption, so this step may not be necessary depending on the user's goals.
Often, various mark-recapture models require a capture history of 1's and 0's as inputs. The compressed observations (whether they've been through filterDetections()
or not, depending on the user's analysis) can be converted into that kind of capture history in a fairly straightforward manner. One key will be to ensure the nodes or sites are put in correct order for the user.
The function defineCapHistCols()
determines the order of nodes to be used in a capture history matrix. At a minimum, it utilizes the parent-child table and the configuration file. If river kilometer is provided as part of the configuration file, in a column called rkm
, the user may set use_rkm = T
and it will organize nodes by rkm mask. Otherwise, it defaults to sorting by the paths defined in buildNodeOrder()
.
If the user would like to keep individual columns for each node, they can set drop_nodes = F
. The default (drop_nodes = T
) will delete all the columns associated with each node, and consolidate them all into one column called cap_hist
that is a string of ones and zeros indicating the capture history. Many mark-recapture models, such as those used in program MARK, use those types of capture histories as inputs. The user can then also attach any relevant covariates (e.g. marking location, size at tagging, etc.) based on the tag code.
# translate PIT tag observations into capture histories, one per tag
buildCapHist(comp_filter,
             parent_child = parent_child,
             configuration = configuration,
             use_rkm = TRUE)
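As an illustration of what a capture history string encodes, the base R sketch below builds one from toy observations and an assumed node order. It is not the implementation of buildCapHist(); the observations and the node order in node_cols are made up.

```r
# Toy compressed observations (made up)
toy_obs <- data.frame(
  tag_code = c("A", "A", "B", "B", "C"),
  node     = c("TUM", "ICL_D", "TUM", "CHL_D", "TUM")
)

# assumed column order of the capture history
node_cols <- c("TUM", "ICL_D", "ICL_U", "CHL_D")

# tag-by-node table of detection counts, with nodes in the assumed order
counts <- table(factor(toy_obs$tag_code),
                factor(toy_obs$node, levels = node_cols))

# 1 if the tag was seen at the node at least once, 0 otherwise, pasted together
toy_ch <- data.frame(
  tag_code = rownames(counts),
  cap_hist = apply(counts > 0, 1,
                   function(x) paste(as.integer(x), collapse = "")),
  row.names = NULL
)
toy_ch
```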
To determine which position in the cap_hist
string corresponds to which node, the user can run the defineCapHistCols()
function (which is called internally within buildCapHist()
).
# to find out the node associated with each column
col_nodes <- defineCapHistCols(parent_child = parent_child,
                               configuration = configuration,
                               use_rkm = TRUE)
col_nodes
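Given a vector of node names in column order, a cap_hist string can be split back into named 0/1 flags. The sketch below uses a hand-written vector in place of the output of defineCapHistCols() and a made-up capture history, and assumes one character per node.

```r
# hand-written stand-in for the node order (hypothetical)
col_nodes_toy <- c("TUM", "ICL_D", "ICL_U", "CHL_D")

# a made-up capture history with one character per node
cap_hist <- "1001"

# split the string into single characters and label each position with its node
flags <- as.integer(strsplit(cap_hist, "")[[1]])
names(flags) <- col_nodes_toy
flags

# nodes where this tag was detected
seen_at <- names(flags)[flags == 1]
seen_at
```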