# Bias caused by incomplete sampling In BTSPAS: Bayesian Time-Stratified Population Analysis

set.seed(856765)
max.width=200

library(ggplot2)
library(plyr)
library(stats)
# Generate population curve

logit <- function(p){log(p/(1-p))}
expit <- function(theta){1/(1+exp(-theta))}


# Location of vignette source and code.

Because of the length of time needed to run the vignettes, only static vignettes have been included with this package.

The original of the vignettes and the code can be obtained from the GitHub site at https://github.com/cschwarz-stat-sfu-ca/BTSPAS

# Introduction

This document will illustrate the potential biases caused by incomplete sampling in the recovery strata. For example, suppose that stratification is at a weekly level. Fish are tagged and released continuously during the week. Recoveries occur from a commercial fishery that only operating for 1/2 a week (the first half). This may cause bias in estimates of abundance because, for example, fish tagged at the end of a week, may arrive at the commercial fishery in the second half of the recovery week and not be subject to capture. This causes heterogeneity in recovery probabilities that is not accounted for in the mark-recapture analysis.

A simulated population will be created and then analyzed in several ways to illustrate the potential extent of bias, and how to properly stratify the data to account for this problem.

This scenario was originally envisioned to be handled with the sampfrac argument of the BTSPAS routines. However, the actual implementation is incorrect in BTSPAS and is deprecated. This vignette shows the proper way to deal with this problem.

## Experimental setup

This simulated population is modelled around a capture-capture experiment on the Taku River which flows between the US and Canada.

Returning salmon arrive and are captured at a fish wheel during several weeks. Those fish captured at the fish wheel are tagged and released (daily). They migrate upstream to a commercial fishery. The commercial fishery does not operate on all days of the week - in particular, the fishery tends to operate during the first part of the week until the quota for catch is reached. Then the fishery stops until the next week.

## Generation of population

# simulate the population
N <- 150000  # total run size

# Assume a 16 week spread, implying a peak at 8 weeks.
mean.at.wheel <- 42
sd.at.wheel   <- 15

pop <- data.frame(time.at.wheel=pmin(170,pmax(1,stats::rnorm(N, mean=mean.at.wheel, sd=sd.at.wheel)))) # date of arrival
pop$date <- trunc(pop$time.at.wheel)  # how many arrive at the wheel

pop.dist <- ggplot(data=pop, aes(x=time.at.wheel))+
ggtitle("Distribution of arrival time at tagging wheel")+
geom_histogram(breaks=0:200, alpha=0.2)+
xlab("Arrive date at tagging wheel")


A population of r formatC(round(N,0), digits=0, big.mark=',', format="f") fish will be simulated arriving at the fish wheels according to a normal distribution with a mean of r mean.at.wheel and a standard deviation of r sd.at.wheel. This gives a distribution of arrival times at the fish wheel of

pop.dist


The spikes at the start and end are where the arrival time has been truncated and fish forced to arrive in the first and last days of the run (for convenience).

If the fish wheels had a constant probability of capture, then the pooled Petersen would be unbiased regardless of what happens in the commercial fishery. Consequently, we simulate the probability of capture that varies around 0.05. The distribution of capture probabilities at the wheel is:

# capture prob at wheel is normal on logit( .03 with a sd of .01) but a maximum of 400 fish per week
capture.prob <- plyr::ddply(pop, "date", plyr::summarize,
tot.fish.wheel = length(date))
capture.prob$logit.tag <- stats::rnorm(nrow(capture.prob), mean=pmin(logit(.06), logit(1000/7/capture.prob$tot.fish.wheel), na.rm=TRUE), sd=0.5)
#capture.prob$logit.tag <- stats::rnorm(nrow(capture.prob), mean=logit(.06), sd=0.5) capture.dist <- ggplot(data=capture.prob, aes(x=expit(logit.tag)))+ ggtitle("Distribution of capture probabilities at tagging wheel")+ geom_histogram()+ xlab("Capture probability at the fish wheel") capture.dist  This is used to sample from the simulated run as it passes the wheel and the distribution of the number tagged is: # is this fish sampled? pop <- merge(pop, capture.prob, all.x=TRUE) pop$tagged <- as.logical(rbinom(nrow(pop), 1, expit(pop$logit.tag))) ggplot(data=pop[pop$tagged,], aes(x=date))+
ggtitle("Number tagged and released by date")+
geom_bar(alpha=0.2)+
xlab("Date")


A total of r sum(pop$tagged) fish are tagged and released. # travel time is log-normal with log(mean) of log(1 week) sd=.1 days travel.time.mu <- 7 travel.time.sigma <- .3 pop$travel.time <- rlnorm(nrow(pop), meanlog=log(travel.time.mu), sdlog=travel.time.sigma)

travel.dist <- ggplot(data=pop, aes(x=travel.time))+
ggtitle("Distributon of travel times between wheel and fishery")+
geom_histogram(alpha=0.2)+
xlab("Travel time between wheel and fishery (days)")


Travel time from the wheel to the commercial fishery is simulated using a log-normal distribution with a mean (on the log scale) of log(r travel.time.mu) days and a standard deviation on the log-scale of r travel.time.sigma. This gives a distribution of travel times of:

travel.dist


The travel time was added to the time of arrival at the fish wheels giving a distribution of time of arrival in fishery of

# arrival at fishery
pop$time.at.fishery <- pop$time.at.wheel + pop$travel.time pop$date.at.fishery <- trunc(pop$time.at.fishery) fishery.dist <- ggplot(data=pop, aes(x=date.at.fishery))+ ggtitle("Distribution of arrival time at commercial fishery")+ geom_histogram(alpha=0.2)+ xlab("Date") fishery.dist  # peformance of fishery fishery <- plyr::ddply(pop, "date.at.fishery", plyr::summarize, tot.fish.fishery = length(date)) # fishery runs for 3 days then off then on then off fishery$active <- as.logical( trunc((fishery$date.at.fishery-1)/3) %% 2) # fishery stops at certain part of the run run.cutoff <- 0.99 date.cutoff <- quantile(pop$date.at.fishery, prob=run.cutoff)
fishery$active[ fishery$date.at.fishery > date.cutoff] <- FALSE

# figure out if captured in fishery
pop <- merge(pop, fishery, all.x=TRUE)

# fishery probability  logit on logit(.15) sd .2 on log scale
# Here is a case where the probability of capture is independent
fishery.p <- .10
pop$logit.recover <- stats::rnorm( nrow(pop), mean=logit(.10), sd=.2) pop$logit.recover[ !pop$active] <- -10 # probability of zero when fishery not acting # add a dependency on run size similar to what happens at the fish wheels fishery.p <- .10 pop$logit.recover <-  stats::rnorm( nrow(pop), mean=pmin(logit(fishery.p), logit(1000/7/pop$tot.fish.fishery), na.rm=TRUE), sd=.2) pop$logit.recover[ !pop$active] <- -10 # probability of zero when fishery not acting fishery.prob <- ggplot(data=pop, aes(x=expit(logit.recover)))+ ggtitle("Distribution of catchability at fishery")+ geom_histogram(alpha=0.2)+ xlab("Probability of capture in fishery") correlaton.plot <- ggplot(data=pop, aes(x=expit(logit.tag), y=expit(logit.recover)))+ ggtitle("Correlation between tagging and recapture probability")+ geom_point()+ xlab("Probability of capture at tagging wheel")+ylab("Probability of capture in fishery") correlation.tag.recover <- cor(expit(pop$logit.tag), expit(pop$logit.recover)) relative.bias.petersen <- -correlation.tag.recover*sqrt( var(expit(pop$logit.tag))*var(expit(pop$logit.recover)))/ mean(expit(pop$logit.tag)* expit(pop$logit.recover)) pop$recover <- as.logical( rbinom(nrow(pop), 1, expit(pop$logit.recover))) fishery.catch <- ggplot(data=pop[pop$recover,], aes(x=date.at.fishery))+
ggtitle("Commercial catch by date")+
geom_bar(width=1, alpha=0.2)+
geom_vline(xintercept=date.cutoff, color="red")+
xlim(0,max(pop$date.at.fishery))+xlab("Date")  The distribution of catchability in the commercial fishery is fishery.prob  The commercial fishery is assumed to run on a 3 day on/3 day off schedule throughout the season and terminates when about r 100*run.cutoff% of the run has passed the fishery (day r round(date.cutoff)). If the catchability in the commercial fishery equal for all fish, then the pooled Petersen will also be unbiased. This is clearly not the case because some fish has a probability of 0 of being captured when the fishery is not operating. If the probability of capture in the commercial fishery is uncorrelated with the probability of capture by the tagging wheel, the pooled-Petersen is also unbiased. A plot of the probability of capture at the tagging wheels and in the commercial fishery is: correlaton.plot  In this case the correlation between the tagging and recovery probability is r round(correlation.tag.recover,2). Schwarz and Taylor (1988) give a formula for the relative bias of the pooled Petersen if you know the correlation and variation in the probability in the two events. In this case the relative bias of the Pooled Petersen is r round(100*relative.bias.petersen)%. A non-zero correlation could arise if both the fish wheel and commercial fishery can be saturated, e.g. regardless of the number of fish arriving at the fishwheel, only a maximum number can be captured and tagged, and regardless of how many fish are available in the fishery, only a maximum can be caught. In this case, the probability of tagging and the probability of recapture is reduced when there are many fish available which could induce some correlation. A summary of the catch by the fishery is: fishery.catch  Notice the "holes" in the data when the commercial fishery is not operating. A summary of the number of fish tagged and recaptured is: xtabs(~tagged+recover, data=pop)  The data were broken into 3 day strata to match the commercial fishery operations and gives rise to the following matrix of releases and recoveries: # break into 3 day strata for tagging and recovery pop$tag.stratum     <- pmax(1, 1+trunc((pop$date -1)/3)) pop$fishery.stratum <- pmax(1, 1+trunc((pop$date.at.fishery-1)/3)) range(pop$tag.stratum)


# References

Bonner Simon, J., & Schwarz Carl, J. (2011). Smoothing Population Size Estimates for Time-Stratified MarkRecapture Experiments Using Bayesian P-Splines. Biometrics, 67, 1498–1507. https://doi.org/10.1111/j.1541-0420.2011.01599.x

Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://doi.org/10.1093/biomet/48.3-4.241

Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://doi.org/10.2307/2533994

Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238

## Try the BTSPAS package in your browser

Any scripts or data that you put into this service are public.

BTSPAS documentation built on Oct. 25, 2021, 9:07 a.m.