knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(fishCompTools)
This document will walk through how to estimate composition of a mixture of all ad-clipped fish by PBT group under the assumption that all fish in your sample belong to one of your PBT groups. In other words, the "true" number of fish that don't belong to one of your PBT groups is 0. This example uses GSI assignments, but if you do not calculate GSI assignments for fish, you can create a dummy variable that gives all fish (including PBT-assigned fish) the same GSI assignment.
We'll be assuming these fish are all ad-clipped, but a similar procedure can be carried out for ad-intact as well by adjust the appropriate options.
First, we will generate some data using the built in data-simulation function.
set.seed(130) #make generated data reproducible # define the unknown "true" proportions at which fish in each # PBT group assign to the different GSI groups pbtGSImat <- matrix(c(.1, .8, .1,.8, .1, .1,.1, .1, .8), nrow = 3, ncol = 3, byrow = TRUE) # now generate three strata from identical parameters multStratData <- data.frame() for(i in 1:3){ tempData <- generatePBTGSIdata(sampRate = .2, censusSize = 3000, relSizePBTgroups = c(1,2,3), tagRates = c(.8, .85,.9), physTagRates = 0, true_clipped = 0, true_noclip_H = 1, true_wild = 0, relSizeGSIgroups = c(1,2,1), PBT_GSI_calls = pbtGSImat) trapData <- tempData[[1]] trapData$Strata <- i multStratData <- rbind(multStratData, trapData) } tags <- tempData[[2]] # get the tag rates for the PBT groups multStratData$GSI <- paste0("GSIgroup_", multStratData$GSI) multStratData$AdClip <- "AD" #flip from unclipped to clipped head(multStratData)
Since you have actual data, if it's in the format you would use for scobi_deux
, you can load it in using the below example commands:
#tag rates tags <- read.csv("PBT tag rates.csv", stringsAsFactors = FALSE) tags <- tags[tags[,1] != "Unassigned",] #remove Unassigned from tag rate dataframe #trap data multStratData <- read.csv("2018RearHscobi.csv", stringsAsFactors = FALSE) #window counts window <- read.csv("Clip_window.csv", stringsAsFactors = FALSE) # The MCpbt function expects a strata column in the trap data. So next, we add it using the window count file. ### NOTE this code assumes you have a column named "WeekNumber" multStratData$Strata <- NA #initialize column for(i in 1:nrow(window)){ multStratData$Strata[multStratData$WeekNumber == window[i,1]] <- window[i,3] }
There are a few different ways you can have the function determine what the GSI assignment composition of PBT groups is. First decide whether to include data from other strata or not:
Now we need to prepare the data in the way the estimation function can use it. We will do this using a function in the package that takes the two dataframes we created above as input.
## Not run in this vignette #### important note: symPrior=0, this will make modifying the input in the next step easier # note that we are using AI=FALSE because we are looking at the composition # of ad-clipped fish mainInput <- prepStrata(multStratData, tags, GSIcol = "GSI", PBTcol = "GenParentHatchery", strataCol = "Strata", adFinCol = "AdClip", AI = FALSE, symPrior = 0)
#### important note: symPrior=0, this will make modifying the input in the next step easier # note that we are using AI=FALSE because we are looking at the composition # of ad-clipped fish mainInput <- prepStrata(multStratData, tags, GSIcol = "GSI", PBTcol = "GenParentHatchery", strataCol = "Strata", adFinCol = "AdClip", AI = FALSE, symPrior = 0, GSIgroups = sort(unique(multStratData$GSI))) # now we are combining information about GSI assignments of PBT groups across strata gsiCats <- sort(unique(multStratData$GSI)) pbtCats <- sort(unique(multStratData$GenParentHatchery)) pbtCats <- pbtCats[pbtCats != "Unassigned"] gsiAllStrata <- matrix(0, nrow = length(pbtCats), ncol = length(gsiCats)) colnames(gsiAllStrata) <- gsiCats rownames(gsiAllStrata) <- pbtCats ## here we are counting GSI assignments of all PBT-assigned fish in the dataset for(p in pbtCats){ for(g in gsiCats){ gsiAllStrata[p,g] <- sum(multStratData$GenParentHatchery == p & multStratData$GSI == g) } } #here we are incorporation the assignemnts from other strata as a prior for(i in 1:length(mainInput)){ p <- rownames(mainInput[[i]]$prior_pi_gsi) g <- colnames(mainInput[[i]]$prior_pi_gsi) mainInput[[i]]$prior_pi_gsi <- gsiAllStrata[p,g] - mainInput[[i]]$ohnc_gsi }
Now, we have to decide whether we want to consider that fish in a given PBT group may have GSI assignments that were not observed amongst the fish PBT-assigned to that group in our dataset. If we do, we can modify the prior to allow this. Below, it is shown how you can add 1/N to the alpha values of the relevant Dirichlet prior to accomplish this (where N is the number of GSI categories). Values other than 1/N can be added. The effect of using alternative values is beyond the scope of this vignette.
## NOT run in this vignette for(i in 1:length(mainInput)){ ng <- ncol(mainInput[[i]]$prior_pi_gsi) mainInput[[i]]$prior_pi_gsi <- mainInput[[i]]$prior_pi_gsi + (1/ng) }
This has created an input list that has one item for each strata.
To run the estimates the way we want, we have to modify the prior and initial values for each strata. Essentially, we will tell it that all the GSIgroup_#'s are 0. This means any Unassigned fish must be untagged fish belonging to one of the PBT groups.
We set symPrior=0
earlier, which takes care of the prior, now we have to modify the initial values. We are setting them so that the initial values for all the "Unassigned" groups are 0.
for(i in 1:length(mainInput)){ pbtGroups <- mainInput[[i]]$groups[1:mainInput[[i]]$nPBT] mainInput[[i]]$piTotInitial <- mainInput[[i]]$ohnc / sum(mainInput[[i]]$ohnc) for(j in 1:length(mainInput[[i]]$initZ)){ tempGroups <- mainInput[[i]]$ohnc_gsi[,mainInput[[i]]$gsiUT[j]] + mainInput[[i]]$prior_pi_gsi[,mainInput[[i]]$gsiUT[j]] tempGroups <- which(tempGroups > 0) if(length(tempGroups) == 0) print("Error, GSI assignment not compatible with the prior or observed PBT-assigned fish") mainInput[[i]]$initZ[j] <- sample(tempGroups, 1) } mainInput[[i]]$oUTInitial <- rep(0, length(mainInput[[i]]$oUTInitial)) for(g in 1:length(pbtGroups)){ mainInput[[i]]$oUTInitial[g] <- sum(mainInput[[i]]$initZ == pbtGroups[g]) } }
Everything is now set up, so...
# specifying a seed to make the anlaysis reproducible # running 10000 iterations with 1000 burn-in and no thinning propEstimates <- estimStrataMCpbt(mainInput, iter = 10000, burnIn = 1000, thin = 1, seed = 7)
We now have estimates in terms of proportions. You can look at convergence and mixing of the chain here if you want. One simple option is just to look at the trace of each groups' population size in each strata:
## This will create a pdf file with plots for you to look through pdf("traces_to_check.pdf", width = 11, height = 8.5) for(i in 1:length(propEstimates)){ tempPiTot <- propEstimates[[i]]$piTot nPBT <- mainInput[[i]]$nPBT for(j in 1:nPBT){ plot(tempPiTot[,j], type = "l", cex.axis = 2) } } dev.off()
We can now multiply by our population size to turn these into numbers of fish.
The function is set up to allow a user to propagate uncertainty in the escapement estimates. Since at BONN it is assumed the window count is the population size, we will just give it the same number over and over. This block of code takes the window count file (in scobi_deux format) and generates the window count numbers in the format the function needs.
window <- data.frame(weekNumber = c(1,2,3), count = c(3000,3000,3000), strata = c(1,2,3))
#make the window count input required numMCMC <- nrow(propEstimates[[1]]$piTot) # this gets the number of MCMC samples we took strataOrder <- sapply(mainInput, function(x) return(x$strataName)) escapementByStrata <- list() for(s in strataOrder){ total <- sum(window[window[,3] == s,2]) escapementByStrata[[s]] <- rep(total, numMCMC) }
Then, we use the built in function to multiply everything by the window count.
countEstimates <- multByEscapement(mainInput, mainRes = propEstimates, popSizes = escapementByStrata, writeSummary = FALSE)
What we are really interested in is just one entry of countEstimates
, so let's pull it out and name it separately for ease.
runEstimate <- countEstimates$totPiTotEstim head(runEstimate)
runEstimate
has the estimated number of fish in each group for the total of the run in each iteration of the MCMC sampler that we saved. Notice that the GSIgroup_# are all 0, because we constrained them to be as such. Now, if you just want numbers and CIs for each hatchery, you can:
#point estimates print("Point Estimates") apply(runEstimate, 2, mean) #CIs print("CIs") alpha <- .1 apply(runEstimate, 2, quantile, c(alpha/2, 1-(alpha/2)))
If you want to combine specific hatcheries into larger groups, you just add columns. For example, to get a point estimate and CI for pbtGroup1 + pbtGroup2:
#point estimates print("Point Estimate") mean(runEstimate[,"pbtGroup1"] + runEstimate[,"pbtGroup2"]) #CIs print("CI") alpha <- .1 quantile(runEstimate[,"pbtGroup1"] + runEstimate[,"pbtGroup2"], c(alpha, 1-(alpha/2)))
Adding columns manually can be pretty tedious if you have a lot of combining to do. So, let me know if that is the case, and I can help put together a block of code that will take an input file showing which groups to combine and automate the process.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.