knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(fishCompTools)

Introduction

This document will walk-through the structure of the input and output of the wrapper functions for the MCMC estimation routine.

First, generate some data and the input

For multiple strata, the inputs and results and lists with each strata being an entry, and the structure of each entry being the same. So, here we will only consider one strata.

#### generate the data

# define the unknown "true" proportions at which fish in each
# PBT group assign to the different GSI groups
pbtGSImat <- matrix(c(.1, .8, .1,.8, .1, .1,.1, .1, .8), nrow = 3, ncol = 3, byrow = TRUE)

#define the "true" proportions of each group that are male and female
# we will have a moderate difference between the PBT and wild groups
varMat <- list(
    matrix(c(rep(c(.4,.6), 3), rep(c(.6,.4), 3)), nrow = 6, ncol = 2, byrow = TRUE)
)

# now generate three strata from identical parameters
multStratData <- data.frame()
tempData <- generatePBTGSIdata(sampRate = .2, censusSize = 3000, relSizePBTgroups = c(1,2,3), tagRates = c(.8, .85,.9), physTagRates = 0, true_clipped = 0, true_noclip_H = .4, true_wild = .6, relSizeGSIgroups = c(1,2,1), PBT_GSI_calls = pbtGSImat, varMatList = varMat)
multStratData <- tempData[[1]]
multStratData$Strata <- 1
tags <- tempData[[2]] # get the tag rates for the PBT groups

multStratData$GSI <- paste0("GSIgroup_", multStratData$GSI)
multStratData$Var1[multStratData$Var1 == 1] <- "F"
multStratData$Var1[multStratData$Var1 == 2] <- "M"
colnames(multStratData)[colnames(multStratData) == "Var1"] <- "Sex"

#### now prepare the input
mainInput <- prepStrata(multStratData, tags, GSIcol = "GSI", PBTcol = "GenParentHatchery", strataCol = "Strata", adFinCol = "AdClip", AI = TRUE, variableCols = c("Sex"))

#### lets look at the structure
str(mainInput)

A word of caution: Internally, the order of the entries in the various inputs are used. It is vital that if you modify the inputs manually, you do not change the ordering of the values. Dimension names are NOT used by the functions, they are only printed to make it easier for the user to edit the priors as they see fit.

Let's get the simple ones out of the way:

And now, let's look at the one's you may want to modify, the priors:

mainInput[[1]]$piTotPrior

piTotPrior is a named vector with the alphas of the prior Dirichlet distribution for the composition of the strata. The names are not used internally, they are supplied to make editing this prior easier for the user. The PBT groups are listed first, and the wild groups (by GSI assignment) are listed second.

mainInput[[1]]$prior_pi_gsi

prior_pi_gsi is a matrix with rows correspondign to PBT groups and columns corresponding to GSI assignment categories. Each PBT group has a Dirichlet prior and the values in a row are the alphas for this prior.

Now, let's go over the inputs related to variables other than PBT and GSI assignments that are used by the algorithm to estimate composition. These first three are lists, with the number of entries corresponding to the number of variables.

mainInput[[1]]$pi_Vprior

pi_Vprior is a matrix with rows corresponding to the groups (PBT and wild GSI) and columns corresponding to the categories of the variable. Each group has a Dirichlet prior and the values in a row are the alphas for this prior.

head(mainInput[[1]]$v_ut)

v_ut is a matrix with each variable being a column and each row being a PBT-unassigned sample. The values correspond to the category that a given sample belongs to. Missing data is given a value of -9.

The inputs:

all correspond to the inputs named the same but without the "Oth" suffix except that they are for variables whose composition you want to estiamte, but you do not want to use to inform the other estimates. A typical reason for doing this would be that you want to estimate the compostion for two variables (say age and sex), but these variables are not independent of each other. So, you can specify one of them with variableCols and the others with variableColsOth when you are building your input.

And now, the remaining inputs: groupsKey, GSIkey, variKey, variKeyOth: these all show the correspondence between the integers used internally to represent various values and the actual values used in the input dataset. AI: This is TRUE is the inputs represent ad-intact fish and false if they represent ad-clipped fish. This is not used when runnign the MCMC sampler, but it is used by downstream wrapper scripts that combine the results of the MCMC sampler with estiamtes of the proportions of fish that are clipped and unclipped. * strataName: The name of the strata that the samples represented by this input are from

Now, run the model and look at the structure of the results

propEstimates <- estimStrataMCpbt(mainInput, iter = 3000, burnIn = 500, thin = 1, seed = 7)
str(propEstimates)

The results are a list with an entry for each strata. piV and piVOth and lists with an entry for each variable. Each entry is a matrix, with rows being iterations of the MCMC sampler, and columns being different quantities being estimated.

This may be all you need, if you are just interested in the proportions. However, if you want to multiply by a total population size and get counts, we can do that:

nPost <- nrow(propEstimates[[1]]$piTot)
popSizeEstimates <- list() # a list with one entry per strata
popSizeEstimates <- list(rep(3000, nPost))

countEstimates <- multByEscapement(mainInput, mainRes = propEstimates, popSizes = popSizeEstimates, writeSummary = FALSE)
str(countEstimates)

You'll first see there is a big list called strataEstimates. This has estimates for each strata, and the structure of each entry is analagous to the other items in the list, which are detailed below:

head(countEstimates$totClipUnclip)

totClipUnclip is the number of fish that are ad-clipped and ad-intact, with rows being iterations of the MCMC sampler. Here, all of our fish are ad-intact.

head(countEstimates$totPiTotEstim)

totPiTotEstim is the number of fish belonging to each group.

head(countEstimates$totGSIestim)

totGSIestim is the number of fish belonging to each group and having a given GSI assignment.

head(countEstimates$totpiVestim[[1]])

The entries of totpiVestim are matrices with the number of fish belonging to each group and given category of the variable you are fitting.



delomast/fishCompTools documentation built on Jan. 11, 2021, 1:51 a.m.