knitr::opts_chunk$set(echo=TRUE, eval=TRUE, tidy.opts=list(width.cutoff=50), tidy=TRUE, warning=FALSE, message=FALSE, fig.align='center', size="footnotesize", fig.width = 6, fig.height = 3)

This is a short explanation for the implementation and results from the caribouSim package. The purpose of this project is to simulate radio-collared caribou herds with the goal of investigating the properties of competing hypothesis tests for the assumption that radio-collared animals mix evenly within the wider herd. This assumption is critical for the caribou population estimator proposed by Rivest et al. (1998).


sim_Ni()

First, we use sim_Ni to simulate the distribution of a caribou herd amongst a random number of groups. This function has one parameter, Theta, which is the approximate size of the desired herd.

library(caribouSim)
Theta <- 1000000  # herd size will be about a million caribou
set.seed(123)
Ni <- sim_Ni(Theta = Theta)
Ni

So in this example, there are r length(Ni) groups of animals ranging in size from r min(Ni) to r max(Ni) that make up this herd. Note that the realized size of this herd is not equal to Theta - it's more of a order-of-magnitude for the desired herd.

sum(Ni)

sim_Xi()

Next, we can generate collared caribou using sim_Xi. This function returns a vector of the number of collared animals, with order corresponding to the groups generated by sim_Ni. There are three arguments: nu is the desired total number of collars, Ni is a vector generated by sim_Ni, and phi is a parameter that controls the degree of clustering of collared animals. phi ranges from 0 (no clustering) to 0.5 (high amount of clustering).

nu <- 100 # want ~100 collared animals in the herd
Xi <- sim_Xi(nu = nu, Ni = Ni, phi = 0)  # no clustering = random mixing assumption met
Xi

sample_phaseI()

Of course, biologists searching for caribou herds will not find groups without collared animals. And in fact, the Rivest estimator does not allow for groups with 0 collars. So the dataset available to biologists for sampling is

pop <- sample_phaseI(Xi = Xi, Ni = Ni)
knitr::kable(pop)

sample_phaseII()

The processes that result in real collared caribou herds thus far is referred to by Rivest et al. (1998) as "phase I" sampling. But biologists searching for caribou herds with collared animals may not detect every collar, thus arising the need for a model of what is referred to as "phase II" sampling. More explicitly, "phase II" sampling is the actual detection of collars. Rivest et al. (1998) suggest 3 natural models - homogeneity (H), independence (I), and threshold (T). See the paper for more details.

The function sample_phaseII has four arguments. The first, pop is simply the data generated in phase I sampling. The r parameter is related to the probability of detecting collars, and the user can choose their sampling model out of the 3 listed above. If T is chosen, a integer bound B must be specified.

dat <- sample_phaseII(pop = pop, r = 0.9, model = "I")
knitr::kable(dat)

In this case, when r = 0.9, the biologists missed the last herd which had r as.integer(pop[nrow(pop), 2]) animals in it. At the end of the sampling effort, they will know that they are missing r as.integer(pop[nrow(pop), 1]) collars.


Three hypothesis tests

At the moment I'm using three hypothesis tests that are already coded up in base r or in some package.

fisher's exact test

Self explanatory. Here it is implemented with the example dataset we've already created.

fisher.test(as.matrix(dat), simulate.p.value = TRUE)

At the α = 0.05 level we fail to reject the null hypothesis that collared caribou are mixing randomly with the wider herd. This is good since the clustering parameter, φ, was set to 0 and thus the null hypothesis is the truth.

AER::dispersiontest()

The function dispersiontest from the AER library takes a poisson GLM as input and tests for under or overdispersion. I'm fuzzy on the details here...

AER::dispersiontest(glm(Xi ~ Ni, family = poisson(), data = dat))

Again we've failed to reject the null hypothesis.

VGAM::vglm()

And finally, the VGAM package can zero truncated negative binomial models. By fitting a model and looking at its summary I can perform a wald test on the significance of the dispersion parameter.

library(VGAM)
summaryvglm(vglm(Xi ~ Ni, family = posnegbinomial(), data = dat))

Again reject the null hypothesis.


iterate()

The function iterate is the workhorse of the caribouSim package. All of the functions discussed up until now are implemented inside iterate, and this is really the only function we need to use when interacting with this library. This function iterates caribou population simulation a number of times specified by the argument iters, and computes the abundance estimate as well as p-values for each test on each iteration. It also returns an overall rejection rate for each test and some other potentially interesting data. iterate takes the arguments Theta, nu, phi, and r and passes them to the functions already discussed. The argument modelSim specifies the phase II probability detection model used to generate the data, whereas modelEst specifies which such model is used to compute the rivest abundance estimate.


Results

In this section I show some results from simulations run by iterate. I ran 6 loops with 100 simulations in each loop, with each loop increasing in the number of collars being circulated in the caribou population. Each seperate loop used a different value of φ. That is, in one simulation the random mixing assumption is met, and in the rest of the simulations it is broken to varying degrees.

The following code shows an example of one loop (not run).

Theta <- 1000000
nu <- seq(.00001*Theta, .001*Theta, by = 10)
iters <- 100

results <- vector(mode = "list", length = length(nu))
rejections <- matrix(nrow = length(nu), ncol = 3)
for(i in seq(nu)){
  tmp <- iterate(iters = iters, Theta = Theta, nu = nu[i], phi = 0, r = .5, modelSim = "I")
  print(tmp)
  results[[i]] <- cbind(i, tmp$collars, tmp$abundance, tmp$p_values, tmp$rhat)
  rejections[i,] <- tmp$reject_rate
}

write.csv(do.call(rbind, results), "simulations/results_phi0.csv")
write.csv(rejections, "simulations/rejections_phi0.csv")

power analysis

The following plots (made with the wrapper function plot_sim) show the number of rejections as a function of the number of collars mixing in the population.

φ = 0.0

nu <- seq(.00001*Theta, .001*Theta, by = 10)
rejections_phi0 <- read.csv("../../simulations/rejections_phi0.csv")
plot_sim(dat = rejections_phi0, nu = nu)

φ = 0.1

rejections_phi1 <- read.csv("../../simulations/rejections_phi1.csv")
plot_sim(dat = rejections_phi1, nu = nu)

φ = 0.2

rejections_phi2 <- read.csv("../../simulations/rejections_phi2.csv")
plot_sim(dat = rejections_phi2, nu = nu)

φ = 0.3

rejections_phi3 <- read.csv("../../simulations/rejections_phi3.csv")
plot_sim(dat = rejections_phi3, nu = nu)

φ = 0.4

rejections_phi4 <- read.csv("../../simulations/rejections_phi4.csv")
plot_sim(dat = rejections_phi4, nu = nu)

φ = 0.5

rejections_phi5 <- read.csv("../../simulations/rejections_phi5.csv")
plot_sim(dat = rejections_phi5, nu = nu)


chrisroberts2112/caribouSim documentation built on May 8, 2022, 12:05 a.m.