Variance calculation differences between R and Distance for Windows"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

As some general background for the following we first set out some notation. We want to estimate $\hat{N}$, the abundance in the study area and its variance $\text{Var}\hat{N}$. We calculate $\hat{N}$ using a Horvitz-Thompson estimator: \begin{equation} \hat{N}=\frac{A}{a}\sum_{i=1}^{n}\frac{s_{i}}{\hat{p}{i}}, (#eq:HT) \end{equation} where $A$ is the area of the study area, $a$ is the covered area, $i$ indexes the observations, which have group size $s{i}$ and probability of detection $\hat{p}_{i}$. For line transects $a=2wL$ where $L$ is the total line length and for point transects $a=2\pi wK$ where $K$ is the total number of transects.

Following @buckland_introduction_2001 and @buckland_advanced_2004, we identify the following sources of uncertainty in \@ref(eq:HT):

So, when we have objects occurring individually, we can obtain the total variance of $\hat{N}$ as: $$\text{Var}\hat{N}=\hat{N}\sqrt{\left[\text{CV}{\boldsymbol{\theta}}\left(\hat{N}\right)\right]^{2}+\left[\text{CV}\left(n/L\right)\right]^{2}},$$ and when objects occur in groups: $$\text{Var}\hat{N}=\hat{N}\sqrt{\left[\text{CV}{\boldsymbol{\theta}}\left(\hat{N}\right)\right]^{2}+\left[\text{CV}\left(n/L\right)\right]^{2}+\left{ \text{CV}\left[\widehat{\mathbb{E}(s)}\right]\right} ^{2}}$$ Each of these is addressed in-turn below.

Before starting that, a few more definitions...

Per @buckland_introduction_2001 we call our detection function $g(x;\boldsymbol{\theta})$, where $x$are perpendicular distances from a line transect survey and $\boldsymbol{\theta}$ are parameters. For point transects we write $g(r;\boldsymbol{\theta})$, where $r$ are radial distances. From this we can define $\hat{p_{i}}$, which we write more fully as $\hat{p}(\mathbf{z}{i};\boldsymbol{\theta})$ -- the probability of seeing an animal with covariate combination $\mathbf{z}{i}$. We can calculate this as: $$ \hat{p}(\mathbf{z}{i};\boldsymbol{\theta})=\int{l}^{w}\pi(x)g(x;\boldsymbol{\theta})\text{d}x, $$ where $w$ and $l$ are the right and left truncation distances, respectively and $\pi(x)$ is the distribution of animals with respect to the sampler. For line transects $\pi(x)=1/w$ and for points $\pi(r)=2r/w^{2}$.

Variance of the detection function ($\hat{p_{i}}$)

To obtain the variance of $\hat{p_{i}}$, we first note that from the maximum likelihood procedure used to fit the detection function, we obtain the Hessian (matrix of second derivatives of the likelihood with respect to the parameters), which we denote $\mathbf{H}{\boldsymbol{\theta}}$[^1]. We can then note that $\mathbf{H}{\boldsymbol{\theta}}^{-1}$ is the covariance matrix of the detection function parameters, $\mathbf{V}{\hat{\boldsymbol{\theta}}}$. From this we can use the standard sandwich estimator technique and obtain expressions for the uncertainties for any derived quantities which are functions of $\boldsymbol{\theta}$. If our function of the parameters is $h(\boldsymbol{\theta})$, then we have: $$ \text{Var}{\boldsymbol{\theta}}h=\left(\frac{\partial h(\boldsymbol{\theta)}}{\partial\boldsymbol{\theta}}\right)^{\intercal}\mathbf{V}_{\hat{\boldsymbol{\theta}}}\left(\frac{\partial h(\boldsymbol{\theta)}}{\partial\boldsymbol{\theta}}\right). $$

So, for $\text{Var}\hat{N}$ we have [@buckland_introduction_2001 equation (3.25)]: $$ \text{Var}{\boldsymbol{\theta}}\hat{N}=\left(\frac{\partial\hat{N}}{\partial\boldsymbol{\theta}}\right)^{\intercal}\mathbf{V}{\hat{\boldsymbol{\theta}}}\left(\frac{\partial\hat{N}}{\partial\boldsymbol{\theta}}\right), $$ where $\hat{N}$ is implicitly a function of $\boldsymbol{\theta}$ and $\frac{\partial\hat{N}}{\partial\boldsymbol{\theta}}$ can be obtained by finite differencing in practice.

Variance of encounter rate ($n/L$ or $n/K$)

Calculating the encounter rate variance is the most complicated part of the process and has a large number of possible estimators depending on the situation in question.

When $A=a$ (and objects are not in clusters)

When we only wish to obtain an abundance estimate for the covered area and objects do not occur in clusters, we use the "binomial variance" estimator [@borchers_horvitz-thompson_1998 equation 13]: $$ \sum_{i=1}^{n}\frac{1-p(\mathbf{z}{i};\boldsymbol{\theta})}{p(\mathbf{z}{i};\boldsymbol{\theta})^{2}}. $$ In mrds, this behaviour can be obtained using varflag=0.

When there is only one sample

In the case where there is only one transect, we make a "Poisson assumption" (that $\text{Var}x=x$), so the variance is "known." In this case the encounter rate variance is just the sum of the estimated abundances in the study area.

When size is not in the detection function {#buckland}

Depending on whether group size is included as a covariate in the detection function we can use different estimators for the encounter rate and hence its variance. The "classic" @buckland_introduction_2001 estimator is the the encounter rate is $n/L$ or $n/K$. mrds documentation[^2] claims that using this estimator in variance calculations is not valid when group size is included.

In mrds, this behaviour can be obtained using varflag=1.

When size is in the detection function {#innes}

@innes_surveys_2002 propose using $\hat{N}/L$ or $\hat{N}/K$ for the estimator when group size is included as a covariate in the detection function.mrds documentation[^3] claims that this estimator is better when covariates are included in the detection function.

In mrds, this behaviour can be obtained using varflag=2.

Fewster et al (2009) estimators

Having decided whether to use $n/L$ or $n/K$, or $\hat{N}/L$ or $\hat{N}/K$, we can then plug them into the estimators offered by @fewster_estimating_2009[^4].

Variance of group size ($s_{i}$)

If we use the method of @innes_surveys_2002 outlined in Section \@ref(innes), group size variation is already accounted for, as $\hat{N}/L$ will include group size information. However, if we use the method of @buckland_introduction_2001 given in Section \@ref(buckland) then we need to include the empirical variance of the group sizes.

Stratification

The above all applies when the total study area is used. When we wish to obtain stratified estimates, we sum over the appropriate indices (objects in the given stratum, line lengths in the given stratum, number of points in a given stratum etc). This will give us an estimate for each stratum. To obtain totals we need to sum coefficients of variation.

Constructing intervals

To obtain confidence intervals around estimates of abundance, we can calculate $\left(\hat{N}/C,\hat{N}C\right)$ where $$ C=\exp\left[t_{m}(\alpha)\sqrt{\widehat{\text{Var}}(\log_{e}\hat{N})}\right],$$ and $$\widehat{\text{Var}}(\log_{e}\hat{N})=\log_{e}\left[1+\frac{\widehat{\text{Var}}\left(\hat{N}\right)}{\hat{N}^{2}}\right]=\log_{e}\left[1+\text{CV}\left(\hat{N}\right)^{2}\right]. $$ We find the degrees of freedom for the $t$-distribution, $m$, by the following formula [@buckland_introduction_2001 equation (3.75)]: $$ m=\frac{\left[\text{CV}\left(\hat{N}\right)\right]^{4}}{\sum_{k=1}^{K}\left(\text{CV}{k}\right)^{4}/m{k}}, $$ where $\text{CV}{k}$ and $m{k}$ are the CV and degrees of freedom for component $k$ (e.g., $k=1$ is the detection function, $k=2$ is the encounter rate, etc), respectively. $$$$

Current mrds and Distance behaviour

By default mrds and Distance will use the following options:

References

[^1]: Well, we actually use the product of the first partials to calculate this, per @buckland_introduction_2001 equation (3.24)

[^2]: See ?mrds::dht.se.

[^3]: See ?mrds::dht.se.

[^4]: See ?mrds::varn for more information on those options offered in mrds.



Try the mrds package in your browser

Any scripts or data that you put into this service are public.

mrds documentation built on July 9, 2023, 6:06 p.m.