In dill/ltdesigntester: Test Performance of Line Transect Designs

library(knitr)
opts_chunk$set(cache = TRUE)

Introduction

ltdesigntester is an R package to test basic survey designs and highlight potential issues with using simple estimators (such as Horvitz-Thompson-like estimators; henceforth HT) when animal distribution changes in space.

The package takes a given setup of animal density, design and sighting paramters and generates multiple possible results for a given design. From there a series of different possible models are fitted to the data and relevant statistics recorded.

Requirements

The following things are required to build a simulation:

study area shapefile: defining the area in which the study takes place.
density surface: giving "true" the distribution of the animals in the study area.
population size: "true" number of animals in the study area.
survey design shapefile: the lines to be surveyed, already segmented. Additional list of correspondence of transects to segments is required.
detection function definition: describing the detectability and truncation for the survey.
stratification scheme: for the HT estimator.

A simple example

Here is a simple example of how to setup and run a simulation. This could be used as a template for an analysis.

First we need to load up the required R packages (note we use suppressPackageStartupMessages to minimize output to this document):

suppressPackageStartupMessages(library(DSsim))
suppressPackageStartupMessages(library(dsm))
suppressPackageStartupMessages(library(Distance))
suppressPackageStartupMessages(library(ltdesigntester))
suppressPackageStartupMessages(library(ggplot2))

We then need to setup the ingredients for our simulation as above.

Study area shapefile

We require the path to the data file in the shapefile directory. In this case the file is stored in the ltdesigntester package, so we need to use system.file to get its location:

studyarea_shapefile <- paste0(system.file("shapes/region2/", package="ltdesigntester"),"data")

Density surface

n_grid_x <- 300
n_grid_y <- 100
density.surface <- expand.grid(x = seq(0, 3, len=n_grid_x),
                               y = seq(0, 1, len=n_grid_y))
density.surface$density <- 5*(density.surface$x/3)

Population size

Setting the population size is straightforward, in this case, we choose 500 animals:

true_N <- 500

Survey design shapefile

We only require the path to the shapefile directory we need. In this case the file is stored in the ltdesigntester package, so we need to use system.file to get its location:

design_shapefile <- system.file("shapes/manyzigzags", package="ltdesigntester")

We also define a vector that is the same length as the number of segments in the design that allocates them to transects. These are use for the sample units in the Horvtiz-Thompson estimates.

transects <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
               2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4,
               4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5,
               6, 6, 6, 6, 6, 6, 6, 6, 6)

Detection function definition

Setting up our detection function, we simply create a list with the relevant components. Here we setup a hazard-rate detection function ("hr"), with a scale parameter of 0.025 and shape parameter of 3, and truncate our survey at 0.05 (this is the strip half-width).

df <- list(key        = "hr",
           scale      = 0.025,
           shape      = 3,
           truncation = 0.05)

Stratification scheme

Only a simple stratification is used here. At points on the x axis the survey will be stratified. The extremes of the survey area are used as end points, so the strata are [0,1), [1, 2), [2,3].

stratification <- c(1, 2)

Prediction grid

Now setting up a simple prediction grid using the expand.grid function in R. For more complex boundaries, the inSide function in mgcv can be useful.

cell_side_x <- 0.05
cell_side_y <- 0.05/3
pred_dat <- expand.grid(x = seq(0, 3, by=cell_side_x),
                        y = seq(0, 1, by=cell_side_y))
# set the grid cell size
pred_dat$off.set <- cell_side_x*cell_side_y

We include a test using a rotated coordinate system in the simulation so an additional coordinate system (xr, yr) need to be added to the prediction data.

# rotation matrix
R <- matrix(c(cos(pi/4), sin(pi/4), -sin(pi/4), cos(pi/4)),2,2)
# rotate predictions
pred_dat[,c("xr","yr")] <- t(R %*% t(pred_dat[,c("x","y")]))

Building the simulation specification

We put the parts we defined above together using build_sim:

ss <- build_sim(design_path = design_shapefile,
                dsurf=density.surface,
                n_grid_x=n_grid_x, n_grid_y=n_grid_y,
                n_pop=true_N, df=df,
                region=studyarea_shapefile)

To check that our setup is correct we can do a dummy run and plot the output in a simple way:

check_sim_setup(ss)

Running a simulation

To save time here, we simply run 10 replicates, using the setup we defined above (ss) using the stratification and transect definitions from above.

big_res <- do_sim(10, ss, pred_dat, stratification,
                  transect_id=transects)

Analysing results

The simulation results are formatted as a data.frame, with one row per model result. We can look at the format here:

head(big_res)

Columns are as follows:

model - identifier for the model
iter - which simulation the results are from
quantile - which quantile the true abundance lies in within the distribution of the estimated abundance
Nhat - the estimated abundance
cvN - the coefficient of variation of the estimated abundance
n - the number of animals observed
sp1 - first smoothing paramater for spatial models
sp2 - second smoothing parameter for the spatial models

Here we present two useful diagnostic plots using ggplot2:

Bias

An easy diagnostic is the bias in the estimates of abundance. We first format the model names nicely:

# lookup for model names to short descriptions
model_name_lookup <- list("HT" = "Horvitz-Thompson",
                          "HT_strat" = "Horvitz-Thompson (stratified)",
                          "m_xy_ds"  = "Duchon spline bivariate",
                          "m_xy_te"  = "Thin plate tensor product",
                          "m_xy_tp"  = "Thin plate bivariate",
                          "m_xy_ts"  = "Thin plate with shrinkage bivariate",
                          "m_xyr_tp"  = "Thin plate bivariate rotated",
                          "m_xyr_te"  = "Thin plate tensor product rotated"
                          )
big_res$name <- unlist(model_name_lookup[big_res$model])
big_res$bias <- big_res$N - true_N
p <- ggplot(big_res)+
       geom_boxplot(aes(name, bias)) +
       labs(y="Bias", x="Model") +
       coord_cartesian(ylim=c(-200,300)) +
       theme_minimal() +
       theme(axis.text.x = element_text(angle = 45, size=8)) +
       geom_hline(aes(yintercept=0), colour="red")
print(p)

Quantile confidence

We can also investigate where the true abundance lies within the estimated distribution of abundance from the model. If the distribution of the statistic is skewed to either end then we can infer under or over estimation of abundance for a particular estimator. A flat distribution shows good performance, whereas a "dome" in the middle indicates a conservative estimate in the sense that confidence intervals are slightly too wide.

These are less illustrative in this case, as we only have 10 replicates.

p <- ggplot(big_res)+
       geom_histogram(aes(quantile))+
       facet_wrap(~name, nrow=2)+
       geom_vline(xintercept=0.5) +
       labs(y="Count", x="Quantile") +
       theme(strip.text = element_text(size=4)) +
       theme_minimal()
print(p)

dill/ltdesigntester documentation built on May 15, 2019, 8:30 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dill/ltdesigntester
Test Performance of Line Transect Designs

In dill/ltdesigntester: Test Performance of Line Transect Designs

Introduction

Requirements

A simple example

Study area shapefile

Density surface

Population size

Survey design shapefile

Detection function definition

Stratification scheme

Prediction grid

Building the simulation specification

Running a simulation

Analysing results

Bias

Quantile confidence

R Package Documentation

Browse R Packages

We want your feedback!

dill/ltdesigntester Test Performance of Line Transect Designs

In dill/ltdesigntester: Test Performance of Line Transect Designs

Introduction

Requirements

A simple example

Study area shapefile

Density surface

Population size

Survey design shapefile

Detection function definition

Stratification scheme

Prediction grid

Building the simulation specification

Running a simulation

Analysing results

Bias

Quantile confidence

R Package Documentation

Browse R Packages

We want your feedback!

dill/ltdesigntester
Test Performance of Line Transect Designs