library(knitr) opts_chunk$set(cache = TRUE)
ltdesigntester
is an R package to test basic survey designs and highlight potential issues with using simple estimators (such as Horvitz-Thompson-like estimators; henceforth HT) when animal distribution changes in space.
The package takes a given setup of animal density, design and sighting paramters and generates multiple possible results for a given design. From there a series of different possible models are fitted to the data and relevant statistics recorded.
The following things are required to build a simulation:
Here is a simple example of how to setup and run a simulation. This could be used as a template for an analysis.
First we need to load up the required R packages (note we use suppressPackageStartupMessages
to minimize output to this document):
suppressPackageStartupMessages(library(DSsim)) suppressPackageStartupMessages(library(dsm)) suppressPackageStartupMessages(library(Distance)) suppressPackageStartupMessages(library(ltdesigntester)) suppressPackageStartupMessages(library(ggplot2))
We then need to setup the ingredients for our simulation as above.
We require the path to the data file in the shapefile directory. In this case the file is stored in the ltdesigntester
package, so we need to use system.file
to get its location:
studyarea_shapefile <- paste0(system.file("shapes/region2/", package="ltdesigntester"),"data")
n_grid_x <- 300 n_grid_y <- 100 density.surface <- expand.grid(x = seq(0, 3, len=n_grid_x), y = seq(0, 1, len=n_grid_y)) density.surface$density <- 5*(density.surface$x/3)
Setting the population size is straightforward, in this case, we choose 500 animals:
true_N <- 500
We only require the path to the shapefile directory we need. In this case the file is stored in the ltdesigntester
package, so we need to use system.file
to get its location:
design_shapefile <- system.file("shapes/manyzigzags", package="ltdesigntester")
We also define a vector that is the same length as the number of segments in the design that allocates them to transects. These are use for the sample units in the Horvtiz-Thompson estimates.
transects <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6)
Setting up our detection function, we simply create a list with the relevant components. Here we setup a hazard-rate detection function ("hr"
), with a scale parameter of 0.025
and shape parameter of 3
, and truncate our survey at 0.05
(this is the strip half-width).
df <- list(key = "hr", scale = 0.025, shape = 3, truncation = 0.05)
Only a simple stratification is used here. At points on the x
axis the survey will be stratified. The extremes of the survey area are used as end points, so the strata are [0,1), [1, 2), [2,3].
stratification <- c(1, 2)
Now setting up a simple prediction grid using the expand.grid
function in R. For more complex boundaries, the inSide
function in mgcv
can be useful.
cell_side_x <- 0.05 cell_side_y <- 0.05/3 pred_dat <- expand.grid(x = seq(0, 3, by=cell_side_x), y = seq(0, 1, by=cell_side_y)) # set the grid cell size pred_dat$off.set <- cell_side_x*cell_side_y
We include a test using a rotated coordinate system in the simulation so an additional coordinate system (xr
, yr
) need to be added to the prediction data.
# rotation matrix R <- matrix(c(cos(pi/4), sin(pi/4), -sin(pi/4), cos(pi/4)),2,2) # rotate predictions pred_dat[,c("xr","yr")] <- t(R %*% t(pred_dat[,c("x","y")]))
We put the parts we defined above together using build_sim
:
ss <- build_sim(design_path = design_shapefile, dsurf=density.surface, n_grid_x=n_grid_x, n_grid_y=n_grid_y, n_pop=true_N, df=df, region=studyarea_shapefile)
To check that our setup is correct we can do a dummy run and plot the output in a simple way:
check_sim_setup(ss)
To save time here, we simply run 10 replicates, using the setup we defined above (ss
) using the stratification and transect definitions from above.
big_res <- do_sim(10, ss, pred_dat, stratification, transect_id=transects)
The simulation results are formatted as a data.frame
, with one row per model result. We can look at the format here:
head(big_res)
Columns are as follows:
model
- identifier for the modeliter
- which simulation the results are fromquantile
- which quantile the true abundance lies in within the distribution of the estimated abundanceNhat
- the estimated abundancecvN
- the coefficient of variation of the estimated abundancen
- the number of animals observedsp1
- first smoothing paramater for spatial modelssp2
- second smoothing parameter for the spatial modelsHere we present two useful diagnostic plots using ggplot2
:
An easy diagnostic is the bias in the estimates of abundance. We first format the model names nicely:
# lookup for model names to short descriptions model_name_lookup <- list("HT" = "Horvitz-Thompson", "HT_strat" = "Horvitz-Thompson (stratified)", "m_xy_ds" = "Duchon spline bivariate", "m_xy_te" = "Thin plate tensor product", "m_xy_tp" = "Thin plate bivariate", "m_xy_ts" = "Thin plate with shrinkage bivariate", "m_xyr_tp" = "Thin plate bivariate rotated", "m_xyr_te" = "Thin plate tensor product rotated" ) big_res$name <- unlist(model_name_lookup[big_res$model]) big_res$bias <- big_res$N - true_N p <- ggplot(big_res)+ geom_boxplot(aes(name, bias)) + labs(y="Bias", x="Model") + coord_cartesian(ylim=c(-200,300)) + theme_minimal() + theme(axis.text.x = element_text(angle = 45, size=8)) + geom_hline(aes(yintercept=0), colour="red") print(p)
We can also investigate where the true abundance lies within the estimated distribution of abundance from the model. If the distribution of the statistic is skewed to either end then we can infer under or over estimation of abundance for a particular estimator. A flat distribution shows good performance, whereas a "dome" in the middle indicates a conservative estimate in the sense that confidence intervals are slightly too wide.
These are less illustrative in this case, as we only have 10 replicates.
p <- ggplot(big_res)+ geom_histogram(aes(quantile))+ facet_wrap(~name, nrow=2)+ geom_vline(xintercept=0.5) + labs(y="Count", x="Quantile") + theme(strip.text = element_text(size=4)) + theme_minimal() print(p)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.