dplyr_installed <- require("dplyr", quietly = TRUE) ggplot_installed <- require("ggplot2", quietly = TRUE) patstat.data_installed <- require("spatstat.data", quietly = TRUE) pkgs <- dplyr_installed && ggplot_installed && spatstat.data_installed knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.asp = 0.618, eval = identical(Sys.getenv("NOT_CRAN"), "true") && pkgs )
library(dplyr) library(sdmTMB) library(ggplot2) library(spatstat.data)
By adding pseudo-absences to presence-only data, we can estimate a spatial range that isn't sensitive to choice of raster or lattice resolution.
This example of modeling presence-absence will use data on the locations of 3605 trees (species Beilschmiedia pendula) in a tropical rainforest from the spatst.data package
These data come from the spatstat
package, and consist of point locations of 3605 trees on Barro Colorado Island (Panama). From spatstat
, the data files kindly supplied by Rasmus Waagepetersen
The data are stored in the bei
dataframe,
dat <- data.frame( x = spatstat.data::bei$x, y = spatstat.data::bei$y )
The distribution of trees is obviously not uniform and the middle of the map has very few trees.
ggplot(dat, aes(x, y)) + geom_point(col = "darkblue", alpha = 0.1) + coord_cartesian(expand = FALSE)
There are several approaches to generating pseduo-absences from these data. We first use a method called quadrature points (Renner et al. 2015). For this method, we have to decide:
1. Distribution of zeroes
- Regularly spaced (e.g. on a grid or lattice?)
- Random
- Higher density where environmental variability is high (suggestion by Renner et al. 2015)
2. How many zeroes to generate?
- Large enough so that predictive performance does not change as more are added
Here, we will use a uniform grid strategy to create ~5000 points. To test whether the number of pseudo-absences is sufficient, res
can be decreased, and model performance can be compared. For instance, to increase the number of pseudo-absences to ~20,000 points, we could change res <- 5
. In this example, increasing the number of zeroes to ~20,000 only marginally improves model performance, so 5,000 zeroes is sufficient.
res <- 10 #Determines resolution: lower value will increase number of zeroes generated # zeros is generated on a grid for this example, but other strategies could be used zeros <- expand.grid( x = seq(0, 1000, by = res), y = seq(0, 500, by = res) )
Then, we combine the presence and pseudo-absence data, and use $present
to separate occurrences from pseudo-absences.
dat$present <- 1 zeros$present <- 0 all_dat <- rbind(dat, zeros)
Next we can create the mesh. The resolution of mesh can be changed with the cutoff
value, which determines the minimum distance between locations in X-Y units. Increasing the cutoff will decrease the resolution of the mesh. In this example, a higher resolution mesh of cutoff=15
marginally improved model performance over cutoff=25
.
mesh <- make_mesh( all_dat, xy_cols = c("x", "y"), cutoff = 15 # min. distance between knots in X-Y units )
The dimensions of the mesh (or number of vertices or knots) can be accessed with mesh$mesh$n
. A cutoff distance of 15 yields ~ 1399 knots, and a cutoff distance of 25 yields a mesh with 624 knots.
Blue dots are data, red grid dots are quadrature points, and grey triangles are from the SPDE mesh.
all_dat$fpres <- as.factor(all_dat$present) ggplot() + inlabru::gg(mesh$mesh) + geom_point( data = all_dat, aes(x = x, y = y, col = fpres), size = 0.1, alpha = 0.3 ) + coord_equal() + # guides(colour = guide_legend(override.aes = list(alpha = 1))) + guides(col = guide_legend(title = "Present"))
Several approaches exist for estimating a model with pseudo-absences. These are similar, but have different likelihood models. First, we can use an Infinitely Weighted Logistic Regression Fithian & Hastie (2013) for the model.
The first step is to calculate weights. Weights will be 1 for true occurrences and a very large number (nW
) for pseudo-absences.
nW <- 1.0e6 all_dat$wt <- nW^(1 - all_dat$present)
And use sdmTMB to fit the model.
fit <- sdmTMB( present ~ 1, data = all_dat, mesh = mesh, family = binomial(link = "logit"), weights = all_dat$wt )
We can inspect the model output.
summary(fit)
A criticism of this approach is that the intercept and log-likelihood can be affected by nW
(Renner et al. 2015)
Another option is the Downweighted Poisson Regression, which is similar to IWLR but uses different weights and doesn't have the same arbitrary effects on intercept and likelihood
First we re-calculate weights (note they're differnt than IWLR):
# small values at presence locations all_dat$wt <- 1e-6 # pseudo-absences: area per quadrature point tot_area <- diff(range(dat$x)) * diff(range(dat$y)) n_zeros <- length(which(all_dat$present == 0)) all_dat$wt <- ifelse(all_dat$present == 1, 1e-6, tot_area / n_zeros )
Then fit the model with the new weights and a Poisson distribution
fit <- sdmTMB( present / wt ~ 1, data = all_dat, mesh = mesh, family = poisson(link = "log"), weights = all_dat$wt )
And inspect the output
summary(fit)
We can plot the random spatial effects
p <- predict(fit, newdata = zeros) ggplot(p, aes(x, y, fill = omega_s)) + geom_raster() + scale_fill_gradient2() + coord_fixed(expand = FALSE)
We can predict spatial distribution both in link (log) space:
ggplot(p, aes(x, y, fill = est)) + geom_raster() + scale_fill_viridis_c() + coord_fixed(expand = FALSE)
Or natural space:
ggplot(p, aes(x, y, fill = exp(est))) + geom_raster() + labs(fill = "Intensity\n(average point density)") + scale_fill_viridis_c(trans = "sqrt") + coord_fixed(expand = FALSE)
To evaluate the predictive performance, there are multiple options for binary data. AUC (Area Under the receiver operating characteristic curve) is a common metric, where values near 0.5 are essentially random, and values close to 1 indicate better predictive performance. We can use the package ROCR.
all_dat$p <- predict(fit)$est # total predictions, logit rocr <- ROCR::prediction(exp(all_dat$p), all_dat$present) ROCR::performance(rocr, measure = "auc")@y.values[[1]]
AUC <- ROCR::performance(rocr, measure = "auc")@y.values[[1]]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.