In meixichen/SpatialGEV: Fit Spatial Generalized Extreme Value Models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  tidy = "styler"
)

SpatialGEV

Meixi Chen, Martin Lysy, Reza Ramezan

Description

A fast Bayesian inference method for spatial random effects modelling of weather extremes. The latent spatial variables are efficiently marginalized by a Laplace approximation using the TMB library, which leverages efficient automatic differentiation in C++. The models are compiled in C++, whereas the optimization step is carried out in R. With this package, users can fit spatial GEV models with different complexities to their dataset without having to formulate the model using C++. This package also offers a method to sample from the approximate posterior distributions of both fixed and random effects, which will be useful for downstream analysis.

Installation

Before installing SpatialGEV, make sure you have TMB installed following the instructions here.

SpatialGEV uses several functions from the INLA package for SPDE approximation to the Matérn covariance as well as mesh creation on the spatial domain. If the user would like to use the SPDE method (i.e. kernel="spde" in spatialGEV_fit()), please first install package INLA. Since INLA is not on CRAN, it needs to be downloaded following their instruction here.

To download the stable version of this package, run

install.packages("SpatialGEV")

To download the development version of this package, run

devtools::install_github("meixichen/SpatialGEV")

Example

Using the simulated data set simulatedData2 provided in the package, we demonstrate how to use this package. Spatial variation of the GEV parameters are plotted below.

library(SpatialGEV)
# GEV parameters simulated from Gaussian random fields
a <- simulatedData2$a # location
logb <- simulatedData2$logb # log scale
logs <- simulatedData2$logs # log shape
locs <- simulatedData2$locs # coordinate matrix
n_loc <- nrow(locs) # number of locations
y <- Map(evd::rgev, n=sample(50:70, n_loc, replace=TRUE),
         loc=a, scale=exp(logb), shape=exp(logs)) # observations

filled.contour(
  x = unique(locs$x),
  y = unique(locs$y),
  z = matrix(a, ncol=sqrt(n_loc)), 
  color.palette = terrain.colors,
  xlab="Longitude", ylab="Latitude", 
  main="Spatial variation of a",
  cex.lab=1,cex.axis=1
)

filled.contour(
  x = unique(locs$x),
  y = unique(locs$y),
  z = matrix(exp(logb), ncol=sqrt(n_loc)), 
  color.palette = terrain.colors,
  xlab="Longitude", ylab="Latitude", 
  main="Spatial variation of b",
  cex.lab=1,cex.axis=1
)

filled.contour(
  x = unique(locs$x),
  y = unique(locs$y),
  z = matrix(exp(logs), ncol=sqrt(n_loc)),
  color.palette = terrain.colors,
  xlab="Longitude", ylab="Latitude",
  main="Spatial variation of s",
  cex.lab=1,cex.axis=1
)

To fit a GEV-GP model to the simulated data, use the spatialGEV_fit() function. We use random="abs" to indicate that all three GEV parameters are treated as random effects. The shape parameter s is constrained to be positive (log transformed) by specifying reparam_s="positive". The covariance kernel function used here is the SPDE-approximated Matérn kernel kernel="spde". Initial parameter values are passed to init_param using a list.

fit <- spatialGEV_fit(
  data = y, locs = locs, random = "abs",
  init_param = list(
    a = rep(60, n_loc),
    log_b = rep(2,n_loc),
    s = rep(-3,n_loc),
    beta_a = 60, beta_b = 2, beta_s = -2,
    log_sigma_a = 1.5, log_kappa_a = -2,
    log_sigma_b = 1.5, log_kappa_b = -2,
    log_sigma_s = -1, log_kappa_s = -2
  ),
  reparam_s = "positive",
  kernel="spde",
  silent = TRUE
)

class(fit)
print(fit)

Posterior samples of the random and fixed effects are drawn using spatialGEV_sample(). Specify observation=TRUE if we would also like to draw from the posterior predictive distribution.

sam <- spatialGEV_sample(model = fit, n_draw = 1e4, observation = TRUE)
print(sam)

To get summary statistics of the posterior samples, use summary() on the sample object.

pos_summary <- summary(sam)
pos_summary$param_summary[1:5,]
pos_summary$y_summary[1:5,]

One can also plot the full posteriors using e.g., the bayespolot package.

library(bayesplot)
library(ggplot2)
mcmc_areas(
  x = sam$parameter_draws[,1:5],
  prob = .95,
  point_est = "mean"
) +
  ggtitle(
    "Posterior distributions of a1 - a5",
    "with posterior means and 95% credible intervals"
  )

TODO

[ ] Consider a shorter name, e.g., sgev_*, than spatialGEV_*.
[ ] Argument init_params adds a lot of complexity to spatialGEV_fit(). Perhaps this function can be broken down into two parts: spatialGEV_adfun() which returns the adfun object, and then spatialGEV_fit() which does the fitting. The advantage is that the adfun$env$parameters object tells you the dimension of the parameter values, which can be useful for initialization. Also, note that adfun$env$data object contains the entire data list for browsing.
[ ] Write some tests for the spde kernel. Construct the sparse precision matrix using get_spde_prec(), invert it using Matrix::solve(), apply the scale factor, and pass as variance to dmvnorm().