Meuse Data Example"
In clustTMB: Spatio-Temporal Finite Mixture Model using 'TMB'

knitr::opts_chunk$set(echo = TRUE)
library(kableExtra)

This vignette provides examples of setting up non-spatial and spatial finite mixture models (FMMs) using clustTMB.

Fitting clustTMB models

Data Overview

The meuse data set from the sp library [@Pebesma2005] consists of 155 observations, 4 response variables, location data, and 8 covariates. The response involves four heavy metals measured to monitor pollution levels in the top soil of the Meuse river floodplain, Netherlands (Fig. \@ref(fig:sample), [@Rikken1993]). The heavy metal data exhibit normality on the logscale (Fig. \@ref(fig:pairs-expl)), allowing for a comparison between the Gaussian and lognormal distributions in addition to an FMM with and without spatial structure influencing the cluster probability.

library(GGally)
# refactor from sp to sf when meuse dataset available through sf
library(ggspatial)
library(spData)
library(sf)
library(cowplot)
library(giscoR)
library(wesanderson)
library(magrittr)
library(kableExtra)
library(knitr)
library(ggplot2)
library(inlabru)
library(fmesher)

# load data
library(sp)
data("meuse")
data("meuse.riv")
data("meuse.grid")

# map study area
# read in and project spatial data
m.river <- as.data.frame(meuse.riv[meuse.riv[, 2] > 329000 & meuse.riv[, 2] < 334200, ])
colnames(m.river) <- c("x", "y")
m.river <- sf::st_as_sf(m.river,
  coords = c("x", "y"),
  crs = 28992
) %>%
  dplyr::summarise(geometry = sf::st_combine(geometry)) %>%
  sf::st_cast("POLYGON")

m.grid <- sf::st_as_sf(meuse.grid,
  coords = c("x", "y"),
  crs = 28992
)
m.data <- sf::st_as_sf(meuse,
  coords = c("x", "y"),
  crs = 28992
)

mRiv <- sf::st_transform(m.river, crs = 4326)
mGrid <- sf::st_transform(m.grid, crs = 4326)
mDat <- sf::st_transform(m.data, crs = 4326)

# create map breaks and labels
xbreaks <- seq(5.72, 5.76, .01)
xlabels <- unlist(sapply(xbreaks, function(x) paste(x, "°E")))
ybreaks <- seq(50.95, 51.00, .01)
ylabels <- unlist(sapply(ybreaks, function(x) paste(x, "°N")))
# create reference map
Europe <- giscoR::gisco_get_countries(region = "Europe")
r.bb <- sf::st_bbox(mRiv)
r.bb[1] <- r.bb[1] - 0.5
r.bb[2] <- r.bb[2] - 0.3
r.bb[3] <- r.bb[3] + 0.5
r.bb[4] <- r.bb[4] + 0.3
rbox <- sf::st_as_sfc(r.bb)
ggm1 <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = Europe, fill = "cornsilk") +
  ggplot2::geom_sf(data = rbox, fill = NA, color = "red", size = 1.2) +
  coord_sf(
    xlim = c(-5, 20),
    ylim = c(46.5, 56.5)
  ) +
  theme_void() +
  theme(
    panel.background = element_rect(fill = "#c9f4fe"),
    panel.border = element_rect(
      colour = "black",
      fill = NA,
      linewidth = 1.5
    )
  )

# create site map
ggm2 <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = mDat) +
  theme_classic() +
  geom_sf(data = mRiv, fill = "lightblue") +
  ggspatial::annotation_north_arrow(
    which_north = "grid", height = unit(1, "cm"), width = unit(1, "cm"),
    pad_y = unit(4, "cm")
  )
cowplot::ggdraw() + cowplot::draw_plot(ggm2) + cowplot::draw_plot(ggm1, x = .18, y = 0.7, width = 0.35, height = 0.23)

# Visualize heavy metal data
ggpairs(meuse[, 3:6])

Simple FMM

The simple Gaussian FMM can be fit using the following code:

library(clustTMB)
mod.gauss <- clustTMB(response = meuse[, 3:6], G = 3, covariance.structure = "VVV")

Specifying a lognormal distribution is implemented using the family and link specification:

mod.ln <- clustTMB(
  response = meuse[, 3:6],
  family = lognormal(link = "identity"),
  G = 3, covariance.structure = "VVV"
)

Spatial FMM

The inclusion of spatial random effects in the expectation of the cluster probability in clustTMB depends on the SPDE-FEM approximation to a spatial Gaussian Markov Random Field (GMRF) introduced by the package, R-INLA. See [Spatial GMRF] and [Gating Network] for details on these clustTMB formulations. The fmesher R package (github source code) is used to run functions needed to implement this SPDE-FEM method.

Setting up the spatial mesh

The spatial model in clustTMB first requires the definition of the spatial mesh. This mesh defines the discretization of the continuous space using a constrained Delaunay triangulation. For details on building the spatial mesh, see Krainski et al., 2021, Sec 2.6.

As an example, the spatial mesh for the meuse data is built using the fmesher functions, fm_nonconvex_hull() to set up the boundary and fm_mesh_2d() to generate the spatial mesh:

library(fmesher)
loc <- meuse[, 1:2]
Bnd <- fmesher::fm_nonconvex_hull(as.matrix(loc), convex = 200)
meuse.mesh <- fmesher::fm_mesh_2d(as.matrix(loc),
  max.edge = c(300, 1000),
  boundary = Bnd
)

The inlabru R package can be used to visualize the mesh:

ggplot() +
  inlabru::gg(meuse.mesh) +
  geom_point(mapping = aes(x = loc[, 1], y = loc[, 2], size = 0.5), size = 0.5) +
  theme_classic()

Fitting a spatial model with clustTMB

Coordinates are converted to a spatial point data frame and read into the clustTMB model, along with the mesh, using the spatial.list argument. Spatial projections can be generated by defining a spatial data frame of prediction coordinates. These can be passed into the model via the projection.dat argument of clustTMB. The gating formula is specified using the gmrf() command:

# convert coordinates to a spatial point data frame
Loc <- sf::st_as_sf(loc, coords = c("x", "y"))

# define spatial prediction coordinates
data("meuse.grid")
Meuse.Grid <- sf::st_as_sf(meuse.grid, coords = c("x", "y"))
mod.ln.sp <- clustTMB(
  response = meuse[, 3:6],
  family = lognormal(link = "identity"),
  gatingformula = ~ gmrf(0 + 1 | loc),
  G = 4, covariance.structure = "VVV",
  spatial.list = list(loc = Loc, mesh = meuse.mesh),
  projection.dat = Meuse.Grid
)

Results can be viewed via model output:

# Estimated fixed parameters
summary(mod.ln.sp$sdr, "fixed")
# Minimum negative log likelihood
mod.ln.sp$opt$objective

Comparison Case Study

A cluster analysis was run on the meuse dataset using the Gaussian and lognormal family with and without spatial random effects in the gating network for clusters ranging from 2 to 10. BIC scores favored the lognormal spatial model with 4 clusters (Table \@ref(tab:BIC)). This model resulted in fewer clusters compared to the spatial Gaussian model.

df <- data.frame(
  family = c(rep("lognormal", 2), rep("Gaussian", 2)),
  space = c(1, 0, 1, 0),
  clusters = c(4, 4, 6, 3),
  BIC = c(4805, 4861, 4861, 4910)
)
kable_styling(
  kable(df,
    booktabs = TRUE,
    caption = "Optimum cluster size and BIC scores for lognormal and Gaussian models with (1) and without (0) spatial random effects in the gating network."
  ),
  full_width = TRUE
)

Results from the optimal model suggested a spatial pattern where the highest ppm observations for all four metals were clustered together (Cluster 0) in a narrow strip along the bank of the Meuse River within the northwest corner of the floodplain (Fig. \@ref(fig:pairs), Fig. \@ref(fig:pred)). A separate cluster (Cluster 3) was characterized by low ppm values for all metals and was spatially distributed in the central region of the floodplain. The other two clusters were characterized by moderately low (Cluster 2) and moderately high (Cluster 1) ppm values for all metals. Spatial predictions of clustered heavy metals in the Meuse River floodplain (Fig. \@ref(fig:pred)) can aid in risk assessment and environmental mitigation measures after flood events.

# Visualize heavy metal data
Cluster <- factor(mod.ln.sp$report$classification)
ggpairs(meuse[, 3:6], aes(color = Cluster, alpha = 90.5)) +
  scale_color_manual(values = wes_palette("Moonrise2", 4)) +
  scale_fill_manual(values = wes_palette("Moonrise2", 4))

Cluster <- factor(mod.ln.sp$report$Class_pred)

# create site map
ggm2 <- ggplot2::ggplot(mGrid) +
  ggplot2::stat_sf_coordinates(aes(color = Cluster), geom = "point") +
  theme_classic() +
  scale_color_manual(values = wes_palette("Moonrise2", 4)) +
  geom_sf(data = mRiv, fill = "lightblue") +
  ggspatial::annotation_north_arrow(
    which_north = "grid", height = unit(1, "cm"), width = unit(1, "cm"),
    pad_y = unit(4, "cm")
  )
cowplot::ggdraw() + cowplot::draw_plot(ggm2) + cowplot::draw_plot(ggm1, x = .18, y = 0.7, width = 0.35, height = 0.23)

clustTMB Formulations

Spatial GMRF

clustTMB fits spatial random effects using a Gaussian Markov Random Field (GMRF). The precision matrix, $Q$, of the GMRF is the inverse of a Matern covariance function and takes two parameters: 1) $\kappa$, which is the spatial decay parameter and a scaled function of the spatial range, $\phi = \sqrt{8}/\kappa$, the distance at which two locations are considered independent; and 2) $\tau$, which is a function of $\kappa$ and the marginal spatial variance $\sigma^{2}$:

$$\tau = \frac{1}{2\sqrt{\pi}\kappa\sigma}.$$ The precision matrix is approximated following the SPDE-FEM approach, where a constrained Delaunay triangulation network is used to discretize the spatial extent in order to determine a GMRF for a set of irregularly spaced locations, $i$. For details on the SPDE-FEM approach, see Krainski et al., 2021, Sec. 2.2

$$\omega_{i} \sim GMRF(Q[\kappa, \tau])$$

Gating Network

When random effects, $\mathbb{u}$, are specified in the gating network, the probability of cluster membership $\pi_{i,g}$ for observation $i$ is fit using multinomial regression:

$$ \begin{align} \mathbb{\eta}{,g} &= X\mathbb{\beta}{,g} + \mathbb{u}{,g} \ \mathbb{\pi}{,g} &= \frac{ exp(\mathbb{\eta}{,g})}{\sum^{G}{g=1}exp(\mathbb{\eta}_{,g})} \end{align} $$

References {-}

Any scripts or data that you put into this service are public.

clustTMB documentation built on Oct. 14, 2024, 5:09 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

clustTMB
Spatio-Temporal Finite Mixture Model using 'TMB'

Meuse Data Example"
In clustTMB: Spatio-Temporal Finite Mixture Model using 'TMB'

Fitting clustTMB models

Data Overview

Simple FMM

Spatial FMM

Setting up the spatial mesh

Fitting a spatial model with clustTMB

Comparison Case Study

clustTMB Formulations

Spatial GMRF

Gating Network

References {-}

Try the clustTMB package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

clustTMB Spatio-Temporal Finite Mixture Model using 'TMB'

Meuse Data Example" In clustTMB: Spatio-Temporal Finite Mixture Model using 'TMB'

Fitting clustTMB models

Data Overview

Simple FMM

Spatial FMM

Setting up the spatial mesh

Fitting a spatial model with clustTMB

Comparison Case Study

clustTMB Formulations

Spatial GMRF

Gating Network

References {-}

Try the clustTMB package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

clustTMB
Spatio-Temporal Finite Mixture Model using 'TMB'

Meuse Data Example"
In clustTMB: Spatio-Temporal Finite Mixture Model using 'TMB'