sfclust: Bayesian spatial functional clustering

sfclust

R Documentation

Bayesian spatial functional clustering

Description

Bayesian detection of neighboring spatial regions with similar functional shapes using spanning trees and latent Gaussian models. It ensures that clusters are spatially contiguous, handles a large family of latent Gaussian models supported by INLA, and works with non-Gaussian likelihoods.

Usage

sfclust(
  stdata,
  graphdata = NULL,
  stnames = c("geometry", "time"),
  move_prob = c(0.425, 0.425, 0.1, 0.05),
  q = 0.5,
  correction = TRUE,
  niter = 100,
  burnin = 0,
  thin = 1,
  nmessage = 10,
  path_save = NULL,
  nsave = nmessage,
  ...
)

Arguments

`stdata`	A stars object containing response variables, covariates, and other necessary data.
`graphdata`	A list containing the initial graph used for the Bayesian model. It should include components like `graph`, `mst`, and `membership` (default is `NULL`).
`stnames`	A character vector specifying the spatio-temporal dimension names of `stdata` that represent spatial geometry and time, respectively (default is `c("geometry", "time")`).
`move_prob`	A numeric vector of probabilities for different types of moves in the MCMC process: birth, death, change, and hyperparameter moves (default is `c(0.425, 0.425, 0.1, 0.05)`).
`q`	A numeric value representing the penalty for the number of clusters (default is `0.5`).
`correction`	A logical indicating whether a correction should be applied when computing the marginal likelihoods (default is `TRUE`). This depends on the type of effects included in the `INLA` model.
`niter`	An integer specifying the number of MCMC iterations to perform (default is `100`).
`burnin`	An integer specifying the number of burn-in iterations to discard (default is `0`).
`thin`	An integer specifying the thinning interval for recording the results (default is `1`).
`nmessage`	An integer specifying how often progress messages should be printed (default is `10`).
`path_save`	A character string specifying the file path to save the results (default is `NULL`).
`nsave`	An integer specifying the number of iterations between saved results in the chain (default is `nmessage`).
`...`	Additional arguments such as `formula`, `family`, and others that are passed to the `inla` function.
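
As a rough illustration of how `move_prob`, `niter`, `burnin`, and `thin` relate, the base R sketch below draws one move type per iteration and lists which iterations would be retained. It assumes samples are kept every `thin` iterations after the burn-in, which is the conventional meaning of these arguments but is not confirmed here as the package's exact bookkeeping. The move labels and the objects `moves` and `kept` are illustrative names, not part of sfclust.

```r
# Illustrative only: this is not the sfclust implementation.
move_prob <- c(birth = 0.425, death = 0.425, change = 0.1, hyper = 0.05)
stopifnot(isTRUE(all.equal(sum(move_prob), 1)))  # the four probabilities sum to 1

niter <- 100
burnin <- 20
thin <- 5

set.seed(1)
# One move type per iteration, drawn with the given probabilities
moves <- sample(names(move_prob), size = niter, replace = TRUE, prob = move_prob)
table(moves)

# Assumed bookkeeping: keep every `thin`-th iteration after the burn-in
kept <- seq(burnin + thin, niter, by = thin)
length(kept)
```

With the default `burnin = 0` and `thin = 1`, the same rule would retain every iteration.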

Details

This implementation draws inspiration from the methods described in the paper: "Bayesian Clustering of Spatial Functional Data with Application to a Human Mobility Study During COVID-19" by Bohai Zhang, Huiyan Sang, Zhao Tang Luo, and Hui Huang, published in The Annals of Applied Statistics, 2023. For further details on the methodology, please refer to:

The paper: doi:10.1214/22-AOAS1643
Supplementary material: doi:10.1214/22-AOAS1643SUPPB

The MCMC algorithm in this implementation is largely based on the supplementary material of that paper. However, we have generalized the computation of the marginal likelihood ratio by using INLA (Integrated Nested Laplace Approximation). This generalization integrates over all parameters and hyperparameters, which allows inference with a broader family of likelihoods and model terms than the original approach. Further details can be found in our paper "Bayesian spatial functional data clustering: applications in disease surveillance" by Ruiman Zhong, Erick A. Chacón-Montalván, and Paula Moraga:

The paper: https://arxiv.org/abs/2407.12633
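
The way a spanning tree keeps clusters contiguous can be sketched in base R: removing k - 1 edges from a spanning tree of the region adjacency graph leaves k connected components, and each component is a contiguous cluster. The toy edge list and the helper `components()` below are hypothetical and written only for this illustration; they are not the package's code.

```r
# Toy spanning tree on 6 regions (hypothetical edge list, one row per edge)
tree_edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6))

# Label the connected components left after removing the edges indexed by `cut`
components <- function(edges, n, cut = integer(0)) {
  keep <- edges[setdiff(seq_len(nrow(edges)), cut), , drop = FALSE]
  label <- seq_len(n)                  # start with every region in its own cluster
  for (i in seq_len(nrow(keep))) {
    a <- label[keep[i, 1]]
    b <- label[keep[i, 2]]
    label[label == b] <- a             # merge the two components joined by this edge
  }
  as.integer(factor(label))            # relabel clusters as 1, 2, ..., k
}

# Cutting 2 of the 5 tree edges yields 3 contiguous clusters
membership <- components(tree_edges, n = 6, cut = c(2, 4))
membership
```

Because every cluster is a connected piece of the tree, and tree edges join neighboring regions, no cluster can contain regions that are disconnected from each other.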

Value

An sfclust object containing two main lists: samples and clust.

The samples list includes details from the sampling process, such as:
- membership: The cluster membership assignments for each sample.
- log_marginal_likelihood: The log marginal likelihood for each sample.
- move_counts: The counts of each type of move during the MCMC process.

The clust list contains information about the selected clustering, including:
- id: The identifier of the selected sample (default is the last sample).
- membership: The cluster assignments for the selected sample.
- models: The fitted models for each cluster in the selected sample.
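
To show how these components might be inspected, the sketch below builds a mock list with the same element names and summarizes it with base R. The values are made up, and the layout of `membership` (one row per sample, one column per region) is an assumption made for this illustration; check `str(result$samples)` on a real fit before relying on it.

```r
# Mock object mirroring the documented element names (all values are made up)
res <- list(
  samples = list(
    membership = rbind(c(1, 1, 2, 2), c(1, 1, 1, 2), c(1, 2, 2, 2)),
    log_marginal_likelihood = c(-120.4, -118.9, -119.6),
    move_counts = c(birth = 1, death = 0, change = 1, hyper = 1)
  ),
  clust = list(id = 3, membership = c(1, 2, 2, 2))
)

# Number of clusters in each sample (assumes one row per sample)
nclust <- apply(res$samples$membership, 1, function(m) length(unique(m)))
nclust

# Index of the sample with the highest log marginal likelihood
best <- which.max(res$samples$log_marginal_likelihood)
res$samples$membership[best, ]
```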

Author(s)

Ruiman Zhong ruiman.zhong@kaust.edu.sa, Erick A. Chacón-Montalván erick.chaconmontalvan@kaust.edu.sa, Paula Moraga paula.moraga@kaust.edu.sa

Examples



library(sfclust)

# Clustering with Gaussian data
data(stgaus)
result <- sfclust(stgaus, formula = y ~ f(idt, model = "rw1"),
  niter = 10, nmessage = 1)
print(result)
summary(result)
plot(result)

# Clustering with binomial data
data(stbinom)
result <- sfclust(stbinom, formula = cases ~ poly(time, 2) + f(id),
  family = "binomial", Ntrials = population, niter = 10, nmessage = 1)
print(result)
summary(result)
plot(result)

sfclust documentation built on June 8, 2025, 10:11 a.m.