sfclust: Bayesian spatial functional clustering

View source: R/model-cluster.R

sfclust    R Documentation

Bayesian spatial functional clustering

Description

Bayesian detection of neighboring spatial regions with similar functional shapes using spanning trees and latent Gaussian models. It ensures spatial contiguity in the clusters, handles a large family of latent Gaussian models supported by inla, and supports non-Gaussian likelihoods.
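The link between spanning trees and spatial contiguity can be illustrated with a small base R sketch. It is not the package's internal code: the six-region tree and the helper tree_membership() are hypothetical. Removing an edge from a spanning tree splits the regions into connected groups, so every resulting cluster is contiguous by construction.

```r
# Hypothetical toy example: 6 regions joined by a spanning tree (5 edges).
tree_edges <- rbind(c(1, 2), c(2, 3), c(2, 4), c(4, 5), c(5, 6))

# Label the connected components of a forest given as a two-column edge matrix.
# (Illustrative helper, not part of sfclust.)
tree_membership <- function(n, edges) {
  membership <- rep(NA_integer_, n)
  label <- 0L
  for (start in seq_len(n)) {
    if (!is.na(membership[start])) next
    label <- label + 1L
    membership[start] <- label
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Neighbours of `node` that have not been labelled yet
      nb <- c(edges[edges[, 1] == node, 2], edges[edges[, 2] == node, 1])
      nb <- nb[is.na(membership[nb])]
      membership[nb] <- label
      queue <- c(queue, nb)
    }
  }
  membership
}

# The full tree forms a single cluster
membership_full <- tree_membership(6, tree_edges)

# Cutting the edge between regions 2 and 4 yields two contiguous clusters
membership_cut <- tree_membership(6, tree_edges[-3, , drop = FALSE])
print(membership_cut)
```

Each additional cut adds one cluster, which is why a partition of the regions can be described by a spanning tree plus a set of removed edges.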

Usage

sfclust(
  stdata,
  graphdata = NULL,
  stnames = c("geometry", "time"),
  move_prob = c(0.425, 0.425, 0.1, 0.05),
  q = 0.5,
  correction = TRUE,
  niter = 100,
  burnin = 0,
  thin = 1,
  nmessage = 10,
  path_save = NULL,
  nsave = nmessage,
  ...
)

Arguments

stdata

A stars object containing response variables, covariates, and other necessary data.

graphdata

A list containing the initial graph used for the Bayesian model. It should include components like graph, mst, and membership (default is NULL).

stnames

A character vector specifying the spatio-temporal dimension names of stdata that represent spatial geometry and time, respectively (default is c("geometry", "time")).

move_prob

A numeric vector of probabilities for different types of moves in the MCMC process: birth, death, change, and hyperparameter moves (default is c(0.425, 0.425, 0.1, 0.05)).
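As a minimal base R sketch of how such a vector can drive the choice of move at each iteration, the code below draws move types with sample(). The element names are hypothetical labels added for readability, and the check that the probabilities sum to one is an assumption based on the default value, not a documented requirement.

```r
# Default move probabilities; the names are illustrative labels only
move_prob <- c(birth = 0.425, death = 0.425, change = 0.1, hyper = 0.05)

# Assumed sanity checks: non-negative and summing to one, like the default
stopifnot(all(move_prob >= 0), isTRUE(all.equal(sum(move_prob), 1)))

# Draw the move type for 1000 hypothetical iterations
set.seed(1)
moves <- sample(names(move_prob), size = 1000, replace = TRUE, prob = move_prob)

# Empirical frequencies should be close to move_prob
print(table(moves) / length(moves))
```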

q

A numeric value representing the penalty for the number of clusters (default is 0.5).
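To give a feel for what a penalty on the number of clusters does, the sketch below assumes it acts as a prior proportional to (1 - q)^k on the number of clusters k. This form is an assumption made for illustration; the exact way sfclust uses q is not stated in this help page.

```r
# Assumed prior on the number of clusters k: p(k) proportional to (1 - q)^k
cluster_prior <- function(q, k = 1:10) {
  log_prior <- k * log(1 - q)
  prior <- exp(log_prior - max(log_prior))  # subtract the max for stability
  prior / sum(prior)                        # normalise over the k considered
}

prior_default <- cluster_prior(q = 0.5)  # default penalty
prior_strong  <- cluster_prior(q = 0.9)  # stronger penalty

# Under this assumption, larger q puts more mass on few clusters
print(round(rbind(q_0.5 = prior_default, q_0.9 = prior_strong)[, 1:4], 3))
```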

correction

A logical indicating whether a correction should be applied when computing the marginal likelihoods (default is TRUE). Whether it is needed depends on the type of effects included in the INLA model.

niter

An integer specifying the number of MCMC iterations to perform (default is 100).

burnin

An integer specifying the number of burn-in iterations to discard (default is 0).

thin

An integer specifying the thinning interval for recording the results (default is 1).
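The interplay between niter, burnin, and thin can be sketched in base R. The indexing below follows a common MCMC convention (discard the first burnin iterations, then keep every thin-th one); whether sfclust uses exactly this rule is an assumption, and the values are chosen only for illustration.

```r
niter  <- 100  # total MCMC iterations
burnin <- 20   # iterations discarded at the start
thin   <- 5    # keep one iteration out of every `thin`

# Assumed convention: keep iterations after burn-in at multiples of `thin`
iter <- seq_len(niter)
kept <- iter[iter > burnin & (iter - burnin) %% thin == 0]

print(kept)
print(length(kept))  # number of samples stored under this convention
```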

nmessage

An integer specifying how often progress messages should be printed (default is 10).

path_save

A character string specifying the file path to save the results (default is NULL).

nsave

An integer specifying the number of iterations between saved results in the chain (default is nmessage).

...

Additional arguments such as formula, family, and others that are passed to the inla function.

Details

This implementation draws inspiration from the methods described in the paper: "Bayesian Clustering of Spatial Functional Data with Application to a Human Mobility Study During COVID-19" by Bohai Zhang, Huiyan Sang, Zhao Tang Luo, and Hui Huang, published in The Annals of Applied Statistics, 2023. For further details on the methodology, please refer to:

  • The paper: doi:10.1214/22-AOAS1643

  • Supplementary material: doi:10.1214/22-AOAS1643SUPPB

The MCMC algorithm in this implementation is largely based on the supplementary material provided in the paper. However, we have generalized the computation of the marginal likelihood ratio by leveraging INLA (Integrated Nested Laplace Approximation). This generalization enables integration over all parameters and hyperparameters, which allows inference within a broader family of distributions and model terms and extends the scope and flexibility of the original approach. Further details of our approach can be found in our paper "Bayesian spatial functional data clustering: applications in disease surveillance" by Ruiman Zhong, Erick A. Chacón-Montalván, and Paula Moraga.

Value

An sfclust object containing two main lists: samples and clust.

  • The samples list includes details from the sampling process, such as:

    • membership: The cluster membership assignments for each sample.

    • log_marginal_likelihood: The log marginal likelihood for each sample.

    • move_counts: The counts of each type of move during the MCMC process.

  • The clust list contains information about the selected clustering, including:

    • id: The identifier of the selected sample (default is the last sample).

    • membership: The cluster assignments for the selected sample.

    • models: The fitted models for each cluster in the selected sample.
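The structure described above can be explored with ordinary list operations. The sketch below uses a mock list rather than real sfclust output, so it runs without fitting a model. Storing membership as a matrix with one row per sample, and the names given to move_counts, are assumptions made for illustration.

```r
set.seed(42)
n_samples <- 5  # hypothetical number of stored samples
n_regions <- 8  # hypothetical number of spatial regions

# Mock object mimicking the documented layout (not real sfclust output)
samples <- list(
  membership = matrix(sample(1:3, n_samples * n_regions, replace = TRUE),
                      nrow = n_samples),
  log_marginal_likelihood = rnorm(n_samples, mean = -100),
  move_counts = c(birth = 2, death = 1, change = 1, hyper = 1)
)
clust <- list(
  id = n_samples,                                  # last sample by default
  membership = samples$membership[n_samples, ],
  models = list()                                  # fitted models omitted
)
mock <- list(samples = samples, clust = clust)

# Number of clusters in each stored sample
n_clusters <- apply(mock$samples$membership, 1, function(m) length(unique(m)))

# Sample with the highest log marginal likelihood
best_id <- which.max(mock$samples$log_marginal_likelihood)

print(n_clusters)
print(best_id)
```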

Author(s)

Ruiman Zhong <ruiman.zhong@kaust.edu.sa>, Erick A. Chacón-Montalván <erick.chaconmontalvan@kaust.edu.sa>, Paula Moraga <paula.moraga@kaust.edu.sa>

Examples



library(sfclust)

# Clustering with Gaussian data
data(stgaus)
result <- sfclust(stgaus, formula = y ~ f(idt, model = "rw1"),
  niter = 10, nmessage = 1)
print(result)
summary(result)
plot(result)

# Clustering with binomial data
data(stbinom)
result <- sfclust(stbinom, formula = cases ~ poly(time, 2) + f(id),
  family = "binomial", Ntrials = population, niter = 10, nmessage = 1)
print(result)
summary(result)
plot(result)



sfclust documentation built on June 8, 2025, 10:11 a.m.