fusionData: Prepare data structure for spatial fusion modelling
In spatialfusion: Multivariate Analysis of Spatial Data Using a Unifying Spatial Fusion Framework

fusionData

R Documentation

Description

Takes datasets and formulas for different spatial data types and processes them in preparation for spatial fusion modelling with either Stan or INLA.

Usage

fusionData(geo.data, geo.formula,
           lattice.data, lattice.formula,
           pp.data, distributions, domain = NULL,
           method = c("Stan", "INLA"),
           proj4string = CRS(as.character(NA)),
           stan.control = NULL)

Arguments

`geo.data`	an object of class `data.frame` or `sf`. If `data.frame`, it must have column names "x" and "y" as coordinates of observations.
`geo.formula`	an object of class `formula`. A symbolic description of the model to be fitted for geostatistical data. For multivariate geostatistical data, use syntax `cbind(y1, y2)` followed by `~`.
`lattice.data`	an object of class `sf`. Contains lattice data.
`lattice.formula`	an object of class `formula`. A symbolic description of the model to be fitted for lattice data. For multivariate lattice data, use syntax `cbind(y1, y2)` followed by `~`.
`pp.data`	an object of class `data.frame` or `sf`, or a list of them. If `data.frame`, it must have column names "x" and "y" as coordinates.
`distributions`	a vector of strings specifying the distribution of each geostatistical and lattice response variable. Currently “Gaussian” or “normal”, “Poisson” (count) and “Bernoulli” (binary) are supported. Note: no distribution needs to be specified for point pattern data.
`domain`	an object of class `sf`. The spatial domain considered for computing gridded point pattern data. If `NULL`, a bounding box that contains all spatial units is used.
`method`	character. Either 'Stan' or 'INLA', the method to be used for fitting the spatial fusion model later.
`proj4string`	projection string, an object of class `CRS`.
`stan.control`	a named list of parameters controlling the Stan implementation of spatial fusion models. Defaults to `NULL`, in which case all default values are used. `n.neighbor` (positive integer): number of nearest neighbors to consider; defaults to 5. `n.sampling` (positive integer): number of sampling points for each area; defaults to 5. `n.grid` (positive integer): number of grid cells in each of the x- and y-directions used to divide the spatial domain when counting the cases/events per cell; defaults to 10.
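
The input conventions described above can be sketched in base R. This is only an illustration under stated assumptions: the data frame `geo_df`, its columns `y1`, `y2` and `covariate`, and all simulated values are hypothetical. Only the column names "x" and "y", the `cbind()` syntax, the distribution names and the `stan.control` element names with their defaults are taken from the argument descriptions.

```r
# Hypothetical geostatistical data: coordinates must sit in columns "x" and "y".
set.seed(1)
geo_df <- data.frame(
  x = runif(20),
  y = runif(20),
  y1 = rnorm(20),               # a continuous response
  y2 = rpois(20, lambda = 3),   # a count response
  covariate = rnorm(20)
)

# Multivariate response: cbind() on the left-hand side of the formula.
geo_formula <- cbind(y1, y2) ~ covariate

# One distribution per geostatistical/lattice response variable
# (nothing is listed for point pattern data).
dists <- c("normal", "poisson")

# Stan tuning parameters, spelled out with their documented defaults.
stan_ctrl <- list(n.neighbor = 5, n.sampling = 5, n.grid = 10)
```

These objects would then be supplied as `geo.data`, `geo.formula`, `distributions` and `stan.control`, respectively, together with `method = "Stan"`.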

Details

It is not possible to add covariates for point pattern data in the spatial fusion framework. However, an offset term can be supplied to `pp.offset` in the modelling stage with `fusion`. Covariate information can be taken into account by first fitting a fixed-effect model and then entering its fitted values as the offset term.
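
One way to read the offset approach is sketched below in base R, assuming counts and a covariate are available per grid cell. The data frame `cells`, its columns and the simulated values are hypothetical, and the exact form that `pp.offset` expects (a single value or one value per cell, and on which scale) is not stated here and should be checked in the documentation of `fusion`.

```r
# Hypothetical gridded point pattern summary: one row per grid cell.
set.seed(42)
n_cells <- 50
cells <- data.frame(covariate = rnorm(n_cells))
cells$count <- rpois(n_cells, lambda = exp(0.5 + 0.3 * cells$covariate))

# Step 1: fit a fixed-effect (non-spatial) Poisson model to the counts.
fit <- glm(count ~ covariate, family = poisson(link = "log"), data = cells)

# Step 2: take the fitted values (expected counts on the response scale)
# as candidate offset values for the point pattern component.
offset_values <- fitted(fit)

head(offset_values)
```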

Value

The returned value is an object of either class dstan or dinla, depending on the chosen method. They are both lists that contain:

`distributions`	distribution specified for each response variable.
`n_point`	sample size for geostatistical data.
`n_area`	sample size for lattice data.
`n_grid`	set to 1 for INLA, and to the number of grid cells for Stan.
`p_point`	number of coefficients for geostatistical model component (only if there is geostatistical data).
`n_point_var`, `n_area_var`, `n_pp_var`	number of response variables for each data type.
`Y_point`	response variable for geostatistical data (only if there is geostatistical data).
`X_point`	covariates for geostatistical data (only if there is geostatistical data).
`p_area`	number of coefficients for lattice model component (only if there is lattice data).
`Y_area`	response variable for lattice data (only if there is lattice data).
`X_area`	covariates for lattice data (only if there is lattice data).
`geo.formula`, `lattice.formula`	formulas used for geostatistical and lattice data.

dstan additionally contains:

`n_neighbor`	number of nearest neighbors to consider for NNGP modelling.
`n_sample`	total number of sampling points.
`nearid`, `nearind_sample`	vectors containing neighborhood indices.
`C_nei`, `C_site_nei`, `sC_nei`, `sC_site_nei`	various distance matrices.
`A1`	aggregation matrix that maps sampling points to areal averages (only if there is lattice data).
`Y_pp`	the number of cases/events in each grid for point pattern data (only if there is point pattern data).
`area`	the area of each grid (only if there is point pattern data).
`grd_lrg`	the grid generated for point pattern data modeling (only if there is point pattern data).
`locs`	all the locations where the latent components are modelled.

dinla additionally contains:

`domain`	spatial domain as a `SpatialPolygons` object.
`locs_point`	locations of geostatistical data.
`locs_pp`	locations of point pattern data.
`poly`	lattice data as a `SpatialPolygonsDataFrame` object.
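
As a rough sketch of how the returned list might be inspected, the snippet below prepares the built-in example data with `method = "INLA"` and looks at a few of the elements listed above. It assumes the spatialfusion package is installed and does nothing otherwise. The element names are those documented in this section.

```r
dat <- NULL

if (requireNamespace("spatialfusion", quietly = TRUE)) {
  library(spatialfusion)

  # Same call as in the Examples, using the simulated built-in data.
  dat <- fusionData(dataGeo, lungfunction ~ covariate,
                    dataLattice, mortality ~ covariate,
                    dataPP, distributions = c("normal", "poisson"),
                    domain = dataDomain,
                    method = "INLA")

  print(class(dat))          # class of the prepared object
  print(names(dat))          # all elements of the list
  print(dat$distributions)   # distributions of the response variables
  print(dat$n_point)         # sample size of the geostatistical data
  print(dat$n_area)          # sample size of the lattice data
}
```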

Author(s)

Craig Wang

Examples

## example based on simulated built-in data
library(spatialfusion)

## prepare the data; the argument is spelled out in full as `distributions`
dat <- fusionData(dataGeo, lungfunction ~ covariate,
           dataLattice, mortality ~ covariate,
           dataPP, distributions = c("normal","poisson"),
           domain = dataDomain,
           method = "INLA")


if (requireNamespace("INLA", quietly = TRUE)) {
  ## fit a spatial fusion model on the prepared data
  ## pp.offset = 400 was chosen based on simulation parameters
  mod <- fusion(data = dat, n.latent = 1, bans = 0, pp.offset = 400,
                prior.range = c(0.1, 0.5), prior.sigma = c(1, 0.5),
                mesh.locs = dat$locs_point, mesh.max.edge = c(0.5, 1))

  ## parameter estimates
  summary(mod)
}

spatialfusion documentation built on June 22, 2025, 5:07 p.m.