fusionData: Prepare data structure for spatial fusion modelling


fusionData    R Documentation

Prepare data structure for spatial fusion modelling

Description

Takes datasets and formulas for different spatial data types and processes them in preparation for spatial fusion modelling with either Stan or INLA.

Usage

fusionData(geo.data, geo.formula,
           lattice.data, lattice.formula,
           pp.data, distributions, domain = NULL,
           method = c("Stan", "INLA"),
           proj4string = CRS(as.character(NA)),
           stan.control = NULL)

Arguments

geo.data

an object of class data.frame or SpatialPointsDataFrame. If data.frame, it must have column names "x" and "y" as coordinates of observations.

geo.formula

an object of class formula. A symbolic description of the model to be fitted for geostatistical data. For multivariate geostatistical data, use syntax cbind(y1, y2) followed by ~.

lattice.data

an object of class SpatialPolygonsDataFrame. Contains lattice data.

lattice.formula

an object of class formula. A symbolic description of the model to be fitted for lattice data. For multivariate lattice data, use syntax cbind(y1, y2) followed by ~.

pp.data

an object of class data.frame, SpatialPoints or SpatialPointsDataFrame, or a list of them. If data.frame, it must have column names "x" and "y" as coordinates.

distributions

a character vector specifying the distribution of each geostatistical and lattice response variable. Currently “Gaussian” (or “normal”), “Poisson” (count) and “Bernoulli” (binary) are supported. Note: no distribution needs to be specified for point pattern data.

domain

an object of class SpatialPolygons. The spatial domain considered for computing gridded point pattern data. If NULL, a bounding box that contains all spatial units is used.

method

character. Either 'Stan' or 'INLA', the method to be used for fitting the spatial fusion model later.

proj4string

projection string, an object of class CRS.

stan.control

a named list of parameters to control the Stan implementation of spatial fusion models. Defaults to NULL, in which case all default values are used.

  • n.neighbor (positive integer) Number of nearest neighbors to consider. Defaults to 5.

  • n.sampling (positive integer) Number of sampling points for each area. Defaults to 5.

  • n.grid (positive integer) Number of grid cells into which the spatial domain is divided in each of the x- and y-directions, used to count the number of cases/events in each cell. Defaults to 10.
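As an illustration of the argument formats described above, the following base R sketch assembles a geostatistical data.frame, a multivariate formula, a distributions vector and a stan.control list. All object names and all column names other than the required "x" and "y" are hypothetical, and the control values simply restate the documented defaults.

```r
# Toy inputs; object and column names are hypothetical, except the
# required coordinate columns "x" and "y".
set.seed(1)
n <- 20
geo_df <- data.frame(x = runif(n), y = runif(n), covariate = rnorm(n))
geo_df$y1 <- 1 + 0.5 * geo_df$covariate + rnorm(n)   # Gaussian response
geo_df$y2 <- rpois(n, exp(0.2 * geo_df$covariate))   # count response

# Multivariate geostatistical formula: cbind() of the responses, then ~
geo_formula <- cbind(y1, y2) ~ covariate

# One distribution per response variable (here, the two responses above)
distributions <- c("normal", "poisson")

# Named list for stan.control; these values restate the documented defaults
stan_control <- list(n.neighbor = 5, n.sampling = 5, n.grid = 10)
```

These objects would then be passed as geo.data, geo.formula, distributions and stan.control respectively.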

Details

It is not possible to add covariates for point pattern data in the spatial fusion framework. However, an offset term can be supplied to pp.offset in the modelling stage with fusion. Covariate information can be taken into account by first fitting a fixed-effects model and then entering its fitted values as the offset term.
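The two-step idea can be sketched in base R as follows. This is only an illustration under stated assumptions: the gridded counts and the covariate are simulated, the Poisson glm is one possible choice of fixed-effects model, and the scale and shape that pp.offset expects are not specified here, so they should be checked against the fusion documentation before the values are used.

```r
# Hypothetical gridded point pattern summary: one row per grid cell.
set.seed(42)
cells <- data.frame(covariate = rnorm(100))
cells$count <- rpois(100, lambda = exp(1 + 0.3 * cells$covariate))

# Step 1: fit a fixed-effects model to the cell counts
# (a Poisson GLM is an assumption made for this sketch).
fixed_fit <- glm(count ~ covariate, family = poisson, data = cells)

# Step 2: extract fitted values on the response scale (expected counts);
# these are the candidate values for the offset term.
offset_values <- fitted(fixed_fit)
```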

Value

The returned value is an object of either class dstan or dinla, depending on the chosen method. They are both lists that contain:

distributions

distribution specified for each response variable.

n_point

sample size for geostatistical data.

n_area

sample size for lattice data.

n_grid

Set to 1 for INLA, set to the number of grids for Stan.

p_point

number of coefficients for geostatistical model component (only if there is geostatistical data).

n_point_var, n_area_var, n_pp_var

number of response variables for each data type.

Y_point

response variable for geostatistical data (only if there is geostatistical data).

X_point

covariates for geostatistical data (only if there is geostatistical data).

p_area

number of coefficients for lattice model component (only if there is lattice data).

Y_area

response variable for lattice data (only if there is lattice data).

X_area

covariates for lattice data (only if there is lattice data).

geo.formula, lattice.formula

formulas used for geostatistical and lattice data.

dstan additionally contains:

n_neighbor

number of nearest neighbors to consider for NNGP modelling.

n_sample

total number of sampling points.

nearid, nearind_sample

vectors containing neighborhood indices

C_nei, C_site_nei, sC_nei, sC_site_nei

various distance matrices

A1

aggregation matrix that maps sampling points to areal averages (only if there is lattice data).

Y_pp

the number of cases/events in each grid for point pattern data (only if there is point pattern data).

area

the area of each grid (only if there is point pattern data).

grd_lrg

the grid generated for point pattern data modeling (only if there is point pattern data).

locs

all the locations where the latent components are modelled.

dinla additionally contains:

domain

spatial domain as a SpatialPolygons object.

locs_point

locations of geostatistical data.

locs_pp

locations of point pattern data.

poly

lattice data as a SpatialPolygonsDataFrame object.
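A minimal sketch of inspecting the returned list, assuming the spatialfusion package and its built-in simulated datasets (dataGeo, dataLattice, dataPP, dataDomain) are available; the call mirrors the one in the Examples section.

```r
if (require("spatialfusion", quietly = TRUE)) {
  dat <- fusionData(dataGeo, lungfunction ~ covariate,
                    dataLattice, mortality ~ covariate,
                    dataPP, distributions = c("normal", "poisson"),
                    domain = dataDomain,
                    method = "INLA")

  print(class(dat))            # class of the prepared object
  print(dat$distributions)     # distribution of each response variable
  print(c(n_point = dat$n_point, n_area = dat$n_area))  # sample sizes
  print(names(dat))            # all available components
}
```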

Author(s)

Craig Wang

See Also

fusion.dinla, fusion.dstan

Examples

## example based on simulated built-in data

dat <- fusionData(dataGeo, lungfunction ~ covariate,
           dataLattice, mortality ~ covariate,
           dataPP, distributions = c("normal", "poisson"),
           domain = dataDomain,
           method = "INLA")
## Not run: 
if (require("INLA", quietly = TRUE)) {
## fit a spatial fusion model on the prepared data
## pp.offset = 400 was chosen based on simulation parameters
mod <- fusion(data = dat, n.latent = 1, bans = 0, pp.offset = 400,
           prior.range = c(0.1, 0.5), prior.sigma = c(1, 0.5),
           mesh.locs = dat$locs_point, mesh.max.edge = c(0.5, 1))

## parameter estimates
summary(mod)
}

## End(Not run)

spatialfusion documentation built on Aug. 23, 2022, 1:05 a.m.