MCAR_INLA: Fit a (scalable) spatial multivariate Poisson mixed model to...

View source: R/MCAR_INLA.R

MCAR_INLA R Documentation

Fit a (scalable) spatial multivariate Poisson mixed model to areal count data where dependence between spatial patterns of the diseases is addressed through the use of M-models \insertCite{botella2015unifying}{bigDM}.

Description

Fit a spatial multivariate Poisson mixed model to areal count data. The linear predictor is modelled as

\log{r_{ij}}=\alpha_j + \theta_{ij}, \quad \mbox{for} \quad i=1,\ldots,n; \quad j=1,\ldots,J

where \alpha_j is a disease-specific intercept and \theta_{ij} is the spatial main effect of area i for the j-th disease. Following \insertCite{botella2015unifying;textual}{bigDM}, we rearrange the spatial effects into the matrix \mathbf{\Theta} = \{ \theta_{ij}: i=1, \ldots, n; j=1, \ldots, J \}, whose columns are the disease-specific spatial random effects and whose joint distribution specifies how dependence within and between diseases is defined. Several conditional autoregressive (CAR) prior distributions can be specified to deal with spatial dependence within diseases, such as the intrinsic CAR prior \insertCite{besag1991}{bigDM}, the CAR prior proposed by \insertCite{leroux1999estimation;textual}{bigDM}, and the proper CAR prior distribution.
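The structure described above can be illustrated with a small base-R simulation. This is only a sketch under stated assumptions, not the code bigDM runs internally: the four-area chain graph, the Leroux parameter `lambda`, the matrix `M` and the intercepts `alpha` are all hypothetical values chosen for illustration, and the M-model is written in its usual form \mathbf{\Theta} = \mathbf{\Phi} \mathbf{M}, where the columns of \mathbf{\Phi} are independent spatial effects.

```r
set.seed(1)
n <- 4   # hypothetical number of areas
J <- 3   # hypothetical number of diseases

# Binary adjacency matrix of a chain of areas 1-2-3-4 (illustrative only)
W <- matrix(0, n, n)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# Leroux CAR precision: lambda * (D - W) + (1 - lambda) * I, with D = diag(row sums of W)
lambda <- 0.7
Q <- lambda * (diag(rowSums(W)) - W) + (1 - lambda) * diag(n)

# Draw J independent spatial columns, each with covariance solve(Q)
R <- chol(solve(Q))                        # t(R) %*% R equals solve(Q)
Phi <- t(R) %*% matrix(rnorm(n * J), n, J)

# M-model: mixing the columns of Phi induces dependence between diseases
M <- matrix(rnorm(J * J), J, J)            # assumed values, estimated in practice
Theta <- Phi %*% M                         # n x J matrix of spatial effects

# Linear predictor: log r_ij = alpha_j + theta_ij
alpha <- c(-0.1, 0, 0.2)                   # assumed disease-specific intercepts
log_r <- sweep(Theta, 2, alpha, "+")
r <- exp(log_r)                            # risks, one column per disease
```

Each column of `r` holds the risks of one disease over the four areas; the between-disease covariance implied by this construction is `t(M) %*% M`.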

As in the CAR_INLA function, three main modelling approaches can be considered:

  • the usual model with a global spatial random effect whose dependence structure is based on the whole neighbourhood graph of the areal units (model="global" argument)

  • a Disjoint model based on a partition of the whole spatial domain where independent spatial CAR models are simultaneously fitted in each partition (model="partition" and k=0 arguments)

  • a modelling approach where k-order neighbours are added to each partition to avoid border effects in the Disjoint model (model="partition" and k>0 arguments).
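To make the role of k concrete, the following base-R sketch shows which areas would enter the submodel of one partition once neighbours up to order k are added. It assumes a hypothetical six-area chain graph and a hypothetical helper `extend_partition()`; it illustrates the idea only and is not the routine bigDM uses.

```r
# Hypothetical chain of six areas 1-2-3-4-5-6 split into two partitions
n <- 6
W <- matrix(0, n, n)
W[cbind(1:5, 2:6)] <- 1
W <- W + t(W)
group <- c("A", "A", "A", "B", "B", "B")

# Areas of partition `g` plus their neighbours up to order `k`
# (hypothetical helper, not a bigDM function)
extend_partition <- function(W, group, g, k) {
  inside <- group == g
  for (step in seq_len(k)) {
    # areas sharing a border with any area already in the set
    touching <- colSums(W[inside, , drop = FALSE]) > 0
    inside <- inside | touching
  }
  which(inside)
}

extend_partition(W, group, "A", 0)  # Disjoint model: areas 1 2 3
extend_partition(W, group, "A", 1)  # 1st-order neighbours added: areas 1 2 3 4
extend_partition(W, group, "B", 2)  # 2nd-order neighbours added: areas 2 3 4 5 6
```

With k>0 the submodels overlap, so areas near a partition border borrow information from both sides.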

For both the Disjoint and k-order neighbour models, parallel or distributed computation strategies can be used to speed up model fitting through the 'future' package \insertCite{bengtsson2020unifying}{bigDM}.

Inference is conducted in a fully Bayesian setting using the integrated nested Laplace approximation (INLA; \insertCite{rue2009approximate;textual}{bigDM}) technique through the R-INLA package (https://www.r-inla.org/). For the scalable model proposals \insertCite{orozco2020}{bigDM}, approximate values of the Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) can also be computed.
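As a rough illustration of how such criteria can be approximated from posterior samples of the linear predictor, the base-R sketch below applies the standard DIC and WAIC formulas to a Poisson likelihood. Everything in it is an assumption made for illustration: the counts `O` and `E` are made up, the posterior draws are replaced by normal draws, and the exact computation performed by bigDM may differ.

```r
set.seed(42)
n_obs <- 5
n_sample <- 1000                      # mirrors the n.sample default
O <- c(3, 0, 7, 2, 5)                 # hypothetical observed counts
E <- c(2.5, 1.0, 6.0, 2.0, 4.0)       # hypothetical expected counts

# Stand-in for posterior draws of the linear predictor (rows = observations)
eta_mean <- log((O + 0.5) / E)
eta <- matrix(rnorm(n_obs * n_sample, mean = eta_mean, sd = 0.3),
              n_obs, n_sample)

# Pointwise Poisson log-likelihood for every draw
loglik <- matrix(dpois(rep(O, n_sample), lambda = E * exp(eta), log = TRUE),
                 n_obs, n_sample)

# WAIC = -2 * (lppd - p_waic)
lppd <- sum(log(rowMeans(exp(loglik))))
p_waic <- sum(apply(loglik, 1, var))
WAIC <- -2 * (lppd - p_waic)

# DIC = mean deviance + effective number of parameters
D_bar <- mean(-2 * colSums(loglik))
D_hat <- -2 * sum(dpois(O, lambda = E * exp(rowMeans(eta)), log = TRUE))
p_D <- D_bar - D_hat
DIC <- D_bar + p_D

c(DIC = DIC, WAIC = WAIC)
```

A larger number of draws reduces the Monte Carlo error of both approximations at the cost of extra computing time.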

The function also supports the new hybrid approximation method that combines the Laplace method with a low-rank Variational Bayes correction to the posterior mean \insertCite{vanNiekerk2023}{bigDM}, which is enabled by setting the inla.mode="compact" argument.

Usage

MCAR_INLA(
  carto = NULL,
  data = NULL,
  ID.area = NULL,
  ID.disease = NULL,
  ID.group = NULL,
  O = NULL,
  E = NULL,
  W = NULL,
  prior = "intrinsic",
  model = "partition",
  k = 0,
  strategy = "simplified.laplace",
  merge.strategy = "original",
  compute.intercept = NULL,
  compute.DIC = TRUE,
  n.sample = 1000,
  compute.fitted.values = FALSE,
  save.models = FALSE,
  plan = "sequential",
  workers = NULL,
  inla.mode = "classic",
  num.threads = NULL
)

Arguments

carto

object of class SpatialPolygonsDataFrame or sf. This object must contain at least the variable with the identifiers of the spatial areal units specified in the argument ID.area.

data

object of class data.frame that must contain the target variables of interest specified in the arguments ID.area, ID.disease, O and E.

ID.area

character; name of the variable that contains the IDs of the spatial areal units. The values of this variable must match in the carto and data objects.

ID.disease

character; name of the variable that contains the IDs of the diseases.

ID.group

character; name of the variable that contains the IDs of the spatial partition (grouping variable). Only required if model="partition".

O

character; name of the variable that contains the observed number of cases for each areal unit and disease.

E

character; name of the variable that contains either the expected number of cases or the population at risk for each areal unit and disease.

W

optional argument with the binary adjacency matrix of the spatial areal units. If NULL (default), this object is computed from the carto argument (two areas are considered as neighbours if they share a common border).

prior

one of either "intrinsic" (default), "Leroux", "proper", or "iid" which specifies the prior distribution considered for the spatial random effect.

model

one of either "global" or "partition" (default), which specifies the Global model or one of the scalable model proposals (the Disjoint model if k=0, or the k-order neighbourhood model if k>0).

k

numeric value with the neighbourhood order used for the partition model. Usually k=2 or 3 is enough to get good results. If k=0 (default) the Disjoint model is considered. Only required if model="partition".

strategy

one of either "gaussian", "simplified.laplace" (default), "laplace" or "adaptive", which specifies the approximation strategy considered in the inla function.

merge.strategy

one of either "mixture" or "original" (default), which specifies the merging strategy to compute posterior marginal estimates of relative risks. See mergeINLA for further details.

compute.intercept

CAUTION! This argument has been deprecated since version 0.5.2.

compute.DIC

logical value; if TRUE (default) then approximate values of the Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) are computed.

n.sample

numeric; number of samples to generate from the posterior marginal distribution of the linear predictor when computing approximate DIC/WAIC values. Defaults to 1000.

compute.fitted.values

logical value (default FALSE); if TRUE transforms the posterior marginal distribution of the linear predictor to the exponential scale (risks or rates).

save.models

logical value (default FALSE); if TRUE then a list with all the inla submodels is saved in the '/temp/' folder, which can be used as input argument for the mergeINLA function.

plan

one of either "sequential" or "cluster", which specifies the computation strategy used for model fitting using the 'future' package. If plan="sequential" (default) the models are fitted sequentially and in the current R session (local machine). If plan="cluster" the models are fitted in parallel on external R sessions (local machine) or distributed in remote compute nodes.

workers

character or vector (default NULL) containing the identifications of the local or remote workers where the models are going to be processed. Only required if plan="cluster".

inla.mode

one of either "classic" (default) or "compact", which specifies the approximation method used by INLA. See help(inla) for further details.

num.threads

maximum number of threads the inla-program will use. See help(inla) for further details.
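When a user-supplied matrix is passed through the W argument, it can be worth checking that it really is a binary adjacency matrix before fitting. The helper below is a hypothetical base-R sketch (`check_adjacency()` is not a bigDM function) that tests the properties stated in the argument description and also flags areas without neighbours, which often need special handling under CAR priors.

```r
# Hypothetical sanity check for a user-supplied adjacency matrix
check_adjacency <- function(W) {
  W <- as.matrix(W)
  c(square        = nrow(W) == ncol(W),
    binary        = all(W %in% c(0, 1)),
    symmetric     = isSymmetric(unname(W)),
    zero_diagonal = all(diag(W) == 0),
    no_islands    = all(rowSums(W) > 0))
}

# Three areas in a row: area 2 borders areas 1 and 3
W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
check_adjacency(W)
```

Any FALSE entry in the returned vector points to a property worth fixing before the matrix is used.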

Details

For a full model specification and further details see the vignettes accompanying this package.

Value

This function returns an object of class inla. See the mergeINLA function for details.

References

\insertRef{bengtsson2020unifying}{bigDM}

\insertRef{besag1991}{bigDM}

\insertRef{botella2015unifying}{bigDM}

\insertRef{leroux1999estimation}{bigDM}

\insertRef{rue2009approximate}{bigDM}

\insertRef{vicente2022}{bigDM}

\insertRef{vanNiekerk2023}{bigDM}

Examples

## Not run: 
if(require("INLA", quietly=TRUE)){

  ## Load the sf object that contains the spatial polygons of the municipalities of Spain ##
  data(Carto_SpainMUN)
  str(Carto_SpainMUN)

  ## Load the simulated cancer mortality data (three diseases) ##
  data(Data_MultiCancer)
  str(Data_MultiCancer)

  ## Fit the Global model with an iCAR prior for the within-disease random effects ##
  Global <- MCAR_INLA(carto=Carto_SpainMUN, data=Data_MultiCancer,
                      ID.area="ID", ID.disease="disease", O="obs", E="exp",
                      prior="intrinsic", model="global", strategy="gaussian")
  summary(Global)

  ## Fit the Disjoint model with an iCAR prior for the within-disease random effects ##
  ## using 4 local clusters to fit the models in parallel                            ##
  Disjoint <- MCAR_INLA(carto=Carto_SpainMUN, data=Data_MultiCancer,
                        ID.area="ID", ID.disease="disease", O="obs", E="exp", ID.group="region",
                        prior="intrinsic", model="partition", k=0, strategy="gaussian",
                        plan="cluster", workers=rep("localhost",4))
  summary(Disjoint)

  ## 1st-order neighbourhood model with an iCAR prior for the within-disease random effects ##
  ## using 4 local clusters to fit the models in parallel                                   ##
  order1 <- MCAR_INLA(carto=Carto_SpainMUN, data=Data_MultiCancer,
                      ID.area="ID", ID.disease="disease", O="obs", E="exp", ID.group="region",
                      prior="intrinsic", model="partition", k=1, strategy="gaussian",
                      plan="cluster", workers=rep("localhost",4))
  summary(order1)
}

## End(Not run)


bigDM documentation built on Sept. 11, 2024, 9:05 p.m.