stan_car | R Documentation |
Use the CAR model as a prior on parameters, or fit data to a spatial Gaussian CAR model.
stan_car(
formula,
slx,
re,
data,
C,
car_parts = prep_car_data(C, "WCAR"),
family = gaussian(),
prior = NULL,
ME = NULL,
centerx = FALSE,
prior_only = FALSE,
censor_point,
zmp,
chains = 4,
iter = 2000,
refresh = 500,
keep_all = FALSE,
slim = FALSE,
drop = NULL,
pars = NULL,
control = NULL,
quiet = FALSE,
...
)
formula |
A model formula, following the R |
slx |
Formula to specify any spatially-lagged covariates. As in, |
re |
To include a varying intercept (or "random effects") term, alpha_re ~ N(0, alpha_tau) alpha_tau ~ Student_t(d.f., location, scale). With the CAR model, any |
data |
A |
C |
Spatial connectivity matrix which will be used internally to create |
car_parts |
A list of data for the CAR model, as returned by |
family |
The likelihood function for the outcome variable. Current options are |
prior |
A named list of parameters for prior distributions (see
.
|
ME |
To model observational uncertainty (i.e. measurement or sampling error) in any or all of the covariates, provide a list of data as constructed by the |
centerx |
To center predictors on their mean values, use |
prior_only |
Logical value; if |
censor_point |
Integer value indicating the maximum censored value; this argument is for modeling censored (suppressed) outcome data, typically disease case counts or deaths. |
zmp |
Use zero-mean parameterization for the CAR model? Only relevant for Poisson and binomial outcome models (i.e., hierarchical models). See details below; this can sometimes improve MCMC sampling when the data is sparse, but does not alter the model specification. |
chains |
Number of MCMC chains to use. |
iter |
Number of samples per chain. |
refresh |
Stan will print the progress of the sampler every |
keep_all |
If |
slim |
If |
drop |
Provide a vector of character strings to specify the names of any parameters that you do not want MCMC samples for. Dropping parameters in this way can improve sampling speed and reduce memory usage. The following parameter vectors can potentially be dropped from CAR models:
If |
pars |
Optional; specify any additional parameters you'd like stored from the Stan model. |
control |
A named list of parameters to control the sampler's behavior. See |
quiet |
Controls (most) automatic printing to the console. By default, any prior distributions that have not been assigned by the user are printed to the console. If |
... |
Other arguments passed to |
CAR models are discussed in Cressie and Wikle (2011, p. 184-88), Cressie (2015, Ch. 6-7), and Haining and Li (2020, p. 249-51). It is often used for areal or lattice data.
Details for the Stan code for this implementation of the CAR model can be found in Donegan (2021) and the geostan vignette 'Custom spatial models with Rstan and geostan'.
For outcome variable y
and N-by-N connectivity matrix C
, a standard spatial CAR model may be written as
y = \mu + \rho C (y - \mu) + \epsilon
where \rho
is a spatial dependence or autocorrelation parameter. The models accounts for autocorrelated errors in the regression.
The model is defined by its covariance matrix. The general scheme for the CAR model is as follows:
y \sim Gauss( \mu, ( I - \rho C)^{-1} M),
where I
is the identity matrix, \rho
is a spatial dependence parameter, C
is a spatial connectivity matrix, and M
is a diagonal matrix of variance terms. The diagonal of M
contains a scale parameter \tau
multiplied by a vector of weights (often set to be proportional to the inverse of the number of neighbors assigned to each site).
The covariance matrix of the CAR model contains two parameters: \rho
(car_rho
) which controls the kind (positive or negative) and degree of spatial autocorrelation, and the scale parameter \tau
. The range of permissible values for \rho
depends on the specification of \boldsymbol C
and \boldsymbol M
; for specification options, see prep_car_data and Cressie and Wikle (2011, pp. 184-188) or Donegan (2021).
Further details of the models and results depend on the family
argument, as well as on the particular CAR specification chosen (from prep_car_data).
When family = auto_gaussian()
(the default), the CAR model is applied directly to the data as follows:
y \sim Gauss( \mu, (I - \rho C)^{-1} M),
where \mu
is the mean vector (with intercept, covariates, etc.), C
is a spatial connectivity matrix, and M
is a known diagonal matrix containing the conditional variances \tau_i^2
. C
and M
are provided by prep_car_data.
The auto-Gaussian model contains an implicit spatial trend (i.e. autocorrelation) component \phi
which can be calculated as follows (Cressie 2015, p. 564):
\phi = \rho C (y - \mu).
This term can be extracted from a fitted auto-Gaussian model using the spatial method.
When applied to a fitted auto-Gaussian model, the residuals.geostan_fit method returns 'de-trended' residuals R
by default. That is,
R = y - \mu - \rho C (y - \mu).
To obtain "raw" residuals (y - \mu
), use residuals(fit, detrend = FALSE)
. Similarly, the fitted values obtained from the fitted.geostan_fit will include the spatial trend term by default.
For family = poisson()
, the model is specified as:
y \sim Poisson(e^{O + \lambda})
\lambda \sim Gauss(\mu, (I - \rho C)^{-1} \boldsymbol M).
If the raw outcome consists of a rate \frac{y}{p}
with observed counts y
and denominator p
(often this will be the size of the population at risk), then the offset term O=log(p)
is the log of the denominator.
The same model can also be described or specified such that \phi
has a mean of zero:
y \sim Poisson(e^{O + \mu + \phi})
\phi \sim Gauss(0, (I - \rho C)^{-1} \boldsymbol M).
This is the zero-mean parameterization (ZMP) of the CAR model; although the non-ZMP is typically better for MCMC sampling, use of the ZMP can greatly improve MCMC sampling when the data is sparse. Use zmp = TRUE
in stan_car
to apply this specification. (See the geostan vignette on 'custom spatial models' for full details on implementation of the ZMP.)
For all CAR Poisson models, the spatial method returns the (zero-mean) parameter vector \phi
. When zmp = FALSE
(the default), phi
is obtained by subtraction: \phi = \lambda - \mu
.
In the Poisson CAR model, \phi
contains a latent spatial trend as well as additional variation around it: \phi_i = \rho \sum_{i=1}^n c_{ij} \phi_j + \epsilon_i
, where \epsilon_i \sim Gauss(0, \tau_i^2)
. If for some reason you would like to extract the smoother latent/implicit spatial trend from \phi
, you can do so by calculating (following Cressie 2015, p. 564):
\rho C \phi.
For family = binomial()
, the model is specified as:
y \sim Binomial(N, \lambda)
logit(\lambda) \sim Gauss(\mu, (I - \rho C)^{-1} \boldsymbol M).
where outcome data y
are counts, N
is the number of trials, \lambda
is the 'success' rate, and \mu
contains the intercept and possibly covariates. Note that the model formula should be structured as: cbind(sucesses, failures) ~ x
, such that trials = successes + failures
.
As is also the case for the Poisson model, \phi
contains a latent spatial trend as well as additional variation around it. If you would like to extract the latent/implicit spatial trend from \phi
, you can do so by calculating:
\rho C \phi.
The zero-mean parameterization (ZMP) of the CAR model can also be applied here (see the Poisson model for details); ZMP provides an equivalent model specification that can improve MCMC sampling when data is sparse.
The CAR models can also incorporate spatially-lagged covariates, measurement/sampling error in covariates (particularly when using small area survey estimates as covariates), missing outcome data, and censored outcomes (such as arise when a disease surveillance system suppresses data for privacy reasons). For details on these options, please see the Details section in the documentation for stan_glm.
An object of class class geostan_fit
(a list) containing:
Summaries of the main parameters of interest; a data frame.
Residual spatial autocorrelation as measured by the Moran coefficient.
an object of class stanfit
returned by rstan::stan
a data frame containing the model data
the user-provided or default family
argument used to fit the model
The model formula provided by the user (not including CAR component)
The slx
formula
A list containing re
, the varying intercepts (re
) formula if provided, and
Data
a data frame with columns id
, the grouping variable, and idx
, the index values assigned to each group.
Prior specifications.
If covariates are centered internally (centerx = TRUE
), then x_center
is a numeric vector of the values on which covariates were centered.
A data frame with the name of the spatial component parameter (either "phi" or, for auto Gaussian models, "trend") and method ("CAR")
A list indicating if the object contains an ME model; if so, the user-provided ME list is also stored here.
Spatial connectivity matrix (in sparse matrix format).
Connor Donegan, connor.donegan@gmail.com
Besag, Julian (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society B36.2: 192–225.
Cressie, Noel (2015 (1993)). Statistics for Spatial Data. Wiley Classics, Revised Edition.
Cressie, Noel and Wikle, Christopher (2011). Statistics for Spatio-Temporal Data. Wiley.
Donegan, Connor (2021). Building spatial conditional autoregressive (CAR) models in the Stan programming language. OSF Preprints. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.31219/osf.io/3ey65")}.
Haining, Robert and Li, Guangquan (2020). Modelling Spatial and Spatial-Temporal Data: A Bayesian Approach. CRC Press.
##
## model mortality risk
##
# simple spatial model for log rates
data(georgia)
C <- shape2mat(georgia, style = "B")
cars <- prep_car_data(C)
# MCMC specs: set for purpose of demo speed
iter = 500
chains = 2
fit <- stan_car(log(rate.male) ~ 1, data = georgia,
car_parts = cars, iter = iter, chains = chains)
# model diagnostics
sp_diag(fit, georgia)
# A more appropriate model for mortality rates:
# hierarchical spatial Poisson model
fit <- stan_car(deaths.male ~ offset(log(pop.at.risk.male)),
car_parts = cars,
data = georgia,
family = poisson(),
iter = iter, chains = chains)
# model diagnostics
sp_diag(fit, georgia)
# county mortality rates
eta = fitted(fit)
# spatial trend component
phi = spatial(fit)
##
## Distance-based weights matrix:
## the 'DCAR' model
##
library(sf)
A <- shape2mat(georgia, "B")
D <- sf::st_distance(sf::st_centroid(georgia))
D <- D * A
dcars <- prep_car_data(D, "DCAR", k = 1)
Dfit <- stan_car(deaths.male ~ offset(log(pop.at.risk.male)),
data = georgia,
car = dcars,
family = poisson(),
iter = iter, chains = chains)
sp_diag(Dfit, georgia, dcars$C)
dic(Dfit); dic(fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.