stan_sar | R Documentation |
Fit data to a simultaneous spatial autoregressive (SAR) model, or use the SAR model as the prior model for a parameter vector in a hierarchical model.
stan_sar(
formula,
slx,
re,
data,
C,
sar_parts = prep_sar_data(C, quiet = TRUE),
family = auto_gaussian(),
type = c("SEM", "SDEM", "SDLM", "SLM"),
prior = NULL,
ME = NULL,
centerx = FALSE,
prior_only = FALSE,
censor_point,
zmp,
chains = 4,
iter = 2000,
refresh = 500,
keep_all = FALSE,
pars = NULL,
slim = FALSE,
drop = NULL,
control = NULL,
quiet = FALSE,
...
)
formula |
A model formula, following the R |
slx |
Formula to specify any spatially-lagged covariates. As in, |
re |
To include a varying intercept (or "random effects") term, alpha_re ~ N(0, alpha_tau) alpha_tau ~ Student_t(d.f., location, scale). With the SAR model, any |
data |
A |
C |
Spatial connectivity matrix which will be used internally to create |
sar_parts |
List of data constructed by |
family |
The likelihood function for the outcome variable. Current options are |
type |
Type of SAR model (character string): spatial error model ('SEM'), spatial Durbin error model ('SDEM'), spatial Durbin lag model ('SDLM'), or spatial lag model ('SLM'). see Details below. |
prior |
A named list of parameters for prior distributions (see
.
|
ME |
To model observational uncertainty in any or all of the covariates (i.e. measurement or sampling error), provide a list of data constructed by the |
centerx |
To center predictors on their mean values, use |
prior_only |
Logical value; if |
censor_point |
Integer value indicating the maximum censored value; this argument is for modeling censored (suppressed) outcome data, typically disease case counts or deaths which are left-censored to protect confidentiality when case counts are very low. |
zmp |
Use zero-mean parameterization for the SAR model? Only relevant for Poisson and binomial outcome models (i.e., hierarchical models). See details below; this can sometimes improve MCMC sampling when the data is sparse, but does not alter the model specification. |
chains |
Number of MCMC chains to use. |
iter |
Number of MCMC samples per chain. |
refresh |
Stan will print the progress of the sampler every |
keep_all |
If |
pars |
Specify any additional parameters you'd like stored from the Stan model. |
slim |
If |
drop |
Provide a vector of character strings to specify the names of any parameters that you do not want MCMC samples for. Dropping parameters in this way can improve sampling speed and reduce memory usage. The following parameter vectors can potentially be dropped from SAR models:
Using |
control |
A named list of parameters to control the sampler's behavior. See |
quiet |
Controls (most) automatic printing to the console. By default, any prior distributions that have not been assigned by the user are printed to the console; if |
... |
Other arguments passed to |
Discussions of SAR models may be found in Cliff and Ord (1981), Cressie (2015, Ch. 6), LeSage and Pace (2009), and LeSage (2014). The Stan implementation draws from Donegan (2021). It is a multivariate normal distribution with covariance matrix of \Sigma = \sigma^2 (I - \rho C)^{-1}(I - \rho C')^{-1}.
.
There are two SAR specification options which are commonly known as the spatial error ('SEM') and the spatial lag ('SLM') models. When the spatial-lags of all covariates are included in the linear predictor (as in \mu = \alpha + X \beta + W X \gamma
), then the model is referred to as a spatial Durbin model; depending on the model type, it becomes a spatial Durbin error model ('SDEM') or a spatial Durbin lag model ('SDLM'). To control which covariates are introduced in spatial-lag form, use the slx
argument together with 'type = SEM' or 'type = SLM'.
The spatial error specification ('SEM') is
y = \mu + ( I - \rho C)^{-1} \epsilon
\epsilon \sim Gauss(0, \sigma^2)
where C
is the spatial connectivity matrix, I
is the n-by-n identity matrix, and \rho
is a spatial autocorrelation parameter. In words, the errors of the regression equation are spatially autocorrelated. The expected value for the SEM is the usual \mu
: the intercept plus X*beta
and any other terms added to the linear predictor.
Re-arranging terms, the model can also be written as follows:
y = \mu + \rho C (y - \mu) + \epsilon
which shows more intuitively the implicit spatial trend component, \phi = \rho C (y - \mu)
. This term \phi
can be extracted from a fitted auto-Gaussian/auto-normal model using the spatial method.
When applied to a fitted auto-Gaussian model, the residuals.geostan_fit method returns 'de-trended' residuals R
by default. That is,
R = y - \mu - \rho C (y - \mu).
To obtain "raw" residuals (y - \mu
), use residuals(fit, detrend = FALSE)
. Similarly, the fitted values obtained from the fitted.geostan_fit will include the spatial trend term by default.
The second SAR specification type is the 'spatial lag of y' ('SLM'). This model describes a diffusion or contagion process:
y = \rho C y + \mu + \epsilon
\epsilon \sim Gauss(0, \sigma^2)
This is very attractive for modeling actual contagion or diffusion processes (or static snapshots of such processes). The model does not allow for the usual interpretation of regression coefficients as marginal effects. To interpret SLM results, use impacts.
Note that the expected value of the SLM is equal to (I - \rho C)^{-1} \mu
.
The spatial method returns the vector
\phi = \rho C y,
the spatial lag of y
.
The residuals.geostan_fit method returns 'de-trended' residuals R
by default:
R = y - \rho C y - \mu,
where \mu
contains the intercept and any covariates (and possibly other terms).
Similarly, the fitted values obtained from the fitted.geostan_fit will include the spatial trend \rho C y
by default to equal
\rho C y + \mu.
For now at least, the SLM/SDLM option is only supported for auto-normal models (as opposed to hierarchical Poisson and binomial models).
For family = poisson()
, the model is specified as:
y \sim Poisson(e^{O + \lambda})
\lambda \sim Gauss(\mu, \Sigma)
\Sigma = \sigma^2 (I - \rho C)^{-1}(I - \rho C')^{-1}
where O
is a constant/offset term and e^\lambda
is a rate parameter.
If the raw outcome consists of a rate \frac{y}{p}
with observed counts y
and denominator p
(often this will be the size of the population at risk), then the offset term should be the log of the denominator: O=log(p)
.
This same model can be written (equivalently) as:
y \sim Poisson(e^{O + \mu + \phi})
\phi \sim Gauss(0, \Sigma)
This second version is referred to here as the zero-mean parameterization (ZMP), since the SAR model is forced to have mean of zero. Although the non-ZMP is typically better for MCMC sampling, use of the ZMP can greatly improve MCMC sampling when the data is sparse. Use zmp = TRUE
in stan_sar
to apply this specification. (See the geostan vignette on 'custom spatial models' for full details on implementation of the ZMP.)
For Poisson models, the spatial method returns the (zero-mean) parameter vector \phi
. When zmp = FALSE
(the default), \phi
is obtained by subtraction: \phi = \lambda - \mu
.
In the Poisson SAR model, \phi
contains a latent (smooth) spatial trend as well as additional variation around it (this is merely a verbal description of the CAR model). If you would like to extract the latent/implicit spatial trend from \phi
, you can do so by calculating:
\rho C \phi.
For family = binomial()
, the model is specified as:
y \sim Binomial(N, \lambda)
logit(\lambda) \sim Gauss(\mu, \Sigma)
\Sigma = \sigma^2 (I - \rho C)^{-1}(I - \rho C')^{-1},
where outcome data y
are counts, N
is the number of trials, and \lambda
is the rate of 'success'. Note that the model formula should be structured as: cbind(sucesses, failures) ~ 1
(for an intercept-only model), such that trials = successes + failures
.
For fitted Binomial models, the spatial
method will return the parameter vector phi
, equivalent to:
\phi = logit(\lambda) - \mu.
The zero-mean parameterization (ZMP) of the SAR model can also be applied here (see the Poisson model for details); ZMP provides an equivalent model specification that can improve MCMC sampling when data is sparse.
As is also the case for the Poisson model, \phi
contains a latent spatial trend as well as additional variation around it. If you would like to extract the latent/implicit spatial trend from \phi
, you can do so by calculating:
\rho C \phi.
The SAR models can also incorporate spatially-lagged covariates, measurement/sampling error in covariates (particularly when using small area survey estimates as covariates), missing outcome data (for Poisson and binomial models), and censored outcomes (such as arise when a disease surveillance system suppresses data for privacy reasons). For details on these options, please see the Details section in the documentation for stan_glm.
An object of class class geostan_fit
(a list) containing:
Summaries of the main parameters of interest; a data frame.
Residual spatial autocorrelation as measured by the Moran coefficient.
an object of class stanfit
returned by rstan::stan
a data frame containing the model data
the user-provided or default family
argument used to fit the model
The model formula provided by the user (not including CAR component)
The slx
formula
A list containing re
, the varying intercepts (re
) formula if provided, and
Data
a data frame with columns id
, the grouping variable, and idx
, the index values assigned to each group.
Prior specifications.
If covariates are centered internally (centerx = TRUE
), then x_center
is a numeric vector of the values on which covariates were centered.
A data frame with the name of the spatial component parameter (either "phi" or, for auto Gaussian models, "trend") and method ("SAR")
A list indicating if the object contains an ME model; if so, the user-provided ME list is also stored here.
Spatial weights matrix (in sparse matrix format).
Type of SAR model: 'SEM', 'SDEM', 'SDLM', or 'SLM'.
Connor Donegan, connor.donegan@gmail.com
Cliff, A D and Ord, J K (1981). Spatial Processes: Models and Applications. Pion.
Cressie, Noel (2015 (1993)). Statistics for Spatial Data. Wiley Classics, Revised Edition.
Cressie, Noel and Wikle, Christopher (2011). Statistics for Spatio-Temporal Data. Wiley.
Donegan, Connor (2021). Building spatial conditional autoregressive (CAR) models in the Stan programming language. OSF Preprints. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.31219/osf.io/3ey65")}.
LeSage, James (2014). What Regional Scientists Need to Know about Spatial Econometrics. The Review of Regional Science 44: 13-32 (2014 Southern Regional Science Association Fellows Address).
LeSage, James, & Pace, Robert Kelley (2009). Introduction to Spatial Econometrics. Chapman and Hall/CRC.
##
## simulate SAR data on a regular grid
##
sars <- prep_sar_data2(row = 10, col = 10, quiet = TRUE)
w <- sars$W
# draw x
x <- sim_sar(w = w, rho = 0.5)
# draw y = mu + rho*W*(y - mu) + epsilon
# beta = 0.5, rho = 0.5
y <- sim_sar(w = w, rho = .5, mu = 0.5 * x)
dat <- data.frame(y = y, x = x)
##
## fit SEM
##
fit_sem <- stan_sar(y ~ x, data = dat, sar = sars,
chains = 1, iter = 800)
print(fit_sem)
##
## data for SDEM
##
# mu = x*beta + wx*gamma; beta=1, gamma=-0.25
x <- sim_sar(w = w, rho = 0.5)
mu <- 1 * x - 0.25 * (w %*% x)[,1]
y <- sim_sar(w = w, rho = .5, mu = mu)
# or for SDLM:
# y <- sim_sar(w = w, rho = 0.5, mu = mu, type = "SLM")
dat <- data.frame(y=y, x=x)
#
## fit models
##
# SDEM
# y = mu + rho*W*(y - mu) + epsilon
# mu = beta*x + gamma*Wx
fit_sdem <- stan_sar(y ~ x, data = dat,
sar_parts = sars, type = "SDEM",
iter = 800, chains = 1,
quiet = TRUE)
# SDLM
# y = rho*Wy + beta*x + gamma*Wx + epsilon
fit_sdlm <- stan_sar(y ~ x, data = dat,
sar_parts = sars,
type = "SDLM",
iter = 800,
chains = 1,
quiet = TRUE)
# compare by DIC
dic(fit_sdem)
dic(fit_sdlm)
##
## Modeling mortality rates
##
# simple spatial regression
data(georgia)
W <- shape2mat(georgia, style = "W")
fit <- stan_sar(log(rate.male) ~ 1,
C = W,
data = georgia,
iter = 900
)
# view fitted vs. observed, etc.
sp_diag(fit, georgia)
# A more appropriate model for count data:
# hierarchical spatial poisson model
fit2 <- stan_sar(deaths.male ~ offset(log(pop.at.risk.male)),
C = W,
data = georgia,
family = poisson(),
chains = 1, # for ex. speed only
iter = 900,
quiet = TRUE
)
# view fitted vs. observed, etc.
sp_diag(fit2, georgia)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.