View source: R/geostan_fit-methods.R
predict.geostan_fit | R Documentation |
geostan_fit
modelsObtain predicted values from a fitted model by providing new covariate values.
## S3 method for class 'geostan_fit'
predict(
object,
newdata,
alpha = as.matrix(object, pars = "intercept"),
center = object$x_center,
summary = TRUE,
type = c("link", "response"),
add_slx = FALSE,
approx = FALSE,
K = 15,
...
)
object |
A fitted model object of class |
newdata |
A data frame in which to look for variables with which to predict. Note that if the model formula includes an offset term, |
alpha |
An N-by-1 matrix of MCMC samples for the intercept; this is provided by default. If used, note that the intercept needs to be provided on the scale of the linear predictor. This argument might be used if there is a need to incorporate the spatial trend term (as a spatially-varying intercept). |
center |
Optional vector of numeric values or a logical scalar to pass to |
summary |
If |
type |
By default, results from |
add_slx |
Logical. If |
approx |
For SAR models of type 'SLM' or 'SDLM' only; use an approximation for matrix inversion? See details below. |
K |
Number of matrix powers to use with |
... |
Not used |
The primary purpose of the predict method is to explore marginal effects of covariates. The uncertainty present in these predictions refers to uncertainty in the expected value of the model. The expectation does not include the error term of the model (nb: one expects actual observations to form a cloud of points around the expected value). By contrast, posterior_predict returns the complete (posterior) predictive distribution of the model (the expectation plus noise).
The model formula will be taken from object$formula
, and then a model matrix will be created by passing newdata
to the model.frame function (as in: model.frame(object$formula, newdata)
. Parameters are taken from as.matrix(object, pars = c("intercept", "beta"))
.
The examples illustrate how to use the function in most cases.
Special considerations apply to models with spatially-lagged covariates and a spatially-lagged dependent variable (i.e., the 'SLM' and 'SDLM' models fit by stan_sar).
Spatially-lagged covariates which were included via the slx
argument will, by default, not be included in the predicted values. (The user can have greater control by manually adding the spatially-lagged covariate to the main model formula.) The slx
term will be be included in predictions if add_slx = TRUE
or if the fitted model is a SAR model of type 'SLM' or 'SDLM'. In either of those cases, newdata
must have the same number of rows as were used to fit the original data.
The typical 'marginal effect' interpretation of the regression coefficients does not hold for the SAR models of type 'SLM' or 'SDLM'. For details on these 'spillover effects', see LeSage and Pace (2009), LeSage (2014), and impacts.
Predictions for the spatial lag model (SAR models of type 'SLM') are equal to:
(I - \rho W)^{-1} X \beta
where X \beta
contains the intercept and covariates. Predictions for the spatial Durbin lag model (SAR models of type 'SDLM') are equal to:
(I - \rho W)^{-1} (X \beta + WX \gamma)
where WX \gamma
are spatially lagged covariates multiplied by their coefficients. For this reason, the predict.geostan_fit
method requires that newdata
have as many rows as the original data (so that nrow(newdata) == nrow(object$C)
); the spatial weights matrix will be taken from object$C
.
The inverse of the matrix (I - \rho W)
can be time consuming to compute (especially when iterating over MCMC samples). You can use approx = TRUE
to approximate the inverse using a series of matrix powers. The argument K
controls how many powers to use for the approximation. As a rule, higher values of \rho
require larger K
to obtain accuracy. Notice that \rho^K
should be close to zero for the approximation to hold. For example, for \rho = .5
a value of K=8
may suffice (0.5^8 = 0.004
), but larger values of \rho
require higher values of K
.
In generalized linear models (such as Poisson and Binomial models) marginal effects plots on the response scale may be sensitive to the level of other covariates in the model and to geographic location (given a spatially-varying mean value). If the model includes a spatial autocorrelation component (for example, you used a spatial CAR, SAR, or ESF model, or used the re
argument for random effects), by default these terms will be fixed at zero for the purposes of calculating marginal effects. If you want to change this, you can introduce a varying intercept manually using the alpha
argument.
If summary = FALSE
, a matrix of samples is returned. If summary = TRUE
(the default), a data frame is returned.
Goulard, Michael, Thibault Laurent, and Christine Thomas-Agnan (2017). About predictions in spatial autoregressive models: optimal and almost optimal strategies. Spatial Economic Analysis 12 (2-3): 304-325.
LeSage, James (2014). What Regional Scientists Need to Know about Spatial Econometrics. The Review of Regional Science 44: 13-32 (2014 Southern Regional Science Association Fellows Address).
LeSage, James, & Robert kelley Pace (2009). Introduction to Spatial Econometrics. Chapman and Hall/CRC.
data(georgia)
georgia$income <- georgia$income / 1e3
fit <- stan_glm(deaths.male ~ offset(log(pop.at.risk.male)) + log(income),
data = georgia,
re = ~ GEOID,
centerx = TRUE,
family = poisson(),
chains = 2, iter = 600) # for speed only
# note: pop.at.risk.male=1 leads to offset of log(pop.at.risk.male)=0
# so that the predicted values are rates
newdata <- data.frame(
income = seq(min(georgia$income),
max(georgia$income),
length.out = 200),
pop.at.risk.male = 1)
preds <- predict(fit, newdata, type = "response")
head(preds)
plot(preds$income,
preds$mean * 10e3,
type = "l",
ylab = "Deaths per 10,000",
xlab = "Income ($1,000s)")
# here the predictions are rates per 10,000
newdata$pop.at.risk.male <- 10e3
preds <- predict(fit, newdata, type = "response")
head(preds)
# plot range
y_lim <- c(min(preds$`2.5%`), max(preds$`97.5%`))
# plot line
plot(preds$income,
preds$mean,
type = "l",
ylab = "Deaths per 10,000",
xlab = "Income ($1,000s)",
ylim = y_lim,
axes = FALSE)
# add shaded cred. interval
x <- c(preds$income, rev(preds$income))
y <- c(preds$`2.5%`, rev(preds$`97.5%`))
polygon(x = x, y = y,
col = rgb(0.1, 0.2, 0.3, 0.3),
border = NA)
# add axes
yat = seq(0, 300, by = 20)
axis(2, at = yat)
xat = seq(0, 200, by = 10)
axis(1, at = xat)
# show county incomes
rug(georgia$income)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.