Prediction for spatial simultaneous autoregressive linear model objects
Description
predict.sarlm() calculates predictions, as far as is at present possible, for spatial simultaneous autoregressive linear model objects, using Haining's terminology for decomposition into trend, signal, and noise, or other types of predictors; see references.
Usage
## S3 method for class 'sarlm'
predict(object, newdata = NULL, listw = NULL, pred.type = "TS", all.data = FALSE,
zero.policy = NULL, legacy = TRUE, legacy.mixed = FALSE, power = NULL, order = 250,
tol = .Machine$double.eps^(3/5), spChk = NULL, ...)
## S3 method for class 'SLX'
predict(object, newdata, listw, zero.policy=NULL, ...)
## S3 method for class 'sarlm.pred'
print(x, ...)
## S3 method for class 'sarlm.pred'
as.data.frame(x, ...)

Arguments
object 

newdata 
data frame in which to predict; if NULL, predictions are for the data on which the model was fitted. Should have row names corresponding to region.id. If the row names are exactly the same as those used for training, in-sample predictors are used for the forecast. See ‘Details’
listw 
a 
pred.type 
predictor type; default “TS”, use decomposition into trend, signal, and noise; other types available depending on
all.data 
(only applies to 
zero.policy 
default NULL, use global option value; if TRUE assign zero to the lagged value of zones without neighbours, if FALSE (default) assign NA, causing the function to terminate with an error
legacy 
(only applies to lag and Durbin (mixed) models for 
legacy.mixed 
(only applies to mixed models if newdata is not NULL) default FALSE: compute lagged variables from both in-sample and out-of-sample units with [WX]o and [WX]s where
power 
(only applies to lag and Durbin (mixed) models for “TS”, “KP1”, “KP2”, “KP3”, “TC”, “TC1”, “BP”, “BP1”, “BPN”, “BPN1”, “BPW” and “BPW1” types) use 
order 
power series maximum limit if 
tol 
tolerance for convergence of power series if 
spChk 
should the row names of data frames be checked against the spatial objects for identity integrity, TRUE, or FALSE, default NULL to use 
x 
the object to be printed 
... 
further arguments passed through 
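The order and tol arguments above refer to a power series. The following is a minimal base R sketch, assuming the series in question is the Neumann expansion of inv(I - rho W) applied to a vector, and assuming a simple stopping rule on the size of the latest term. The function power_solve(), the toy weights matrix and all numeric values are hypothetical illustrations and are not the package's implementation.

```r
# Toy row-standardised weights for 4 units on a line (made-up data)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)

rho <- 0.4                  # assumed spatial coefficient, |rho| < 1
v <- c(2, 3, 4, 5)          # vector to be multiplied, e.g. X beta

# Hypothetical helper: approximate inv(I - rho W) v by
# v + rho W v + rho^2 W^2 v + ... without forming the inverse
power_solve <- function(W, rho, v, order = 250,
                        tol = .Machine$double.eps^(3/5)) {
  acc <- v
  term <- v
  for (k in seq_len(order)) {
    term <- rho * drop(W %*% term)    # next term of the series
    acc <- acc + term
    if (max(abs(term)) < tol) break   # assumed convergence rule
  }
  acc
}

approx <- power_solve(W, rho, v)
exact <- drop(solve(diag(nrow(W)) - rho * W, v))
all.equal(approx, exact)
```

The series only converges when the spectral radius of rho W is below one, which holds here because W is row-standardised and |rho| < 1.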
Details
The function supports three types of prediction. In-sample prediction is the computation of predictors on the data used to fit the model (newdata=NULL). Prevision, also called forecast, is the computation of some predictors (“trend”, in-sample “TC” and out-of-sample “TS”) on the same spatial units as those used to fit the model, but with different observations of the variables in the model (newdata should have the same row names as the data frame used to fit the model). Out-of-sample prediction is the computation of predictors on spatial units other than those used to fit the model (newdata has different row names). For extensive definitions, see Thomas-Agnan et al. (2015).
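The distinction between the three types rests on newdata and its row names. As a rough sketch only, the rule can be written as a small base R function. The helper classify_prediction() and the toy data frames are hypothetical and do not reproduce the package's internal checks; in particular, partially overlapping row names are simply treated as out-of-sample here.

```r
# Hypothetical helper, not part of the package: classify the kind of
# prediction implied by newdata and its row names
classify_prediction <- function(fit_ids, newdata = NULL) {
  if (is.null(newdata)) return("in-sample")
  new_ids <- row.names(newdata)
  if (length(new_ids) == length(fit_ids) && all(new_ids == fit_ids)) {
    return("prevision")
  }
  "out-of-sample"
}

fit_ids <- c("a", "b", "c")   # region ids used to fit the model (made up)
same_units <- data.frame(x = 1:3, row.names = fit_ids)
other_units <- data.frame(x = 4:5, row.names = c("d", "e"))

classify_prediction(fit_ids)
classify_prediction(fit_ids, same_units)
classify_prediction(fit_ids, other_units)
```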
The types of predictors (pred.type) that are available depend on the model of object and on the type of prediction. In the two following tables, “yes” means that the predictor can be used with the model, “no” means that predict.sarlm() will stop with an error, and “yes*” means that the predictor is not designed for the specified model, but can still be used with predict.sarlm(). In the last case, be careful: the computed predictor may be inappropriate for the model.
In-sample predictors by model
pred.type          sem (mixed)   lag (mixed)   sac (mixed)
“trend”            yes           yes           yes
“TS”               yes           yes           no
“TC”               no            yes           yes*
“BP”               no            yes           yes*
Note that only “trend” and “TC” are available for prevision.
Out-of-sample predictors by model
pred.type          sem (mixed)   lag (mixed)   sac (mixed)
“trend”            yes           yes           yes
“TS”               yes           yes           no
“TS1” or “KP4”     no            yes           yes
“TC”               no            yes           yes*
“TC1” or “KP1”     yes           yes           yes
“BP”               no            yes           yes*
“BP1”              no            yes           yes*
“BPW”              no            yes           yes*
“BPW1”             no            yes           yes*
“BPN”              no            yes           yes*
“BPN1”             no            yes           yes*
“KP2”              yes           yes           yes
“KP3”              yes           yes           yes
“KP5”              yes           no            yes*
Values for pred.type include “TS1”, “TC”, “TC1”, “BP”, “BP1”, “BPW”, “BPW1”, “BPN”, “BPN1”, following the notation in Thomas-Agnan et al. (2015), and “KP1”, “KP2”, “KP3”, “KP4”, “KP5”, following the notation in Kelejian et al. (2007). pred.type="TS" is described below and in Bivand (2002).
In the following, the trend is the non-spatial smooth, the signal is the spatial smooth, and the noise is the residual. The fit returned by pred.type="TS" is the sum of the trend and the signal.
When pred.type="TS", the function approaches prediction first by dividing invocations between those with and those without newdata. When no newdata is present, the response variable may be reconstructed as the sum of the trend, the signal, and the noise (residuals). Since the values of the response variable are known, their spatial lags are used to calculate signal components (Cressie 1993, p. 564). For the error model, trend = X beta, and signal = lambda W y - lambda W X beta. For the lag and mixed models, trend = X beta, and signal = rho W y.
This approach differs from the design choices made in other software, for example GeoDa, which does not use observations of the response variable, and corresponds to the newdata situation described below.
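The in-sample decomposition for pred.type="TS" can be traced by hand on a toy example. The base R sketch below uses a made-up weights matrix, made-up data and assumed coefficient values (beta, rho, lambda), none of which come from a fitted model. It only illustrates the formulas trend = X beta, signal = rho W y (lag model) and signal = lambda W y - lambda W X beta (error model).

```r
# Toy row-standardised weights for 4 units on a line (made-up data)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)

X <- cbind(1, c(2, 4, 6, 8))   # intercept and one covariate
beta <- c(1, 0.5)              # assumed regression coefficients
rho <- 0.4                     # assumed spatial lag coefficient
lambda <- 0.3                  # assumed spatial error coefficient
y <- c(3.1, 3.9, 5.2, 5.8)     # observed response

trend <- drop(X %*% beta)      # non-spatial smooth

# Lag model: the signal uses the spatial lag of the observed response
signal_lag <- drop(rho * W %*% y)
fit_lag <- trend + signal_lag
noise_lag <- y - fit_lag

# Error model: the signal is the spatial lag of y minus that of the trend
signal_err <- drop(lambda * W %*% y - lambda * W %*% X %*% beta)
fit_err <- trend + signal_err
noise_err <- y - fit_err
```

In both cases the response is recovered exactly as trend + signal + noise, because the noise is defined as what remains after the fit.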
When, however, newdata is used for prediction, no observations of the response variable being predicted are available. Consequently, while the trend components are the same, the signal cannot take full account of the spatial smooth. In the error model and Durbin error model, the signal is set to zero, since the spatial smooth is expressed in terms of the error: inv(I - lambda W) e.
In the lag model, the signal can be expressed in the following way (for legacy=TRUE):
(I - rho W) y = X beta + e
y = inv(I - rho W) X beta + inv(I - rho W) e
giving a feasible signal component of:
rho W y = rho W inv(I - rho W) X beta
For legacy=FALSE, the trend is computed first as:
X beta
next the prediction using the DGP:
inv(I - rho W) X beta
and the signal is found as the difference between prediction and trend. The numerical results for the legacy and DGP methods are identical.
In both cases the error term is set to zero. This also means that predictions of the signal component for lag and mixed models require the inversion of an n-by-n matrix.
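That the two routes to the lag-model signal agree follows from the identity inv(I - rho W) - I = rho W inv(I - rho W). A minimal base R check, using a made-up weights matrix and assumed values for beta and rho (not taken from any fitted model), might look like this:

```r
# Toy row-standardised weights for 4 units on a line (made-up data)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)

X <- cbind(1, c(2, 4, 6, 8))   # intercept and one covariate
beta <- c(1, 0.5)              # assumed regression coefficients
rho <- 0.4                     # assumed spatial lag coefficient
n <- nrow(W)

A_inv <- solve(diag(n) - rho * W)   # the n-by-n inversion
trend <- drop(X %*% beta)

# Signal written directly as rho W inv(I - rho W) X beta
signal_direct <- drop(rho * W %*% A_inv %*% X %*% beta)

# Signal as the difference between the DGP prediction and the trend
pred_dgp <- drop(A_inv %*% X %*% beta)
signal_dgp <- pred_dgp - trend

all.equal(signal_direct, signal_dgp)
```

The explicit solve() call is the step whose cost grows with the number of spatial units.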
Because the outcomes of the spatial smooth on the error term are unobservable, this means that the signal values for newdata are incomplete. In the mixed model, the spatially lagged RHS variables influence both the trend and the signal, so that the root mean square prediction error in the examples below for this case with newdata is smallest, although the model was not the best fit.
If newdata has more than one row, leave-one-out predictors (pred.type values “TS1”, “TC1”, “BP1”, “BPW1”, “BPN1”, “KP1”, “KP2”, “KP3”, “KP4”, “KP5”) are computed separately on each out-of-sample unit.
listw should be provided, except if newdata=NULL and pred.type is “TS” or “trend”, or if newdata is not NULL, pred.type="trend" and object is not a mixed model.
all.data is useful when some out-of-sample predictors return different predictions for in-sample units than the same predictor type computed only on in-sample data.
Value
predict.sarlm() returns a vector of predictions with three attribute vectors of trend, signal (only for pred.type="TS") and region.id values, and two other attributes of pred.type and call, with class sarlm.pred.
print.sarlm.pred() is a print function for this class, printing and returning a data frame with columns "fit", "trend" and "signal" (when available), and with region.id as row names.
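To make the shape of the returned value concrete, the base R sketch below builds a mock vector carrying the attributes listed above and assembles the corresponding data frame by hand. The object mock_pred and all values are made up for illustration. It is deliberately left without the sarlm.pred class, and the conversion shown is an assumption about the layout, not the package's own print or as.data.frame method.

```r
# Made-up predictions for three regions
fit <- c(10.2, 11.5, 9.8)
trend <- c(9.0, 10.0, 9.5)
signal <- fit - trend
ids <- c("r1", "r2", "r3")

# Mock of the described structure: a numeric vector plus attributes
mock_pred <- structure(fit,
                       trend = trend,
                       signal = signal,
                       region.id = ids,
                       pred.type = "TS",
                       call = quote(predict(object)))

# Assemble the data frame layout described for printing
mock_df <- data.frame(fit = as.numeric(mock_pred),
                      trend = attr(mock_pred, "trend"),
                      signal = attr(mock_pred, "signal"),
                      row.names = attr(mock_pred, "region.id"))
mock_df
```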
Author(s)
Roger Bivand Roger.Bivand@nhh.no and Martin Gubri
References
Haining, R. 1990 Spatial data analysis in the social and environmental sciences, Cambridge: Cambridge University Press, p. 258
Cressie, N. A. C. 1993 Statistics for spatial data, Wiley, New York
Thomas-Agnan, C., Laurent, T. and Goulard, M. 2015 About predictions in spatial autoregressive models: Optimal and almost optimal strategies, TSE Working Paper, n. 13-452
Kelejian, H. H. and Prucha, I. R. 2007 The relative efficiencies of various predictors in spatial econometric models containing spatial lags, Regional Science and Urban Economics, Volume 37, Issue 3, 363–374
Bivand, R. 2002 Spatial econometrics functions in R: Classes and methods, Journal of Geographical Systems, Volume 4, No. 4, 405–421
See Also
errorsarlm, lagsarlm, sacsarlm
Examples
data(oldcol)
lw <- nb2listw(COL.nb)
COL.lag.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw)
COL.mix.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw,
  type="mixed")
# in-sample prediction, then prediction with newdata on the same units
print(p1 <- predict(COL.mix.eig))
print(p2 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
  legacy.mixed = TRUE))
AIC(COL.mix.eig)
# root mean square error: model fit, in-sample prediction, newdata prediction
sqrt(deviance(COL.mix.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(p1))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(p2))^2)/length(COL.nb))
# same comparison for the error model
COL.err.eig <- errorsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw)
AIC(COL.err.eig)
sqrt(deviance(COL.err.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.err.eig)))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.err.eig, newdata=COL.OLD,
  listw=lw, pred.type = "TS")))^2)/length(COL.nb))
# Durbin error model
COL.SDerr.eig <- errorsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw,
  etype="emixed")
AIC(COL.SDerr.eig)
sqrt(deviance(COL.SDerr.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.SDerr.eig)))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.SDerr.eig, newdata=COL.OLD,
  listw=lw, pred.type = "TS")))^2)/length(COL.nb))
# lag model
AIC(COL.lag.eig)
sqrt(deviance(COL.lag.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.lag.eig)))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.lag.eig, newdata=COL.OLD,
  listw=lw, pred.type = "TS")))^2)/length(COL.nb))
# legacy and DGP computations, with and without the power series
p3 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
  legacy=FALSE, legacy.mixed = TRUE)
all.equal(p2, p3, check.attributes=FALSE)
p4 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
  legacy=FALSE, power=TRUE, legacy.mixed = TRUE)
all.equal(p2, p4, check.attributes=FALSE)
p5 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
  legacy=TRUE, power=TRUE, legacy.mixed = TRUE)
all.equal(p2, p5, check.attributes=FALSE)
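# SLX model: predictions with and without newdata
COL.SLX <- lmSLX(CRIME ~ INC + HOVAL, data=COL.OLD, listw=lw)
pslx0 <- predict(COL.SLX)
pslx1 <- predict(COL.SLX, newdata=COL.OLD, listw=lw)
all.equal(pslx0, pslx1)
# raise INC by one unit everywhere and compare the mean change in
# prediction with the sum of the INC and lagged INC coefficients
COL.OLD1 <- COL.OLD
COL.OLD1$INC <- COL.OLD1$INC + 1
pslx2 <- predict(COL.SLX, newdata=COL.OLD1, listw=lw)
sum(coef(COL.SLX)[c(2,4)])
mean(pslx2 - pslx1)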
