predict.sarlm: Prediction for spatial simultaneous autoregressive linear...
In spdep: Spatial Dependence: Weighting Schemes, Statistics and Models

Description Usage Arguments Details Value Author(s) References See Also Examples

predict.sarlm() calculates predictions as far as is at present possible for for spatial simultaneous autoregressive linear model objects, using Haining's terminology for decomposition into trend, signal, and noise, or other types of predictors — see references.

## S3 method for class 'sarlm'
predict(object, newdata = NULL, listw = NULL, pred.type = "TS", all.data = FALSE,
 zero.policy = NULL, legacy = TRUE, legacy.mixed = FALSE, power = NULL, order = 250,
 tol = .Machine$double.eps^(3/5), spChk = NULL, ...)
## S3 method for class 'SLX'
predict(object, newdata, listw, zero.policy=NULL, ...)
## S3 method for class 'sarlm.pred'
print(x, ...)
## S3 method for class 'sarlm.pred'
as.data.frame(x, ...)

`object`	`sarlm` object returned by `lagsarlm`, `errorsarlm` or `sacsarlm`, the method for SLX objects takes the output of `lmSLX`
`newdata`	data frame in which to predict — if NULL, predictions are for the data on which the model was fitted. Should have row names corresponding to region.id. If row names are exactly the same than the ones used for training, it uses in-sample predictors for forecast. See ‘Details’
`listw`	a `listw` object created for example by `nb2listw`. In the out-of-sample prediction case (ie. if newdata is not NULL), if `legacy.mixed=FALSE` or if `pred.type!="TS"`, it should include both in-sample and out-of-sample spatial units. In this case, if regions of the listw are not in the correct order, they are reordered. See ‘Details’
`pred.type`	predictor type — default “TS”, use decomposition into trend, signal, and noise ; other types available depending on `newdata`. If `newdata=NULL` (in-sample prediction), “TS”, “trend”, “TC” and “BP” are available. If `newdata` is not NULL and its row names are the same than the `data` used to fit the model (forecast case), “TS”, “trend” and “TC” are available. In other cases (out-of-sample prediction), “TS”, “trend”, “KP1”, “KP2”, “KP3”, “KP4”, “KP5”, “TC”, “BP”, “BPW”, “BPN”, “TS1”, “TC1”, “BP1”, “BPW1” and “BPN1” are available. See ‘Details’ and references
`all.data`	(only applies to `pred.type="TC"` and newdata is not NULL) default FALSE: return predictions only for newdata units, if TRUE return predictions for all data units. See ‘Details’
`zero.policy`	default NULL, use global option value; if TRUE assign zero to the lagged value of zones without neighbours, if FALSE (default) assign NA - causing the function to terminate with an error
`legacy`	(only applies to lag and Durbin (mixed) models for `pred.type="TS"`) default TRUE: use ad-hoc predictor, if FALSE use DGP-based predictor
`legacy.mixed`	(only applies to mixed models if newdata is not NULL) default FALSE: compute lagged variables from both in-sample and out-of-sample units with [WX]o and [WX]s where `X=cbind(Xs, Xo)`, if TRUE compute lagged variables independantly between in-sample and out-of-sample units with Woo Xo and Wss Xs
`power`	(only applies to lag and Durbin (mixed) models for “TS”, “KP1”, “KP2”, “KP3”, “TC”, “TC1”, “BP”, “BP1”, “BPN”, “BPN1”, “BPW” and “BPW1” types) use `powerWeights`, if default NULL, set FALSE if `object$method` is “eigen”, otherwise TRUE
`order`	power series maximum limit if `power` is TRUE
`tol`	tolerance for convergence of power series if `power` is TRUE
`spChk`	should the row names of data frames be checked against the spatial objects for identity integrity, TRUE, or FALSE, default NULL to use `get.spChkOption()`
`x`	the object to be printed

...

further arguments passed through

The function supports three types of prediction. In-sample prediction is the computation of predictors on the data used to fit the model (newdata=NULL). Prevision, also called forecast, is the computation of some predictors (“trend”, in-sample “TC” and out-of-sample “TS”) on the same spatial units than the ones used to fit the model, but with different observations of the variables in the model (row names of newdata should have the same row names than the data frame used to fit the model). And out-of-sample prediction is the computation of predictors on other spatial units than the ones used to fit the model (newdata has different row names). For extensive definitions, see Thomas-Agnan et al. (2015).

pred.type of predictors are available according to the model of object an to the type of prediction. In the two following tables, “yes” means that the predictor can be used with the model, “no” means that predict.sarlm() will stop with an error, and “yes*” means that the predictor is not designed for the specified model, but it can be used with predict.sarlm(). In the last case, be careful with the computation of a inappropriate predictor.

In-sample predictors by models

pred.type	sem (mixed)	lag (mixed)	sac (mixed)

“trend”	yes	yes	yes
“TS”	yes	yes	no
“TC”	no	yes	yes*
“BP”	no	yes	yes*

Note that only “trend” and “TC” are available for prevision.

Out-of-sample predictors by models

pred.type	sem (mixed)	lag (mixed)	sac (mixed)

“trend”	yes	yes	yes
“TS”	yes	yes	no
“TS1” or “KP4”	no	yes	yes
“TC”	no	yes	yes*
“TC1” or “KP1”	yes	yes	yes
“BP”	no	yes	yes*
“BP1”	no	yes	yes*
“BPW”	no	yes	yes*
“BPW1”	no	yes	yes*
“BN”	no	yes	yes*
“BPN1”	no	yes	yes*
“KP2”	yes	yes	yes
“KP3”	yes	yes	yes
“KP5”	yes	no	yes*

Values for pred.type= include “TS1”, “TC”, “TC1”, “BP”, “BP1”, “BPW”, “BPW1”, “BPN”, “BPN1”, following the notation in Thomas-Agnan et al. (2015), and for pred.type= “KP1”, “KP2”, “KP3”, “KP4”, “KP5”, following the notation in Kelejian et al. (2007). pred.type="TS" is described bellow and in Bivand (2002).

In the following, the trend is the non-spatial smooth, the signal is the spatial smooth, and the noise is the residual. The fit returned by pred.type="TS" is the sum of the trend and the signal.

When pred.type="TS", the function approaches prediction first by dividing invocations between those with or without newdata. When no newdata is present, the response variable may be reconstructed as the sum of the trend, the signal, and the noise (residuals). Since the values of the response variable are known, their spatial lags are used to calculate signal components (Cressie 1993, p. 564). For the error model, trend = X beta, and signal = lambda W y - lambda W X beta. For the lag and mixed models, trend = X beta, and signal = rho W y.

This approach differs from the design choices made in other software, for example GeoDa, which does not use observations of the response variable, and corresponds to the newdata situation described below.

When however newdata is used for prediction, no observations of the response variable being predicted are available. Consequently, while the trend components are the same, the signal cannot take full account of the spatial smooth. In the error model and Durbin error model, the signal is set to zero, since the spatial smooth is expressed in terms of the error: inv(I - lambda W) e.

In the lag model, the signal can be expressed in the following way (for legacy=TRUE):

(I - rho W) y = X beta + e

y = inv(I - rho W) X beta + inv(I - rho W) e

giving a feasible signal component of:

rho W y = rho W inv(I - rho W) X beta

For legacy=FALSE, the trend is computed first as:

X beta

next the prediction using the DGP:

inv(I - rho W) X beta

and the signal is found as the difference between prediction and trend. The numerical results for the legacy and DGP methods are identical.

setting the error term to zero. This also means that predictions of the signal component for lag and mixed models require the inversion of an n-by-n matrix.

Because the outcomes of the spatial smooth on the error term are unobservable, this means that the signal values for newdata are incomplete. In the mixed model, the spatially lagged RHS variables influence both the trend and the signal, so that the root mean square prediction error in the examples below for this case with newdata is smallest, although the model was not the best fit.

If newdata has more than one row, leave-one-out predictors (pred.type= include “TS1”, “TC1”, “BP1”, “BPW1”, “BPN1”, “KP1”, “KP2”, “KP3”, “KP4”, “KP5”) are computed separatly on each out-of-sample unit.

listw should be provided except if newdata=NULL and pred.type= include “TS”, “trend”, or if newdata is not NULL, pred.type="trend" and object is not a mixed model.

all.data is useful when some out-of-sample predictors return different predictions for in-sample units, than the same predictor type computed only on in-sample data.

predict.sarlm() returns a vector of predictions with three attribute vectors of trend, signal (only for pred.type="TS") and region.id values and two other attributes of pred.type and call with class sarlm.pred.

print.sarlm.pred() is a print function for this class, printing and returning a data frame with columns: "fit", "trend" and "signal" (when available) and with region.id as row names.

Roger Bivand Roger.Bivand@nhh.no and Martin Gubri

Haining, R. 1990 Spatial data analysis in the social and environmental sciences, Cambridge: Cambridge University Press, p. 258; Cressie, N. A. C. 1993 Statistics for spatial data, Wiley, New York; Thomas-Agnan, C., Laurent, T. and Goulard, M. 2015 About predictions in spatial autoregressive models: Optimal and almost optimal strategies, TSE Working Paper, n. 13-452; Kelejian, H. H. and Prucha, I. R. 2007 The relative efficiencies of various predictors in spatial econometric models containing spatial lags, Regional Science and Urban Economics, Volume 37, Issue 3, 363–374; Bivand, R. 2002 Spatial econometrics functions in R: Classes and methods, Journal of Geographical Systems, Volume 4, No. 4, 405–421

errorsarlm, lagsarlm, sacsarlm

data(oldcol)
lw <- nb2listw(COL.nb)
COL.lag.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw)

COL.mix.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw,
  type="mixed")
print(p1 <- predict(COL.mix.eig))
print(p2 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
 legacy.mixed = TRUE))
AIC(COL.mix.eig)
sqrt(deviance(COL.mix.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(p1))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(p2))^2)/length(COL.nb))

COL.err.eig <- errorsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw)
AIC(COL.err.eig)
sqrt(deviance(COL.err.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.err.eig)))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.err.eig, newdata=COL.OLD,
  listw=lw, pred.type = "TS")))^2)/length(COL.nb))

COL.SDerr.eig <- errorsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, lw,
 etype="emixed")
AIC(COL.SDerr.eig)
sqrt(deviance(COL.SDerr.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.SDerr.eig)))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.SDerr.eig, newdata=COL.OLD,
  listw=lw, pred.type = "TS")))^2)/length(COL.nb))

AIC(COL.lag.eig)
sqrt(deviance(COL.lag.eig)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.lag.eig)))^2)/length(COL.nb))
sqrt(sum((COL.OLD$CRIME - as.vector(predict(COL.lag.eig, newdata=COL.OLD,
  listw=lw, pred.type = "TS")))^2)/length(COL.nb))

p3 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
 legacy=FALSE, legacy.mixed = TRUE)
all.equal(p2, p3, check.attributes=FALSE)
p4 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
 legacy=FALSE, power=TRUE, legacy.mixed = TRUE)
all.equal(p2, p4, check.attributes=FALSE)
p5 <- predict(COL.mix.eig, newdata=COL.OLD, listw=lw, pred.type = "TS",
 legacy=TRUE, power=TRUE, legacy.mixed = TRUE)
all.equal(p2, p5, check.attributes=FALSE)

COL.SLX <- lmSLX(CRIME ~ INC + HOVAL, data=COL.OLD, listw=lw)
pslx0 <- predict(COL.SLX)
pslx1 <- predict(COL.SLX, newdata=COL.OLD, listw=lw)
all.equal(pslx0, pslx1)
COL.OLD1 <- COL.OLD
COL.OLD1$INC <- COL.OLD1$INC + 1
pslx2 <- predict(COL.SLX, newdata=COL.OLD1, listw=lw)
sum(coef(COL.SLX)[c(2,4)])
mean(pslx2-pslx1)

Loading required package: sp
Loading required package: Matrix
           fit      trend    signal
1001 26.044310 14.8543497 11.189961
1002 44.034234 29.2632100 14.771024
1003 43.511934 25.8193802 17.692554
1004 37.656561 16.4555568 21.201004
1005 10.902976  0.3664058 10.536570
1006 36.829798 24.2905235 12.539275
1007 44.290467 27.0386601 17.251807
1008 38.853571 21.5342382 17.319333
1009 50.870854 29.5092767 21.361578
1010 16.401300  5.6029095 10.798390
1011 36.354390 28.6415347  7.712855
1012 20.452836 12.4607271  7.992109
1013 20.324088 14.4173430  5.906745
1014 19.243496 10.2606415  8.982854
1015 19.747775 12.2556858  7.492089
1016  6.962528 -2.0137496  8.976277
1017  7.452143 -6.3808936 13.833037
1018 28.481587 14.2125583 14.269029
1019 43.351392 28.0442053 15.307187
1020 50.359682 30.6608138 19.698868
1021 38.905226 24.7490966 14.156130
1022 44.724478 28.8314288 15.893049
1023 37.888974 23.7778853 14.111088
1024 45.527017 26.9163177 18.610700
1025 32.429571 17.1892390 15.240332
1026 26.490842 14.8893969 11.601445
1027 35.629157 23.4577198 12.171438
1028 35.574326 21.9000995 13.674227
1029 38.598639 23.1818430 15.416796
1030 36.602053 14.8614059 21.740647
1031 50.320031 30.1013967 20.218634
1032 53.698863 31.2094153 22.489448
1033 49.364208 26.5151187 22.849090
1034 46.262357 25.5538213 20.708536
1035 39.177121 15.7689315 23.408190
1036 54.984344 32.6590826 22.325261
1037 51.611458 33.1290188 18.482439
1038 51.998831 30.7428298 21.256001
1039 43.651605 27.4107866 16.240818
1040 44.196841 25.9409237 18.255917
1041 49.310593 29.5106484 19.799944
1042 37.995310 15.7039010 22.291409
1043 46.908709 28.2687591 18.639950
1044 28.976789 19.5389213  9.437868
1045 25.343792 17.1838190  8.159973
1046 24.006252 16.0103695  7.995883
1047 25.034907 18.4616466  6.573260
1048 10.478529  3.3573570  7.121172
1049 13.495623  5.3356494  8.159973
           fit      trend    signal
1001 29.038788 14.8543497 14.184438
1002 46.227075 29.2632100 16.963865
1003 45.640479 25.8193802 19.821099
1004 36.643520 16.4555568 20.187963
1005 14.819940  0.3664058 14.453534
1006 38.764777 24.2905235 14.474253
1007 45.715716 27.0386601 18.677056
1008 37.514611 21.5342382 15.980373
1009 49.324228 29.5092767 19.814951
1010 17.510606  5.6029095 11.907697
1011 34.973607 28.6415347  6.332073
1012 21.079100 12.4607271  8.618373
1013 19.704134 14.4173430  5.286791
1014 16.365521 10.2606415  6.104879
1015 17.063856 12.2556858  4.808170
1016  6.190282 -2.0137496  8.204032
1017  5.967260 -6.3808936 12.348154
1018 29.250462 14.2125583 15.037903
1019 41.530036 28.0442053 13.485831
1020 49.344769 30.6608138 18.683956
1021 39.508818 24.7490966 14.759722
1022 42.772692 28.8314288 13.941263
1023 37.114901 23.7778853 13.337016
1024 43.622499 26.9163177 16.706182
1025 33.247197 17.1892390 16.057958
1026 30.301331 14.8893969 15.411934
1027 38.316063 23.4577198 14.858343
1028 36.886068 21.9000995 14.985968
1029 38.970564 23.1818430 15.788721
1030 33.014615 14.8614059 18.153209
1031 48.209875 30.1013967 18.108478
1032 50.808064 31.2094153 19.598648
1033 44.555996 26.5151187 18.040877
1034 43.232773 25.5538213 17.678952
1035 35.009061 15.7689315 19.240129
1036 52.113364 32.6590826 19.454282
1037 52.189015 33.1290188 19.059996
1038 51.631804 30.7428298 20.888975
1039 46.543564 27.4107866 19.132778
1040 45.036095 25.9409237 19.095171
1041 45.907835 29.5106484 16.397187
1042 35.337110 15.7039010 19.633209
1043 43.948398 28.2687591 15.679639
1044 32.091257 19.5389213 12.552335
1045 29.647005 17.1838190 12.463186
1046 26.375304 16.0103695 10.364935
1047 27.235807 18.4616466  8.774161
1048 14.785518  3.3573570 11.428161
1049 17.798836  5.3356494 12.463186
[1] 376.787
[1] 9.580773
[1] 9.580773
[1] 10.35029
[1] 376.7609
[1] 9.776221
[1] 9.776221
[1] 11.61744
[1] 377.1693
[1] 9.619298
[1] 9.619298
[1] 10.40472
Warning message:
In predict.sarlm(COL.SDerr.eig, newdata = COL.OLD, listw = lw, pred.type = "TS") :
  only legacy.mixed=TRUE is supported for pred.type='TS' and mixed models. legacy.mixed is forced
[1] 374.7809
[1] 9.772129
[1] 9.772129
[1] 10.72654
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] -2.479902
[1] -2.479902