lagsarlm: Spatial simultaneous autoregressive lag model estimation

In spdep: Spatial Dependence: Weighting Schemes, Statistics and Models

Description

The `lagsarlm` function provides maximum likelihood estimation of spatial simultaneous autoregressive lag and spatial Durbin (mixed) models of the form:

y = rho W y + X beta + e

where rho is found by `optimize()` first, and beta and the other parameters are then found by generalized least squares (a one-dimensional search using `optim` performs badly on some platforms). In the spatial Durbin (mixed) model, the spatially lagged independent variables are added to X. Note that interpretation of the fitted coefficients should use impact measures, because of the feedback loops induced by the data generation process for this model. With one of the sparse matrix methods, larger numbers of observations can be handled, but the `interval=` argument may need to be set when the weights are not row-standardised.
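The two-step fit described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used inside `lagsarlm`: the ring-shaped weights matrix, the simulated data, the helper `conc_ll` and the search interval are all hypothetical, and the regression step is plain least squares on the spatially filtered response `y - rho * W y`.

```r
# Hypothetical toy data: n sites on a ring, two neighbours each,
# row-standardised weights (0.5 per neighbour).
set.seed(42)
n <- 60
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  left <- if (i == 1) n else i - 1
  right <- if (i == n) 1 else i + 1
  W[i, c(left, right)] <- 0.5
}
X <- cbind(1, rnorm(n))
beta_true <- c(1, 2)
rho_true <- 0.5

# Data generation: y = (I - rho W)^{-1} (X beta + e)
y <- drop(solve(diag(n) - rho_true * W, X %*% beta_true + rnorm(n)))
Wy <- drop(W %*% y)

# Eigenvalues of W give the log-determinant of (I - rho W) cheaply;
# this ring matrix is symmetric, so the eigenvalues are real.
ev <- eigen(W, symmetric = TRUE, only.values = TRUE)$values

# Log-likelihood concentrated in rho: beta and sigma^2 are profiled out
# by regressing the filtered response on X.
conc_ll <- function(rho) {
  e <- lm.fit(X, y - rho * Wy)$residuals
  s2 <- sum(e^2) / n
  sum(log(1 - rho * ev)) - (n / 2) * log(2 * pi * s2) - n / 2
}

# Step 1: one-dimensional search for rho with optimize()
opt <- optimize(conc_ll, interval = c(-0.99, 0.99), maximum = TRUE)
rho_hat <- opt$maximum

# Step 2: regression coefficients given rho_hat
beta_hat <- lm.fit(X, y - rho_hat * Wy)$coefficients
```

Because only `rho` enters the numerical search, the optimisation stays one-dimensional however many columns `X` has.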

The `spBreg_lag` function is an early-release version of the Matlab Spatial Econometrics Toolbox function `sar_g.m`, using drawing by inversion, and not accommodating heteroskedastic disturbances.

Usage

```
lagsarlm(formula, data = list(), listw, na.action, Durbin, type,
  method="eigen", quiet=NULL, zero.policy=NULL, interval=NULL,
  tol.solve=1.0e-10, trs=NULL, control=list())
spBreg_lag(formula, data = list(), listw, na.action, Durbin, type,
  zero.policy=NULL, control=list())
```

Arguments

`formula` a symbolic description of the model to be fit. The details of model specification are given for `lm()`

`data` an optional data frame containing the variables in the model. By default the variables are taken from the environment from which the function is called.

`listw` a `listw` object created for example by `nb2listw`

`na.action` a function (default `options("na.action")`), can also be `na.omit` or `na.exclude` with consequences for residuals and fitted values - in these cases the weights list will be subsetted to remove NAs in the data. It may be necessary to set zero.policy to TRUE because this subsetting may create no-neighbour observations. Note that only weights lists created without using the glist argument to `nb2listw` may be subsetted.

`Durbin` default FALSE (spatial lag model); if TRUE, full spatial Durbin model; if a formula object, the subset of explanatory variables to lag

`type` (use the ‘Durbin=’ argument - retained for backwards compatibility only) default "lag", may be set to "mixed"; when "mixed", the lagged intercept is dropped for spatial weights style "W", that is row-standardised weights, but otherwise included; “Durbin” may be used instead of “mixed”

`method` "eigen" (default) - the Jacobian is computed as the product of (1 - rho*eigenvalue) using `eigenw`, and "spam" or "Matrix_J" for strictly symmetric weights lists of styles "B" and "C", or made symmetric by similarity (Ord, 1975, Appendix C) if possible for styles "W" and "S", using code from the spam or Matrix packages to calculate the determinant; “Matrix” and “spam_update” provide updating Cholesky decomposition methods; "LU" provides an alternative sparse matrix decomposition approach. In addition, there are "Chebyshev" and Monte Carlo "MC" approximate log-determinant methods; the Smirnov/Anselin (2009) trace approximation is available as "moments". Three methods: "SE_classic", "SE_whichMin", and "SE_interp" are provided experimentally, the first to attempt to emulate the behaviour of Spatial Econometrics toolbox ML fitting functions. All use grids of log determinant values, and the latter two attempt to ameliorate some features of "SE_classic".

`quiet` default NULL, use !verbose global option value; if FALSE, reports function values during optimization.

`zero.policy` default NULL, use global option value; if TRUE assign zero to the lagged value of zones without neighbours, if FALSE (default) assign NA - causing `lagsarlm()` to terminate with an error

`interval` default is NULL, search interval for autoregressive parameter

`tol.solve` the tolerance for detecting linear dependencies in the columns of matrices to be inverted - passed to `solve()` (default=1.0e-10). This may be used if necessary to extract coefficient standard errors (for instance lowering to 1e-12), but errors in `solve()` may constitute indications of poorly scaled variables: if the variables have scales differing much from the autoregressive coefficient, the values in this matrix may be very different in scale, and inverting such a matrix is analytically possible by definition, but numerically unstable; rescaling the RHS variables alleviates this better than setting tol.solve to a very small value

`trs` default NULL, if given, a vector of powered spatial weights matrix traces output by `trW`; when given, insert the asymptotic analytical values into the numerical Hessian instead of the approximated values; may be used to get around some problems raised when the numerical Hessian is poorly conditioned, generating NaNs in subsequent operations; the use of trs is recommended

`control` list of extra control arguments - see section below
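The "eigen" Jacobian described for the `method` argument can be checked numerically in base R. The sketch below is an illustration only: the four-site binary weights matrix and the value of `rho` are hypothetical, and `eigen()` stands in for `eigenw`. It also shows the range of `rho` over which every factor (1 - rho*eigenvalue) stays positive when the eigenvalues are real, which is one reason the search interval depends on the weights style.

```r
# Hypothetical symmetric binary weights: four sites along a line.
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

ev <- eigen(W, symmetric = TRUE, only.values = TRUE)$values
rho <- 0.3

# Log-determinant of (I - rho W) as the sum of log(1 - rho * eigenvalue)
ldet_eigen <- sum(log(1 - rho * ev))

# The same quantity computed directly from the dense matrix
ldet_direct <- as.numeric(
  determinant(diag(nrow(W)) - rho * W, logarithm = TRUE)$modulus
)

# The two routes should agree up to floating point error
agree <- isTRUE(all.equal(ldet_eigen, ldet_direct))

# Range of rho keeping all (1 - rho * eigenvalue) factors positive
rho_range <- c(1 / min(ev), 1 / max(ev))
```

The eigenvalues are computed once and reused for every candidate `rho`, which is what makes this route attractive for moderate n; for this binary matrix the admissible range is narrower than (-1, 1).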

Details

The asymptotic standard error of rho is only computed when `method="eigen"`, because the full matrix operations involved would be costly for the large n typically associated with the choice of `method="spam"` or `method="Matrix"`. The same applies to the coefficient covariance matrix. Taken as the asymptotic matrix from the literature, it is typically badly scaled: the elements involving rho are very small, while other parts of the matrix can be very large (often many orders of magnitude larger). It often happens that the `tol.solve` argument needs to be set to a smaller value than the default, or that the RHS variables need to be centred or reduced in range.
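The scaling problem can be reproduced with `solve()` alone. This is a hedged sketch rather than the matrix that `lagsarlm` actually inverts: the regressor `x` and the cross-product matrices are hypothetical stand-ins chosen so that one variable is about six orders of magnitude larger than the intercept column.

```r
# Hypothetical regressor on a very large scale next to an intercept
x <- seq(1, 1e6, length.out = 50)
A_raw <- crossprod(cbind(1, x))

# The reciprocal condition number is far below the 1e-10 tolerance
rc_raw <- rcond(A_raw)

# With tol = 1e-10, solve() refuses to invert the matrix ...
res_default <- tryCatch(solve(A_raw, tol = 1e-10), error = function(e) e)

# ... while a smaller tolerance lets the inversion go through
res_small_tol <- tryCatch(solve(A_raw, tol = 1e-14), error = function(e) e)

# Rescaling the variable removes the problem at the original tolerance
A_scaled <- crossprod(cbind(1, x / 1e6))
rc_scaled <- rcond(A_scaled)
res_scaled <- solve(A_scaled, tol = 1e-10)
```

Lowering the tolerance only hides the poor conditioning, whereas rescaling changes the matrix itself, which is why rescaling is the preferred remedy.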

Versions of the package from 0.4-38 include numerical Hessian values where asymptotic standard errors are not available. This change has been introduced to permit the simulation of distributions for impact measures. The warnings made above with regard to variable scaling also apply in this case.
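As background to the impact measures mentioned here, the sketch below computes average direct, indirect and total impacts for a single coefficient in base R, following the summary measures described in LeSage and Pace (2009). The five-site ring weights matrix and the values of `rho` and `beta_r` are hypothetical; in practice the `impacts` method for the fitted model does this work, and simulating its distribution requires a coefficient covariance matrix such as the numerical Hessian-based one.

```r
# Hypothetical row-standardised ring of five sites
n <- 5
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  left <- if (i == 1) n else i - 1
  right <- if (i == n) 1 else i + 1
  W[i, c(left, right)] <- 0.5
}

rho <- 0.4     # assumed autoregressive coefficient
beta_r <- 2    # assumed coefficient of one explanatory variable

# Matrix of partial derivatives of y with respect to that variable
S_r <- beta_r * solve(diag(n) - rho * W)

direct <- mean(diag(S_r))      # average own-site effect, feedback included
total <- mean(rowSums(S_r))    # average effect of changing x at all sites
indirect <- total - direct     # average spillover from other sites
```

With row-standardised weights the total impact equals `beta_r / (1 - rho)`, so the raw coefficient understates the overall effect whenever `rho` is positive.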

Note that the fitted() function for the output object assumes that the response variable may be reconstructed as the sum of the trend, the signal, and the noise (residuals). Since the values of the response variable are known, their spatial lags are used to calculate signal components (Cressie 1993, p. 564). This differs from other software, including GeoDa, which does not use knowledge of the response variable in making predictions for the fitting data.
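The decomposition described above can be written out as follows. The hats denote fitted quantities and the labels follow the wording of the note; the second expression is a reduced-form predictor that does not use the observed response. It is shown only for contrast, as an assumption about what "not using knowledge of the response variable" might look like, and is not confirmed by the source as the formula used by any particular program.

```latex
% Fitted values as used by fitted(): trend plus signal, with residuals as noise
\hat{y} = \underbrace{X\hat{\beta}}_{\text{trend}}
        + \underbrace{\hat{\rho}\, W y}_{\text{signal}},
\qquad
y = \hat{y} + \hat{e}

% For contrast (assumption): a predictor that does not use the observed y
\tilde{y} = (I - \hat{\rho} W)^{-1} X \hat{\beta}
```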

Value

A list object of class `sarlm`

`type` "lag" or "mixed"

`dvars` vector of length 2, numbers of columns in X and WX; if Durbin is given as a formula, the formula as attribute “f”, the indices of the included Wx as “inds”, and indices of added zero Wx coefficients as “zero_fill”

`rho` simultaneous autoregressive lag coefficient

`coefficients` GLS coefficient estimates

`rest.se` asymptotic standard errors if ase=TRUE, otherwise approximate numerical Hessian-based values

`LL` log likelihood value at computed optimum

`s2` GLS residual variance

`SSE` sum of squared GLS errors

`parameters` number of parameters estimated

`logLik_lm.model` Log likelihood of the linear model for rho=0

`AIC_lm.model` AIC of the linear model for rho=0

`method` the method used to calculate the Jacobian

`call` the call used to create this object

`residuals` GLS residuals

`tarX` model matrix of the GLS model

`tary` response of the GLS model

`y` response of the linear model for rho=0

`X` model matrix of the linear model for rho=0

`opt` object returned from numerical optimisation

`fitted.values` the response variable minus the residuals

`se.fit` Not used yet

`ase` TRUE if method="eigen"

`rho.se` if ase=TRUE, the asymptotic standard error of rho, otherwise approximate numerical Hessian-based value

`LMtest` if ase=TRUE, the Lagrange Multiplier test for the absence of spatial autocorrelation in the lag model residuals

`resvar` the asymptotic coefficient covariance matrix for (s2, rho, B)

`zero.policy` zero.policy for this model

`aliased` the aliased explanatory variables (if any)

`listw_style` the style of the spatial weights used

`interval` the line search interval used to find rho

`fdHess` the numerical Hessian-based coefficient covariance matrix for (rho, B) if computed

`optimHess` if TRUE and fdHess returned, `optim` used to calculate Hessian at optimum

`insert` if TRUE and fdHess returned, the asymptotic analytical values are inserted into the numerical Hessian instead of the approximated values, and its size increased to include the first row/column for sigma2

`LLNullLlm` Log-likelihood of the null linear model

`timings` processing timings

`f_calls` number of calls to the log likelihood function during optimization

`hf_calls` number of calls to the log likelihood function during numerical Hessian computation

`intern_classic` a data frame of detval matrix row choices used by the SE toolbox classic method

`na.action` (possibly) named vector of excluded or omitted observations if non-default na.action argument used

The internal sar.lag.mixed.* functions return the value of the log likelihood function at rho.

Control arguments

tol.opt:

the desired accuracy of the optimization - passed to `optimize()` (default=square root of double precision machine tolerance; a larger root may be used if needed, see help(boston) for an example)

fdHess:

default NULL, then set to (method != "eigen") internally; use `fdHess` to compute an approximate Hessian using finite differences when using sparse matrix methods; used to make a coefficient covariance matrix when the number of observations is large; may be turned off to save resources if need be

optimHess:

default FALSE, use `fdHess` from nlme, if TRUE, use `optim` to calculate Hessian at optimum

optimHessMethod:

default “optimHess”, may be “nlm” or one of the `optim` methods

compiled_sse:

default FALSE; logical value used in the log likelihood function to choose compiled code for computing SSE

Imult:

default 2; used for preparing the Cholesky decompositions for updating in the Jacobian function

super:

if NULL (default), set to FALSE to use a simplicial decomposition for the sparse Cholesky decomposition and method “Matrix_J”, set to `as.logical(NA)` for method “Matrix”, if TRUE, use a supernodal decomposition

cheb_q:

default 5; highest power of the approximating polynomial for the Chebyshev approximation

MC_p:

default 16; number of random variates

MC_m:

default 30; number of products of random variates matrix and spatial weights matrix

spamPivot:

default “MMD”, alternative “RCM”

in_coef:

default 0.1, coefficient value for initial Cholesky decomposition in “spam_update”

type:

default “MC”, used with method “moments”; alternatives “mult” and “moments”, for use if `trs` is missing; see `trW`

correct:

default TRUE, used with method “moments” to compute the Smirnov/Anselin correction term

trunc:

default TRUE, used with method “moments” to truncate the Smirnov/Anselin correction term

SE_method:

default “LU”, may be “MC”

nrho:

default 200, as in SE toolbox; the size of the first stage lndet grid; it may be reduced, for example to 40

interpn:

default 2000, as in SE toolbox; the size of the second stage lndet grid

small_asy:

default TRUE; if the method is not “eigen”, use asymptotic covariances rather than numerical Hessian ones if n <= small

small:

default 1500; threshold number of observations for asymptotic covariances when the method is not “eigen”

SElndet:

default NULL, may be used to pass a pre-computed SE toolbox style matrix of coefficients and their lndet values to the "SE_classic" and "SE_whichMin" methods

LU_order:

default FALSE; used in “LU_prepermutate”, note warnings given for `lu` method

pre_eig:

default NULL; may be used to pass a pre-computed vector of eigenvalues

OrdVsign:

default 1; used to set the sign of the final component to negative if -1 (alpha times ((sigma squared) squared) in Ord (1975) equation B.1).

Extra Bayesian control arguments

ldet_method

default “SE_classic”; equivalent to the `method` argument in `lagsarlm`

interval

default `c(-1, 1)`; used unmodified or set internally by `jacobianSetup`

ndraw

default `2500L`; integer total number of draws

nomit

default `500L`; integer total number of omitted burn-in draws

thin

default `1L`; integer thinning proportion

verbose

default `FALSE`; inverse of `quiet` argument in `lagsarlm`

detval

default `NULL`; not yet in use, precomputed matrix of log determinants

prior

a list with the following components:

rhoMH

default FALSE; use Metropolis or griddy Gibbs

Tbeta

default `NULL`; values of the betas variance-covariance matrix, set to `diag(k)*1e+12` if `NULL`

c_beta

default `NULL`; values of the betas set to 0 if `NULL`

rho

default `0.5`; value of the autoregressive coefficient

sige

default `1`; value of the residual variance

nu

default `0`; informative Gamma(nu,d0) prior on sige

d0

default `0`; informative Gamma(nu,d0) prior on sige

a1

default `1.01`; parameter for beta(a1,a2) prior on rho

a2

default `1.01`; parameter for beta(a1,a2) prior on rho

Author(s)

Roger Bivand, with thanks to Andrew Bernat for contributions to the asymptotic standard error code.

References

Cliff, A. D., Ord, J. K. (1981). Spatial processes. Pion.

Ord, J. K. (1975). Estimation methods for models of spatial interaction. Journal of the American Statistical Association, 70, 120-126.

Anselin, L. (1988). Spatial econometrics: methods and models. Dordrecht: Kluwer.

Anselin, L. (1995). SpaceStat, a software program for the analysis of spatial data, version 1.80. Regional Research Institute, West Virginia University, Morgantown, WV.

Anselin, L., Bera, A. K. (1998). Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah, A., Giles, D. E. A. (eds) Handbook of applied economic statistics. Marcel Dekker, New York, pp. 237-289.

Cressie, N. A. C. (1993). Statistics for spatial data. Wiley, New York.

LeSage, J., Pace, R. K. (2009). Introduction to Spatial Econometrics. CRC Press, Boca Raton.

Roger Bivand, Gianfranco Piras (2015). Comparing Implementations of Estimation Methods for Spatial Econometrics. Journal of Statistical Software, 63(18), 1-36. https://www.jstatsoft.org/v63/i18/.

Bivand, R. S., Hauke, J., and Kossowski, T. (2013). Computing the Jacobian in Gaussian spatial autoregressive models: An illustrated comparison of available methods. Geographical Analysis, 45(2), 150-179.

See Also

`lm`, `errorsarlm`, `summary.sarlm`, `eigenw`, `predict.sarlm`, `impacts.sarlm`, `residuals.sarlm`, `do_ldet`

Examples

```
library(spdep)  # provides lagsarlm(), nb2listw(), eigenw(), trW(), impacts()
data(oldcol)
listw <- nb2listw(COL.nb, style="W")
ev <- eigenw(listw)
W <- as(listw, "CsparseMatrix")
trMatc <- trW(W, type="mult")
COL.lag.eig <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, listw=listw,
 method="eigen", quiet=FALSE, control=list(pre_eig=ev, OrdVsign=1))
summary(COL.lag.eig, correlation=TRUE)
## Not run:
COL.lag.eig$fdHess
COL.lag.eig$resvar
# using the apparent sign in Ord (1975, equation B.1)
COL.lag.eigb <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD, listw=listw,
 method="eigen", control=list(pre_eig=ev, OrdVsign=-1))
summary(COL.lag.eigb)
COL.lag.eigb$fdHess
COL.lag.eigb$resvar
# force numerical Hessian
COL.lag.eig1 <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw=listw, method="Matrix", control=list(small=25))
summary(COL.lag.eig1)
COL.lag.eig1$fdHess
# force LeSage & Pace (2008, p. 57) approximation
COL.lag.eig1a <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw=listw, method="Matrix", control=list(small=25), trs=trMatc)
summary(COL.lag.eig1a)
COL.lag.eig1a$fdHess
COL.lag.eig$resvar[2,2]
# using the apparent sign in Ord (1975, equation B.1)
COL.lag.eigb$resvar[2,2]
# force numerical Hessian
COL.lag.eig1$fdHess[1,1]
# force LeSage & Pace (2008, p. 57) approximation
COL.lag.eig1a$fdHess[2,2]
## End(Not run)
system.time(COL.lag.M <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 nb2listw(COL.nb), method="Matrix", quiet=FALSE))
summary(COL.lag.M)
impacts(COL.lag.M, listw=nb2listw(COL.nb))
## Not run:
system.time(COL.lag.sp <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 nb2listw(COL.nb), method="spam", quiet=FALSE))
summary(COL.lag.sp)
## End(Not run)
COL.lag.B <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 nb2listw(COL.nb, style="B"))
summary(COL.lag.B)
COL.mixed.B <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 nb2listw(COL.nb, style="B"), type="mixed", tol.solve=1e-9,
 control=list(pre_eig=ev))
summary(COL.mixed.B)
COL.mixed.W <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, type="mixed", control=list(pre_eig=ev))
summary(COL.mixed.W)
COL.mixed.D00 <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, Durbin=TRUE, control=list(pre_eig=ev))
summary(COL.mixed.D00)
COL.mixed.D01 <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, Durbin=FALSE, control=list(pre_eig=ev))
summary(COL.mixed.D01)
COL.mixed.D1 <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, Durbin= ~ INC + HOVAL, control=list(pre_eig=ev))
summary(COL.mixed.D1)
f <- CRIME ~ INC + HOVAL
COL.mixed.D2 <- lagsarlm(f, data=COL.OLD, listw,
 Durbin=as.formula(delete.response(terms(f))),
 control=list(pre_eig=ev))
summary(COL.mixed.D2)
COL.mixed.D1a <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, Durbin= ~ INC, control=list(pre_eig=ev))
summary(COL.mixed.D1a)
# the next two calls are wrapped in try(): each Durbin formula names a
# variable (inc, DISCBD) that is not a term of the main model formula
try(COL.mixed.D1 <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, Durbin= ~ inc + HOVAL, control=list(pre_eig=ev)))
try(COL.mixed.D1 <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw, Durbin= ~ DISCBD + HOVAL, control=list(pre_eig=ev)))
NA.COL.OLD <- COL.OLD
NA.COL.OLD$CRIME[20:25] <- NA
COL.lag.NA <- lagsarlm(CRIME ~ INC + HOVAL, data=NA.COL.OLD,
 nb2listw(COL.nb), na.action=na.exclude,
 control=list(tol.opt=.Machine$double.eps^0.4))
COL.lag.NA$na.action
COL.lag.NA
resid(COL.lag.NA)
## Not run:
data(boston, package="spData")
gp2mM <- lagsarlm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) +
 I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT),
 data=boston.c, nb2listw(boston.soi), type="mixed", method="Matrix")
summary(gp2mM)
W <- as(nb2listw(boston.soi), "CsparseMatrix")
trMatb <- trW(W, type="mult")
gp2mMi <- lagsarlm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) +
 I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT),
 data=boston.c, nb2listw(boston.soi), type="mixed", method="Matrix",
 trs=trMatb)
summary(gp2mMi)
## End(Not run)
## Not run:
set.seed(1)
COL.lag.Bayes <- spBreg_lag(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw=listw)
summary(COL.lag.Bayes)
summary(impacts(COL.lag.Bayes, tr=trMatc), short=TRUE, zstats=TRUE)
summary(impacts(COL.lag.Bayes, evalues=ev), short=TRUE, zstats=TRUE)
set.seed(1)
COL.D0.Bayes <- spBreg_lag(CRIME ~ INC + HOVAL, data=COL.OLD,
 listw=listw, Durbin=TRUE)
summary(COL.D0.Bayes)
summary(impacts(COL.D0.Bayes, tr=trMatc), short=TRUE, zstats=TRUE)
## End(Not run)
set.seed(1)
COL.D1.Bayes <- spBreg_lag(CRIME ~ DISCBD + INC + HOVAL, data=COL.OLD,
 listw=listw, Durbin= ~ INC)
summary(COL.D1.Bayes)
summary(impacts(COL.D1.Bayes, tr=trMatc), short=TRUE, zstats=TRUE)
## Not run:
data(elect80, package="spData")
lw <- nb2listw(e80_queen, zero.policy=TRUE)
el_ml <- lagsarlm(log(pc_turnout) ~ log(pc_college) +
 log(pc_homeownership) + log(pc_income), data=elect80,
 listw=lw, zero.policy=TRUE, method="LU")
summary(el_ml)
set.seed(1)
el_B <- spBreg_lag(log(pc_turnout) ~ log(pc_college) +
 log(pc_homeownership) + log(pc_income), data=elect80,
 listw=lw, zero.policy=TRUE)
summary(el_B)
el_ml$timings
attr(el_B, "timings")
## End(Not run)
```

spdep documentation built on Nov. 21, 2018, 5:05 p.m.