knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The spfilteR
package provides tools to implement the eigenvector-based spatial filtering (ESF) approach put forth by @griffith2003 in linear and generalized linear regression models. It allows users to obtain eigenvectors from a transformed connectivity matrix and supports the supervised and unsupervised creation of a synthetic spatial filter. For the unsupervised approach, eigenvectors can be selected based on either i) the maximization of model fit, ii) the minimization of residual autocorrelation, iii) the statistical significance of residual autocorrelation, or iv) the significance level of candidate eigenvectors. Alternatively, all eigenvectors in the search set may be included so that no selection takes place. While the functions provide a high flexibility, only a minimum of code is required to implement the unsupervised ESF approach.
This vignette solely focuses on the workflow to illustrate the functionality of the spfilteR
package. For a discussion on the ESF methodology, interested readers may be referred to other sources, e.g., @griffith2019 or @tiefelsdorf2007.
A stable release version of the spfilteR
package is available on CRAN and the latest development version can be downloaded from GitHub.
# CRAN release install.packages("spfilteR") # GitHub library(devtools) devtools::install_github("sjuhl/spfilteR")
Together with the main functions, the package contains an artificial dataset (fakedataset
) and a connectivity matrix (W
) connecting the 100 cross-sections on a regular grid by using the rook criterion of adjacency. For illustrative purposes, the following examples utilize the fake dataset.
# load the package library(spfilteR) # load the dataset data("fakedata") # take a look at the connectivity matrix and the variables dim(W) head(fakedataset)
Besides an ID
variable, the dataset contains a binary indicator
variable, two count variables, and four continuous variables.
In a first step, the function MI.vec()
checks for the presence of spatial autocorrelation in the continuous variables by means of Moran's I [see also @cliff1981; @cliff1972]. The function further allows users to define the alternative hypothesis in order to obtain p-values.
# select continuous variables cont <- cbind(fakedataset$x1, fakedataset$x2, fakedataset$x3, fakedataset$negative) colnames(cont) <- c("x1", "x2", "x3", "negative") # Moran test of spatial autocorrelation (I <- MI.vec(x = cont, W = W, alternative = 'greater'))
The output suggests that the variables x1
and x2
are positively autocorrelated at conventional levels of statistical significance. Moreover, the standardized value of Moran's I (zI
) indicates that the variable negative
is negatively autocorrelated. We can use the function MI.vec()
and specify alternative = 'lower'
to assess the significance of the negative autocorrelation:
MI.vec(x = fakedataset$negative, W = W, alternative = 'lower')
Since the Moran coefficient is a global measure of spatial autocorrelation, spatial heterogeneity constitutes a problem for this statistic. More specifically, the simultaneous presence of positive and negative spatial autocorrelation at different scales cannot be revealed by the classical Moran's I. To circumvent this problem, the function MI.decomp()
decomposes the Moran coefficient into a positively and a negatively autocorrelated part and performs a permutation procedure to assess the significance [@dary2011]:
# decompose Moran's I (I.dec <- MI.decomp(x = cont, W = W, nsim=100))
Note that the global Moran's I is the sum of I+
and I-
:
# I = 'I+' + 'I-' cbind(I[, "I"], I.dec[, "I+"] + I.dec[, "I-"])
Assume that we wish to regress x1
on x2
and test for residual autocorrelation using the function MI.resid()
.
# define variables y <- fakedataset$x1 X <- cbind(1, fakedataset$x2) # OLS regression ols.resid <- resid(lm(y ~ X)) # Moran test of residual spatial autocorrelation MI.resid(resid = ols.resid, W = W, alternative = 'greater')
The results show that the residuals are significantly autocorrelated which violates the assumption of uncorrelated errors. In order to resolve this problem of spatial autocorrelation in regression residuals, the function lmFilter()
estimates a spatially filtered linear regression model using an unsupervised stepwise regression to identify relevant eigenvectors derived from the transformed connectivity matrix. Below, the unsupervised eigenvector search algorithm selects eigenvectors based on the reduction in residual autocorrelation. The output is a class spfilter
object.
# ESF model (lm.esf <- lmFilter(y = y, x = X, W = W, objfn = 'MI', positive = TRUE, ideal.setsize = TRUE, tol = .2)) summary(lm.esf, EV = TRUE)
While the print
method shows that 8 eigenvectors were selected from the candidate set consisting of 22 eigenvectors, the summary
method provides additional information. Besides the ordinary least squares (OLS) parameter estimates, the output shows that the ESF model filters for positive spatial autocorrelation using the minimization of residual autocorrelation as objective function during the eigenvector search. A comparison between the filtered and the nonspatial OLS model with respect to model fit and residual autocorrelation is also provided. Since the option EV
is set to TRUE
, the summary
method also displays information on the selected eigenvectors. As the results show, the ESF model successfully removes the spatial pattern from model residuals.
The plot
method allows for an easy visualization of the results. The graph displays the Moran coefficient associated with each of the eigenvectors. The shaded area signifies the candidate set and selected eigenvectors are illustrated by black dots.
plot(lm.esf)
Moreover, lmFilter()
also allows users to select eigenvectors based on alternative selection criteria:
### Alternative selection criteria: # maximization of model fit lmFilter(y = y, x = X, W = W, objfn = 'R2', positive = TRUE, ideal.setsize = TRUE) # significance of residual autocorrelation lmFilter(y = y, x = X, W = W, objfn = 'pMI', sig = .1, bonferroni = FALSE, positive = TRUE, ideal.setsize = TRUE) # significance of eigenvectors lmFilter(y = y, x = X, W = W, objfn = 'p', sig = .1, bonferroni = TRUE, positive = TRUE, ideal.setsize = TRUE) # all eigenvectors in the candidate set lmFilter(y = y, x = X, W = W, objfn = 'all', positive = TRUE, ideal.setsize = TRUE)
If users wish to select eigenvectors based on individual selection criteria, they can obtain the eigenvectors using the function getEVs()
and perform a supervised selection procedure using the basic stats::lm()
command.
The ESF approach outlined above easily extends to generalized linear models (GLMs) as well [see also @griffith2019]. Therefore, the spfilteR
package contains the function glmFilter()
which uses maximum likelihood estimation (MLE) to fit a spatially filtered GLM and performs an unsupervised eigenvector search based on alternative objective functions.
Except for minor differences, glmFilter()
works similar to the lmFilter()
function discussed above. The option model
defines the model that needs to be estimated. Currently, unsupervised spatial filtering is possible for logit, probit, and poisson models. Moreover, the option optim.method
specifies the optimization method used to maximize the log-likelihood function. Finally, resid.type
allows users to define the type of residuals used to calculate Moran's I and boot.MI
is an integer specifying the number of bootstrap permutations to obtain the variance of Moran's I.
# define outcome variables y.bin <- fakedataset$indicator y.count <- fakedataset$count # ESF logit model (logit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'logit', optim.method = 'BFGS', sig = .1, bonferroni = FALSE, positive = TRUE, ideal.setsize = FALSE, alpha = .25, resid.type = 'deviance', boot.MI = 100)) # ESF probit model (probit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'probit', optim.method = 'BFGS', sig = .1, bonferroni = FALSE, positive = TRUE, ideal.setsize = FALSE, alpha = .25, resid.type = 'deviance', boot.MI = 100)) # ESF poisson model (poisson.esf <- glmFilter(y = y.count, x = NULL, W = W, objfn = 'BIC', model = 'poisson', optim.method = 'BFGS', positive = TRUE, ideal.setsize = FALSE, alpha = .25, resid.type = 'deviance', boot.MI = 100))
Again, if users wish to define individual selection criteria or fit a GLM currently not implemented in glmFilter()
, they can obtain the eigenvectors using the getEVs()
command and perform supervised eigenvector selection using the standard stats::glm()
function.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.