knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
spfilteR package provides tools to implement the eigenvector-based spatial filtering (ESF) approach put forth by @griffith2003 in linear and generalized linear regression models. It allows users to obtain eigenvectors from a transformed connectivity matrix and supports the supervised and unsupervised creation of a synthetic spatial filter. For the unsupervised approach, eigenvectors can be selected based on either i) the maximization of model fit, ii) the minimization of residual autocorrelation, iii) the statistical significance of residual autocorrelation, or iv) the significance level of candidate eigenvectors. Alternatively, all eigenvectors in the search set may be included so that no selection takes place. While the functions provide a high flexibility, only a minimum of code is required to implement the unsupervised ESF approach.
This vignette solely focuses on the workflow to illustrate the functionality of the
spfilteR package. For a discussion on the ESF methodology, interested readers may be referred to other sources, e.g., @griffith2019 or @tiefelsdorf2007.
A stable release version of the
spfilteR package is available on CRAN and the latest development version can be downloaded from GitHub.
# CRAN release install.packages("spfilteR") # GitHub library(devtools) devtools::install_github("sjuhl/spfilteR")
Together with the main functions, the package contains an artificial dataset (
fakedataset) and a connectivity matrix (
W) connecting the 100 cross-sections on a regular grid by using the rook criterion of adjacency. For illustrative purposes, the following examples utilize the fake dataset.
# load the package library(spfilteR) # load the dataset data("fakedata") # take a look at the connectivity matrix and the variables dim(W) head(fakedataset)
ID variable, the dataset contains a binary
indicator variable, two count variables, and four continuous variables.
In a first step, the function
MI.vec() checks for the presence of spatial autocorrelation in the continuous variables by means of Moran's I [see also @cliff1981; @cliff1972]. The function further allows users to define the alternative hypothesis in order to obtain p-values.
# select continuous variables cont <- cbind(fakedataset$x1, fakedataset$x2, fakedataset$x3, fakedataset$negative) colnames(cont) <- c("x1", "x2", "x3", "negative") # Moran test of spatial autocorrelation (I <- MI.vec(x = cont, W = W, alternative = 'greater'))
The output suggests that the variables
x2 are positively autocorrelated at conventional levels of statistical significance. Moreover, the standardized value of Moran's I (
zI) indicates that the variable
negative is negatively autocorrelated. We can use the function
MI.vec() and specify
alternative = 'lower' to assess the significance of the negative autocorrelation:
MI.vec(x = fakedataset$negative, W = W, alternative = 'lower')
Since the Moran coefficient is a global measure of spatial autocorrelation, spatial heterogeneity constitutes a problem for this statistic. More specifically, the simultaneous presence of positive and negative spatial autocorrelation at different scales cannot be revealed by the classical Moran's I. To circumvent this problem, the function
MI.decomp() decomposes the Moran coefficient into a positively and a negatively autocorrelated part and performs a permutation procedure to assess the significance [@dary2011]:
# decompose Moran's I (I.dec <- MI.decomp(x = cont, W = W, nsim=100))
Note that the global Moran's I is the sum of
# I = 'I+' + 'I-' cbind(I[, "I"], I.dec[, "I+"] + I.dec[, "I-"])
Assume that we wish to regress
x2 and test for residual autocorrelation using the function
# define variables y <- fakedataset$x1 X <- cbind(1, fakedataset$x2) # OLS regression ols.resid <- resid(lm(y ~ X)) # Moran test of residual spatial autocorrelation MI.resid(resid = ols.resid, W = W, alternative = 'greater')
The results show that the residuals are significantly autocorrelated which violates the assumption of uncorrelated errors. In order to resolve this problem of spatial autocorrelation in regression residuals, the function
lmFilter() estimates a spatially filtered linear regression model using an unsupervised stepwise regression to identify relevant eigenvectors derived from the transformed connectivity matrix. Below, the unsupervised eigenvector search algorithm selects eigenvectors based on the reduction in residual autocorrelation. The output is a class
# ESF model (lm.esf <- lmFilter(y = y, x = X, W = W, objfn = 'MI', positive = TRUE, ideal.setsize = TRUE, tol = .2)) summary(lm.esf, EV = TRUE)
summary method provides additional information. Besides the ordinary least squares (OLS) parameter estimates, the output shows that the ESF model filters for positive spatial autocorrelation using the minimization of residual autocorrelation as objective function during the eigenvector search. A comparison between the filtered and the nonspatial OLS model with respect to model fit and residual autocorrelation is also provided. Since the option
EV is set to
summary method also displays information on the selected eigenvectors. As the results show, the ESF model successfully removes the spatial pattern from model residuals.
plot method allows for an easy visualization of the results. The graph displays the Moran coefficient associated with each of the eigenvectors. The shaded area signifies the candidate set and selected eigenvectors are illustrated by black dots.
lmFilter() also allows users to select eigenvectors based on alternative selection criteria:
### Alternative selection criteria: # maximization of model fit lmFilter(y = y, x = X, W = W, objfn = 'R2', positive = TRUE, ideal.setsize = TRUE) # significance of residual autocorrelation lmFilter(y = y, x = X, W = W, objfn = 'pMI', sig = .1, bonferroni = FALSE, positive = TRUE, ideal.setsize = TRUE) # significance of eigenvectors lmFilter(y = y, x = X, W = W, objfn = 'p', sig = .1, bonferroni = TRUE, positive = TRUE, ideal.setsize = TRUE) # all eigenvectors in the candidate set lmFilter(y = y, x = X, W = W, objfn = 'all', positive = TRUE, ideal.setsize = TRUE)
If users wish to select eigenvectors based on individual selection criteria, they can obtain the eigenvectors using the function
getEVs() and perform a supervised selection procedure using the basic
The ESF approach outlined above easily extends to generalized linear models (GLMs) as well [see also @griffith2019]. Therefore, the
spfilteR package contains the function
glmFilter() which uses maximum likelihood estimation (MLE) to fit a spatially filtered GLM and performs an unsupervised eigenvector search based on alternative objective functions.
Except for minor differences,
glmFilter() works similar to the
lmFilter() function discussed above. The option
model defines the model that needs to be estimated. Currently, unsupervised spatial filtering is possible for logit, probit, and poisson models. Moreover, the option
optim.method specifies the optimization method used to maximize the log-likelihood function. Finally,
resid.type allows users to define the type of residuals used to calculate Moran's I and
boot.MI is an integer specifying the number of bootstrap permutations to obtain the variance of Moran's I.
# define outcome variables y.bin <- fakedataset$indicator y.count <- fakedataset$count # ESF logit model (logit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'logit', optim.method = 'BFGS', sig = .1, bonferroni = FALSE, positive = TRUE, ideal.setsize = FALSE, alpha = .25, resid.type = 'deviance', boot.MI = 100)) # ESF probit model (probit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'probit', optim.method = 'BFGS', sig = .1, bonferroni = FALSE, positive = TRUE, ideal.setsize = FALSE, alpha = .25, resid.type = 'deviance', boot.MI = 100)) # ESF poisson model (poisson.esf <- glmFilter(y = y.count, x = NULL, W = W, objfn = 'BIC', model = 'poisson', optim.method = 'BFGS', positive = TRUE, ideal.setsize = FALSE, alpha = .25, resid.type = 'deviance', boot.MI = 100))
Again, if users wish to define individual selection criteria or fit a GLM currently not implemented in
glmFilter(), they can obtain the eigenvectors using the
getEVs() command and perform supervised eigenvector selection using the standard
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.