library(knitr)
# options(width=90, dev='win.metafile', fig.fullwidth=TRUE) # when using word
options(width=90, fig.fullwidth=TRUE)
opts_chunk$set(comment=NA, cache=FALSE)

# Load required libraries
library(sp)
library(spdep)
library(spgwr)
library(GWmodel)
library(rgeos)
library(ncf)
library(lmtest)
library(gdata)
library(tidyverse)
library(GGally)
library(tmap)

# Load SpatialPointsDataFrame
spdf <- readRDS("C:/CloudStor/R_Stuff/SPM/spdf_13E.rds")
#spdf <- readRDS("C:/CloudStor/R_Stuff/SPM/spdf_11C.rds")
#spdf <- readRDS("C:/CloudStor/R_Stuff/SPM/spdf_06C.rds")

## Filter cells which have negligible fishing activity. Currently set at
## a minimum of 30 minutes across the six year dataset.
spdf <- subset(spdf, minstotal > 30)
The local-scale patchiness in habitat quality and environmental conditions, combined with abalone biology, is expected to generate some level of spatial autocorrelation and spatial structure in harvest data. Whether environmental drivers override fishing history as the major determinant of future catch has not been explored empirically. Spatial patterns in productivity and harvest are unlikely to be homogeneous, and may not be temporally stable. Of considerable interest are both the level of heterogeneity within fishing grounds and whether prior fishing effort affects future harvest.
Regression methods provide an opportunity to explore the extent to which local catch can be explained by prior fishing history. Contrasting Ordinary Least Squares (OLS) regression with two forms of spatial regression (spatial autoregressive models and geographically weighted regression) can highlight the extent to which accounting for spatial autocorrelation, and utilising spatial information, improves the empirical relationship between catch and fishing history at the local scale. Parametric analyses assume that samples and/or error terms are independent. With spatially derived datasets, this assumption is likely to be compromised unless spatial autocorrelation is accounted for.

When addressing spatial autocorrelation in datasets, the appropriate analytical approach depends on whether spatial variation is of inherent interest or considered a nuisance factor, and whether the autocorrelation affects the dependent or independent variables, or both. In the case of spatial heterogeneity in the data, where nearby observations are affected by (or affect) neighbouring observations, we would apply a Spatial Lag model (Ward and Gleditsch 2008). Spatial Lag models effectively treat spatial variation as a component of interest. In the case of spatial dependence, where data are linked to neighbouring data through an unspecified process, we apply a Spatial Error model. Spatial Error models are often referred to as mis-specified models because important factors affecting the local value of the data have not been captured. For this reason, Spatial Error models treat spatial autocorrelation as a nuisance factor. Where the spatial influence is considered to act on both the independent and dependent variables, a Spatial Durbin model is more appropriate.

There seem few plausible arguments that the magnitude of catch in one grid cell could have a direct effect on catch in a neighbouring grid cell in the same year (i.e. a Spatial Lag model). More likely, catch in a grid cell is locally correlated either because habitat allows similar levels of productivity, or because the local scale of targeted fishing drives high and low intensities of fishing pressure (i.e. a Spatial Error or Spatial Durbin model). In this chapter we explore the application of spatial regression methods to quantify the relationship between prior fishing history and catch at the local scale. Geographically Weighted Regression (GWR) seeks to account for non-stationarity in the dataset using a form of moving-window regression, and has some similarity to the approach used for Regression Kriging. GWR is a relatively new member of the regression family and has been suggested as a useful exploratory analysis tool, with some caution advised on interpretation of results.
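For reference, the standard textbook forms of the three models, where W is the spatial weights matrix, ρ the spatial lag coefficient, λ the spatial error coefficient, and θ the coefficients on the spatially lagged predictors:

Spatial Lag: y = ρWy + Xβ + ε
Spatial Error: y = Xβ + u, with u = λWu + ε
Spatial Durbin: y = ρWy + Xβ + WXθ + ε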
This example is based on the hexagonal grid dataset. The example files included for this Minimum Working Example (MWE) are SpatialPolygonsDataFrames, with years as side-by-side columns. A stacked (long-format) version of this data is also available if required.
# Variable names
names(spdf)
Available variables:
N.B. blkgtotal, daystotal and minstotal contain the sum across all years for each cell. For obvious reasons this is not done for divers.
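A quick sanity check of the stored totals (a sketch; at this point the annual columns still contain NA for inactive years, hence na.rm = TRUE):

# blkgtotal should equal the row-wise sum of the annual catch columns
yr.cols <- paste0("blkg", 2012:2017)
all.equal(spdf$blkgtotal, rowSums(spdf@data[, yr.cols], na.rm = TRUE))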
Arranging the horizontal form of the grid data with a column for each year results in NA where there is no activity in a cell in a given year. These NAs cause problems with the Nearest Neighbour function and with the sqrt transform, so NA is replaced with zero as part of the data prep. I have also square-root transformed the data because it appeared to give a better fit, but there is likely to be a better strategy and I am happy to be advised that this is not sensible. A methodical look at the statistical properties of the data was identified as a key starting point for the project.
# Replace NA with zero, where there is no activity in a cell in a year
spdf@data <- spdf@data %>% replace(., is.na(.), 0)

# sqrt transform catch
spdf$sqblkg2012 <- sqrt(spdf$blkg2012)
spdf$sqblkg2013 <- sqrt(spdf$blkg2013)
spdf$sqblkg2014 <- sqrt(spdf$blkg2014)
spdf$sqblkg2015 <- sqrt(spdf$blkg2015)
spdf$sqblkg2016 <- sqrt(spdf$blkg2016)
spdf$sqblkg2017 <- sqrt(spdf$blkg2017)

# sum across specific years
spdf$y13_15 <- spdf$blkg2013 + spdf$blkg2014 + spdf$blkg2015
spdf$y14_15 <- spdf$blkg2014 + spdf$blkg2015
spdf$sqy14_15 <- sqrt(spdf$y14_15)
spdf$sqy13_15 <- sqrt(spdf$y13_15)

## Set different model formulas
f1 <- sqblkg2016 ~ sqblkg2015
f2 <- sqblkg2016 ~ sqy14_15
f3 <- sqblkg2016 ~ sqy13_15
f4 <- sqblkg2016 ~ sqblkg2015 + sqblkg2014 + sqblkg2013
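As a first step toward that methodical look at the data, a quick visual check of what the square-root transform does to the distribution of the 2016 catch (a sketch using base graphics):

# Compare raw and square-root transformed catch distributions
par(mfrow = c(1, 2))
hist(spdf$blkg2016, main = "Raw catch 2016", xlab = "kg per cell")
hist(spdf$sqblkg2016, main = "Square-root transformed", xlab = "sqrt(kg)")
par(mfrow = c(1, 1))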
Pairwise plots for catch as a basic exploratory exercise. Note these don't take into account spatial auto-correlation.
## Pairwise plots ####
ggpairs(spdf@data, columns = c(5:10)) + #, 23:27)) + # untransformed catch variables
#ggpairs(spdf@data, columns = c(39:43), aes(alpha = 0.2)) + # root transformed catch variables
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.position = "right")
A quick thematic plot for the variable of interest using the tmap package. There is example code further down with multiple variables, facets etc.
# quick thematic map for dv
qtm(spdf, fill = "blkg2016", style = "watercolor")
Input to the spatial autocorrelation and autoregressive methods requires a nearest-neighbour analysis. Here, we convert the hexagon polygons to a spatial point dataset and attach the data variables. This SpatialPointsDataFrame is then used for the NNB functions.
A short section of code extracts the first 50 records from spdf, creates the dNNB network and plots the links. The reason for using hexagons is that all neighbours of a cell are at the same distance. In a square grid, this becomes complicated when using NNB networks greater than a single ring. We could also use a topology-based approach (see the sketch after the next chunk), but D = 108 works fine.
# Create new spatialpoints with centroids of hex cells
spdf.cent <- SpatialPointsDataFrame(gCentroid(spdf, byid = TRUE, id = spdf$oid),
                                    spdf@data)

# Create a K nearest neighbour object (class nb)
# k=6 will find the 6 hexagons adjacent to the target, or the nearest 6 if in
# a ribbon.
spdf.k1 <- knearneigh(spdf.cent, k = 6, longlat = FALSE, RANN = TRUE)
spdf.knb <- knn2nb(spdf.k1)
lw.k <- nb2listw(spdf.knb, style = "W", zero.policy = TRUE)

# Create a d nearest neighbour object (class nb)
# D = 108 selects 1 ring of hexagons (i.e. 6 cells) around a target hex cell.
# D = 216 selects 2 rings of hexagons (i.e. a further 12 cells) around a target
# hex cell.
spdf.dnb <- dnearneigh(spdf.cent, 0, 108) ## detects longlat from spatial object
lw.d <- nb2listw(spdf.dnb, style = "W", zero.policy = TRUE)
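The topology-based approach mentioned above is also a one-liner. A minimal sketch using spdep::poly2nb() on the polygons themselves; queen = FALSE requires a shared edge, which on a hexagonal grid gives the same six neighbours as the D = 108 distance band:

# Topology-based neighbours from shared polygon edges
spdf.pnb <- poly2nb(spdf, queen = FALSE)
lw.p <- nb2listw(spdf.pnb, style = "W", zero.policy = TRUE)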
# set D for nearest neighbour distance - try 108, 216, 324, 432 etc
D <- 108
nb <- dnearneigh(spdf.cent[1:50, ], 0, D) ## detects longlat from spatial object
coords <- coordinates(spdf[1:50, ])
plot(spdf[1:50, ])
plot.nb(nb, coords, add = TRUE)
rm(nb)
## Step 1 Global Moran I test ####
# Moran's I on the DV
moran.test(spdf$sqblkg2016, listw = lw.d, zero.policy = TRUE, na.action = na.omit)
moran.mc(spdf$blkg2016, lw.d, nsim = 999, zero.policy = TRUE, na.action = na.omit)

## Step 2 Moran plot ####
## - labels are kg/hex cell
mp <- moran.plot(
  spdf.cent$sqblkg2016,
  lw.d,
  labels = round(spdf.cent$sqblkg2016),
  xlab = "Blacklip Catch",
  ylab = "Lag of Blacklip Catch",
  zero.policy = TRUE
)
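The global tests above use a single neighbourhood distance. As an exploratory complement, a sketch of a distance-class correlogram using the ncf package (loaded at the top but not otherwise used here); increment = 108 makes each distance class roughly one hexagon ring, and resamp is kept small for speed:

# Correlogram of transformed catch against centroid distance
xy <- coordinates(spdf.cent)
cgram <- correlog(x = xy[, 1], y = xy[, 2], z = spdf.cent$sqblkg2016,
                  increment = 108, resamp = 99)
plot(cgram)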
## Step 3 OLS ####
m <- lm(f1, data = spdf.cent@data)
summary(m)
AIC(m)
plot(m, which = 3)
plot(m, which = 4)
plot(m, which = 5)
sum(fitted(m)^2)/1000
If the Global Moran's I for the regression residuals is not significantly different from expected, this suggests there is no spatial dependence and OLS is probably OK. If it is significantly different from expected, this suggests there is spatial dependence, and we can proceed to the various ways of accounting for that spatial dependence.
lm.morantest(m, listw = lw.d, zero.policy = TRUE)
spdf.cent$lm.resids <- residuals(m)
spdf.cent$stdres <- rstandard(m)
# spdf$lm.resids <- residuals(m)
# spdf$stdres <- rstandard(m)
bubble(spdf.cent, 'stdres')
The Lagrange multiplier diagnostics provide simple and robust tests to indicate which model is more appropriate (spatial lag vs spatial error).
# Checking residuals for spatial autocorrelation
Lagrange <- lm.LMtests(m, lw.d, zero.policy = TRUE, test = "all")
summary(Lagrange)
http://www.econ.uiuc.edu/~lab/workshop/Spatial_in_R.html#running-spatial-regressions
## Spatial lag model ####
m1l <- lagsarlm(f1, data = spdf.cent, lw.d, tol.solve = 1.0e-30, zero.policy = TRUE)
summary(m1l, Nagelkerke = TRUE, Hausman = TRUE)
logLik(m1l)
AIC(m1l)
AIC(m)
anova(m1l, m)
bptest.sarlm(m1l)
sum(fitted(m1l)^2)/1000

impacts(m1l, listw = lw.d)
imps <- impacts(m1l, R = 100, listw = lw.d) ## WARNING - takes a lot of time
sums <- summary(imps, zstats = TRUE)
sums
data.frame(sums$res)

## Spatial error model ####
m1e <- errorsarlm(f1, data = spdf.cent, lw.d, tol.solve = 1.0e-30, zero.policy = TRUE)
summary(m1e, Nagelkerke = TRUE, Hausman = TRUE)
bptest.sarlm(m1e, studentize = TRUE)
logLik(m1e)
AIC(m1e)
AIC(m)
sum(fitted(m1e)^2)/1000
anova(m1e, m)
anova(m1l, m)

## Step 7 Spatial lag Durbin model ####
m1ld <- lagsarlm(f1, data = spdf.cent, lw.d, type = "mixed", zero.policy = TRUE)
summary(m1ld, Nagelkerke = TRUE, Hausman = TRUE)
AIC(m1ld)
anova(m1l, m1ld)
sum(fitted(m1ld)^2)/1000
bptest.sarlm(m1ld)

## Step 8 Spatial error Durbin model ####
m1ed <- errorsarlm(f1, data = spdf.cent, lw.d, etype = "emixed", zero.policy = TRUE)
summary(m1ed, Nagelkerke = TRUE, Hausman = TRUE)
AIC(m1ed)
AIC(m)
anova(m1e, m1ed)
sum(fitted(m1ed)^2)/1000
bptest.sarlm(m1ed)
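For convenience, the fit statistics scattered through the chunk above can be gathered into one table (a sketch using the model objects as fitted above):

# Side-by-side AIC comparison of the five fitted models
data.frame(model = c("OLS", "lag", "error", "lag Durbin", "error Durbin"),
           AIC = c(AIC(m), AIC(m1l), AIC(m1e), AIC(m1ld), AIC(m1ed)))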
https://rpubs.com/corey_sparks/246342
## Just the spatially lagged dependent ##
spdf.cent$deplag <- lag.listw(lw.d, spdf.cent$sqblkg2016, zero.policy = TRUE, NAOK = TRUE)
m.dy <- lm(sqblkg2016 ~ deplag, data = spdf.cent)
#m.dy <- lm(sqblkg2016 ~ sqblkg2015 + deplag, data = spdf.cent)
summary(m.dy)

## Just the spatially lagged independent
spdf.cent$indeplag <- lag.listw(lw.d, spdf.cent$sqblkg2015, zero.policy = TRUE, NAOK = TRUE)
m.dx <- lm(sqblkg2016 ~ indeplag, data = spdf.cent)
#m.dx <- lm(sqblkg2016 ~ sqblkg2015 + indeplag, data = spdf.cent)
summary(m.dx)
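The two models above hand-build the spatial lags with lag.listw(). As an alternative sketch, spdep's lmSLX() constructs the spatially lagged predictors internally (assuming the installed spdep version exports it; in recent releases it has moved to spatialreg):

# SLX model: OLS with spatially lagged X terms built from lw.d
m.slx <- lmSLX(f1, data = spdf.cent@data, listw = lw.d, zero.policy = TRUE)
summary(m.slx)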
tm_shape(spdf) +
  tm_polygons(
    breaks = seq(0, 700, 100),
    c("blkg2013", "blkg2014", "blkg2015", "blkg2016"),
    n = 8,
    midpoint = NA,
    ncol = 2,
    palette = "-Spectral"
  ) +
  tm_facets(sync = TRUE, ncol = 2) +
  tm_layout(legend.position = c("right", "top"))
## Step 5 GWR (GWmodel) ####
## Calculate bandwidth
dm.calib <- gw.dist(dp.locat = coordinates(spdf.cent))
gwm.bandwth <- bw.gwr(f1, spdf, approach = "CV", kernel = "bisquare", longlat = FALSE)

gwr.res <- gwr.basic(f4, data = spdf, bw = 108, kernel = 'gaussian', longlat = FALSE)
gwr.res.df <- as.data.frame(gwr.res$SDF)
head(gwr.res.df)

spdf$coefblkg2015 <- gwr.res.df$sqblkg2015
spdf$coefblkg2014 <- gwr.res.df$sqblkg2014
spdf$coefblkg2013 <- gwr.res.df$sqblkg2013
spdf$gwresid <- gwr.res.df$residual
spdf$yhat <- gwr.res.df$yhat
spdf$localR2 <- gwr.res.df$Local_R2
spdf$yhat.calc <- gwr.res.df$yhat^2

tm_shape(spdf) +
  tm_polygons(c("gwresid", "localR2", "coefblkg2015"),
              n = 8, midpoint = NA, ncol = 2, palette = "-Spectral") +
  tm_facets(sync = TRUE, ncol = 3) +
  tm_layout(legend.position = c("right", "top"))

tm_shape(spdf) +
  tm_polygons(breaks = seq(0, 800, 100), c("blkg2016", "yhat.calc"),
              n = 8, midpoint = NA, ncol = 2, palette = "-Spectral") +
  tm_facets(sync = TRUE, ncol = 3) +
  tm_layout(legend.position = c("right", "top"))

sum(spdf$sqblkg2016^2)/1000
sum(spdf$yhat^2)/1000

## Bootstrap testing of reliability of estimates
library(boot)
set.seed(4676)
gwrcoef <- function(hpdf, i)
  gwr.basic(f1, data = spdf.cent[i, ], bw = 108,
            kernel = 'gaussian', longlat = FALSE)$SDF$sqblkg2015
bootres <- boot(spdf.cent, gwrcoef, 100)
spdf$bse2015 <- sqrt(apply(bootres$t, 2, var))
qtm(spdf, fill = "bse2015", style = "watercolor")

## Estimate bias from the bootstrap
spdf$bias2015 <- bootres$t0 - apply(bootres$t, 2, mean)
tm_shape(spdf) +
  tm_polygons(c("bse2015", "bias2015"),
              n = 8, midpoint = NA, ncol = 1, palette = "-Spectral") +
  tm_facets(sync = TRUE, ncol = 2) +
  tm_layout(legend.position = c("right", "top"))
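Note that the cross-validated bandwidth gwm.bandwth is computed above but the gwr.basic() fit then uses a fixed bw = 108. A minimal sketch of a sensitivity check refitting with the CV bandwidth (strictly, the bandwidth was tuned for f1, so applying it to f4 is only indicative):

# Refit using the cross-validated bandwidth rather than the fixed bw = 108
gwr.res.cv <- gwr.basic(f4, data = spdf, bw = gwm.bandwth,
                        kernel = "bisquare", longlat = FALSE)
gwr.res.cv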
spdf1 <- spdf.cent
gwr.pred <- gwr.predict(f1, data = spdf.cent, predictdata = spdf1, bw = 108,
                        kernel = "gaussian", adaptive = FALSE, longlat = FALSE)
gwr.pred
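To close the loop, the GWR predictions can be back-transformed from the sqrt scale and compared with observed catch. A sketch, assuming the returned SDF names its prediction column "prediction" (the usual GWmodel convention):

# Back-transform predictions and compare catch totals (tonnes)
spdf$gwpred <- gwr.pred$SDF$prediction^2
sum(spdf$gwpred)/1000
sum(spdf$blkg2016)/1000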