library(knitr)
# options(width=90, dev='win.metafile', fig.fullwidth=TRUE) # when using word
options(width=90, fig.fullwidth=TRUE)
opts_chunk$set(comment=NA,  cache=FALSE)
#Load required libraries
library(sp)
library(spdep)
library(spgwr)
library(GWmodel)
library(rgeos)
library(ncf)
library(lmtest)
library(gdata)
library(tidyverse)
library(GGally)
library(tmap)
# Load SpatialPolygonsDataFrame (hexagonal grid)
spdf <- readRDS("C:/CloudStor/R_Stuff/SPM/spdf_13E.rds")
#spdf <- readRDS("C:/CloudStor/R_Stuff/SPM/spdf_11C.rds")
#spdf <- readRDS("C:/CloudStor/R_Stuff/SPM/spdf_06C.rds")

## Filter out cells with negligible fishing activity. Currently set at
## a minimum of 30 minutes of effort across the six-year dataset.
spdf <- subset(spdf, minstotal > 30)

Background

Local-scale patchiness in habitat quality and environmental conditions, combined with abalone biology, is expected to generate some level of spatial autocorrelation and spatial structure in harvest data. Whether environmental drivers override fishing history as the major determinant of future catch has not been explored empirically. Spatial patterns in productivity and harvest are unlikely to be homogeneous, and may not be temporally stable. Of considerable interest are both the level of heterogeneity within fishing grounds, and whether prior fishing effort affects future harvest.

Regression methods provide an opportunity to explore the extent to which local catch can be explained by prior fishing history. Contrasting Ordinary Least Squares (OLS) regression with two forms of spatial regression (spatial autoregressive models and geographically weighted regression) can highlight the extent to which accounting for spatial autocorrelation, and utilising spatial information more generally, improves the empirical relationship between catch and fishing history at the local scale. Parametric analyses assume that samples and/or error terms are independent. With spatially derived datasets, this assumption is likely to be compromised unless spatial autocorrelation is accounted for.

When addressing spatial autocorrelation in datasets, the appropriate analytical approach depends on whether spatial variation is of inherent interest or considered a nuisance factor, and whether the autocorrelation affects the dependent variable, the independent variables, or both. In the case of spatial heterogeneity, where nearby observations are affected by (or affect) neighbouring observations, we would apply a Spatial Lag model (Ward and Gleditsch 2008). Spatial Lag models effectively treat spatial variation as a component of interest. In the case of spatial dependence, where data are linked to neighbouring data through an unspecified process, we apply a Spatial Error model. Spatial Error models are often referred to as mis-specified models because important factors affecting the local value of the data have not been captured; for this reason, they treat spatial autocorrelation as a nuisance factor. Where the spatial influence is considered to act on both the independent and dependent variables, a Spatial Durbin model is more appropriate.

There seem few plausible arguments that the magnitude of catch in one grid cell could have a direct effect on a neighbouring grid cell in the same year (i.e. a Spatial Lag model). More likely, catch in a grid cell is locally correlated either because habitat allows similar levels of productivity, or because the local scale of targeted fishing drives high and low intensities of fishing pressure (i.e. a Spatial Error or Spatial Durbin model). In this chapter we explore the application of spatial regression methods to quantify the relationship between prior fishing history and catch at the local scale. Geographically Weighted Regression (GWR) seeks to account for non-stationarity in the dataset using a form of moving-window regression, and has some similarity to the approach used for Regression Kriging. GWR is a relatively new member of the regression family and has been suggested as a useful exploratory analysis tool, with some caution advised on the interpretation of results.
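For reference, the standard textbook forms of these models (the usual spatial econometrics notation, not taken from this project: W is the spatial weights matrix, rho and lambda are the spatial lag and error coefficients, and (u_i, v_i) are the coordinates of observation i):

Spatial Lag:     y = rho*W*y + X*beta + e
Spatial Error:   y = X*beta + u,   where u = lambda*W*u + e
Spatial Durbin:  y = rho*W*y + X*beta + W*X*theta + e
GWR:             y_i = beta_0(u_i, v_i) + sum_k beta_k(u_i, v_i) * x_ik + e_i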

This example is based on the hexagonal grid dataset. The example files included for this Minimum Working Example (MWE) are SpatialPolygonsDataFrames, with a column for each year side by side. A stacked version of these data is also available if required.
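If a stacked (long) version is needed on the fly, a minimal sketch with tidyr (assuming the per-year catch columns blkg2012 to blkg2017 described below, and the cell identifier oid used later for the centroids):

# Stack the wide per-year catch columns into long form (illustrative only)
catch_long <- spdf@data %>%
  select(oid, matches("^blkg20\\d{2}$")) %>%
  pivot_longer(-oid, names_to = "year", names_prefix = "blkg",
               values_to = "catch_kg") %>%
  mutate(year = as.integer(year))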

The data

# Variable names
names(spdf)

Available variables:

  1. Zone, blockno and subblockno contain a coarse to fine scale breakdown of fishing activity. Note that some blocks are split across zones (e.g. 13A & 13B are in the Western Zone; 13C, 13D and 13E are in the Eastern Zone), due to some inspiring logic(?) prior to Malcolm and me arriving on the scene.
  2. blkg20xx - Catch per hexagon for each year.
  3. days20xx - Number of unique days in each hexagon where fishing was recorded.
  4. mins20xx - Minutes of effort in each hexagon.
  5. divers20xx - Number of unique divers in each hexagon.
  6. kgday20xx - catch/days (feel free to calculate catch/hour; see the sketch below this list)

N.B. blkgtotal, daystotal and minstotal contain the sum across all years for each cell. For obvious reasons (summing counts of unique divers across years would double-count divers) this is not done for divers.
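As a minimal sketch of the catch/hour suggestion in the list above (the column name kghr2016 is illustrative; assumes mins2016 is minutes of effort, as described):

# Hypothetical catch-per-hour for 2016: blkg2016 is kg, mins2016 is minutes.
# Guard against division by zero in cells with no recorded effort that year.
spdf$kghr2016 <- ifelse(spdf$mins2016 > 0,
                        spdf$blkg2016 / (spdf$mins2016 / 60), NA)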

Arranging the horizontal form of grid data with a column for each year results in NA where there is no activity in a cell in a given year. These NAs cause problems with the Nearest Neighbour function and with the sqrt transform, so NA is replaced with zero as part of the data prep. I have also square-root transformed the data because it appeared to be a better fit, but there is likely to be a better strategy and I am happy to be advised this is not sensible. A methodical look at the statistical properties of the data was identified as a key starting point for the project.

# Replace NA with zero, where there is no activity in a cell in a year
spdf@data <- spdf@data %>% replace(., is.na(.), 0)

# sqrt transform catch 
spdf$sqblkg2012 <- sqrt(spdf$blkg2012)
spdf$sqblkg2013 <- sqrt(spdf$blkg2013)
spdf$sqblkg2014 <- sqrt(spdf$blkg2014)
spdf$sqblkg2015 <- sqrt(spdf$blkg2015)
spdf$sqblkg2016 <- sqrt(spdf$blkg2016)
spdf$sqblkg2017 <- sqrt(spdf$blkg2017)

# sum across specific years
spdf$y13_15 <- spdf$blkg2013 + spdf$blkg2014 + spdf$blkg2015
spdf$y14_15 <- spdf$blkg2014 + spdf$blkg2015 

spdf$sqy14_15 <- sqrt(spdf$y14_15)
spdf$sqy13_15 <- sqrt(spdf$y13_15)

## Set different model formulas
f1 <- sqblkg2016 ~ sqblkg2015 
f2 <- sqblkg2016 ~ sqy14_15 
f3 <- sqblkg2016 ~ sqy13_15 
f4 <- sqblkg2016 ~ sqblkg2015 + sqblkg2014  + sqblkg2013

Inter-annual correlations


Pairwise plots for catch as a basic exploratory exercise. Note these do not take spatial autocorrelation into account.

## Pairwise plots ####
ggpairs(spdf@data, columns = c(5:10)) + #, 23:27)) + # untransformed catch variables
#ggpairs(spdf@data, columns = c(39:43), aes(alpha = 0.2))  + # root transformed catch variables
  theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.position="right") 

Quick visualisation of the data

A quick thematic plot for the variable of interest using the tmap package. There is example code further down with multiple variables, facets etc.

# quick thematic map for dv
qtm(spdf, fill="blkg2016", style="watercolor")

Nearest Neighbour networks

Input to the spatial autocorrelation and autoregressive methods requires a nearest neighbour analysis. Here, we convert the hexagon polygons to a spatial point dataset (cell centroids) and attach the data variables. This SpatialPointsDataFrame is then used for the NNB functions.

A short section of code extracts the first 50 records from spdf, creates the distance-based NNB network and plots the links. The reason for using hexagons is that all neighbours of a cell are at the same distance; in a square grid this becomes complicated when using NNB networks greater than a single ring. We could also use a topology-based approach (see the sketch after the next chunk), but D = 108 works fine.

# Create new spatialpoints with centroids of hex cells
spdf.cent <- SpatialPointsDataFrame(gCentroid(spdf, byid=TRUE, id = spdf$oid), spdf@data)

# Create a K nearest neighbour object (class nb)
# k=6 will find the 6 hexagons adjacent to the target, or the nearest 6 if
# the cells are arranged in a ribbon.
spdf.k1 <- knearneigh(spdf.cent, k=6, longlat = FALSE, RANN = TRUE)
spdf.knb <- knn2nb(spdf.k1)
lw.k <- nb2listw(spdf.knb, style = "W", zero.policy = TRUE)

# Create a d nearest neighbour object (class nb)
# D = 108 selects the first ring of hexagons (6 cells) around a target cell.
# D = 216 adds the second ring (12 more cells, 18 in total).
spdf.dnb <- dnearneigh(spdf.cent, 0, 108) ## detects longlat from spatial object 
lw.d <- nb2listw(spdf.dnb, style = "W", zero.policy = TRUE)
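As mentioned above, a topology-based (contiguity) network is an alternative to the distance-based one. A minimal sketch using spdep::poly2nb on the hexagon polygons (the names nb.topo and lw.topo are illustrative):

# Topology-based alternative: contiguity neighbours from the hexagon polygons.
# queen = TRUE counts any shared boundary point as a neighbour.
nb.topo <- poly2nb(spdf, queen = TRUE)
lw.topo <- nb2listw(nb.topo, style = "W", zero.policy = TRUE)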

Explore the NNB network linkages

# set D for  nearest neighbour distance - try 108, 216, 324, 432 etc
D <- 108
nb <- dnearneigh(spdf.cent[1:50,], 0, D) ## detects longlat from spatial object 
coords <- coordinates(spdf[1:50,])
plot(spdf[1:50,])
plot.nb(nb,coords,add=T)
rm(nb)

Examine Global Moran's I

## Step 1 Global Moran I test ####
# Moran's I on the DV

moran.test(spdf$sqblkg2016, listw = lw.d, zero.policy = TRUE, na.action = na.omit)
# Permutation test on the same (sqrt-transformed) DV for consistency
moran.mc(spdf$sqblkg2016, lw.d, nsim = 999, zero.policy = TRUE, na.action = na.omit)


## Step 2 Moran plot ####
## - labels are sqrt(kg) per hex cell (the DV is sqrt-transformed)
mp <-
  moran.plot(
    spdf.cent$sqblkg2016,
    lw.d,
    labels = round(spdf.cent$sqblkg2016),
    xlab = "Blacklip Catch",
    ylab = "Lag of Blacklip Catch",
    zero.policy = TRUE
  )

Simple OLS

## Step 3 OLS ####
m <- lm(f1, data=spdf.cent@data)
summary(m)
AIC(m)

plot(m, which=3)
plot(m, which=4)
plot(m, which=5)
sum(fitted(m)^2)/1000  # back-transform fitted sqrt(kg) to kg; total in tonnes

OLS diagnostics

Check residuals for spatial dependence

If the Global Moran's I for the regression residuals is not significantly different from expected, this suggests there is no spatial dependence and OLS is probably OK. If it is significantly different from expected, this suggests there is spatial dependence, and we can proceed to the various ways of accounting for it.

lm.morantest(m, listw = lw.d,  zero.policy=TRUE)

spdf.cent$lm.resids <- residuals(m)
spdf.cent$stdres <- rstandard(m)
# spdf$lm.resids <- residuals(m)
# spdf$stdres <- rstandard(m)

bubble(spdf.cent, 'stdres')

Lagrange Multiplier tests

Provides simple and robust tests to indicate which model is more appropriate (spatial lag vs spatial error).

Lagrange <- lm.LMtests(m, lw.d, zero.policy = TRUE, test="all") # Checking residuals for spatial autocorrelation
summary(Lagrange)

Spatial autoregressive models

http://www.econ.uiuc.edu/~lab/workshop/Spatial_in_R.html#running-spatial-regressions

## Spatial lag  model ####
m1l <- lagsarlm(f1, data=spdf.cent, lw.d, tol.solve=1.0e-30, zero.policy=TRUE)
summary(m1l, Nagelkerke=TRUE, Hausman=TRUE)
logLik(m1l)
AIC(m1l)
AIC(m)
anova(m1l, m)
bptest.sarlm(m1l)

sum(fitted(m1l)^2)/1000  # back-transformed fitted total, in tonnes
impacts(m1l, listw=lw.d)
imps <- impacts(m1l, R=100, listw=lw.d) ## WARNING - takes a lot of time
sums <- summary(imps,  zstats=T)
sums
data.frame(sums$res)


## Spatial error model ####
m1e <- errorsarlm(f1, data=spdf.cent, lw.d, tol.solve=1.0e-30, zero.policy=TRUE)
summary(m1e, Nagelkerke=TRUE, Hausman=TRUE)
bptest.sarlm(m1e, studentize=T)
logLik(m1e)
AIC(m1e)
AIC(m)
sum(fitted(m1e)^2)/1000

anova(m1e, m)
anova(m1l, m)



## Step 7 Spatial lag Durbin model ####
m1ld <- lagsarlm(f1, data=spdf.cent, lw.d, type = "mixed", zero.policy=TRUE)
summary(m1ld, Nagelkerke=TRUE, Hausman=TRUE)
AIC(m1ld)
anova(m1l, m1ld)
sum(fitted(m1ld)^2)/1000

bptest.sarlm(m1ld)


## Step 8 Spatial error Durbin model ####
m1ed <- errorsarlm(f1, data=spdf.cent, lw.d, etype = "emixed", zero.policy=TRUE)
summary(m1ed, Nagelkerke=TRUE, Hausman=TRUE)
AIC(m1ed)
AIC(m)
anova(m1e, m1ed)
sum(fitted(m1ed)^2)/1000

bptest.sarlm(m1ed)

https://rpubs.com/corey_sparks/246342

Manually include spatial lags

## Just the spatially lagged dependent
## 
spdf.cent$deplag <- lag.listw(lw.d, spdf.cent$sqblkg2016, zero.policy=TRUE, NAOK=TRUE)
m.dy <- lm(sqblkg2016 ~  deplag, data=spdf.cent)
#m.dy <- lm(sqblkg2016 ~ sqblkg2015 + deplag, data=spdf.cent)
summary(m.dy) 

## Just the spatially lagged independent
spdf.cent$indeplag <- lag.listw(lw.d, spdf.cent$sqblkg2015, zero.policy=TRUE, NAOK=TRUE)
m.dx <- lm(sqblkg2016 ~  indeplag, data=spdf.cent)
#m.dx <- lm(sqblkg2016 ~  sqblkg2015 + indeplag, data=spdf.cent)
summary(m.dx) 
tm_shape(spdf) + tm_polygons(
  breaks = seq(0, 700, 100),
  c("blkg2013", "blkg2014", "blkg2015", "blkg2016"),
  n = 8,
  midpoint = NA,
  ncol = 2,
  palette = "-Spectral"
  ) + tm_facets(sync = TRUE, ncol = 2)  + tm_layout(legend.position = c("right", "top"))

Geographically Weighted Regression

## Step 5 GWR (GWModel) ####
## Calculate a CV-optimised bandwidth (for reference; the gwr.basic call
## below uses a fixed bandwidth of 108, i.e. one hexagon ring, and formula f4)
dm.calib <- gw.dist(dp.locat = coordinates(spdf.cent))
gwm.bandwth <- bw.gwr(f1, spdf, approach="CV", kernel="bisquare", longlat=F)

gwr.res <- gwr.basic(f4, data=spdf, bw=108, kernel='gaussian', longlat=FALSE)
gwr.res.df <- as.data.frame(gwr.res$SDF)
gwr.res.df <- as.data.frame(gwr.res$SDF)

head(gwr.res.df)

spdf$coefblkg2015 <- gwr.res.df$sqblkg2015
spdf$coefblkg2014 <- gwr.res.df$sqblkg2014
spdf$coefblkg2013 <- gwr.res.df$sqblkg2013

spdf$gwresid <- gwr.res.df$residual
spdf$yhat <- gwr.res.df$yhat
spdf$localR2 <- gwr.res.df$Local_R2
spdf$yhat.calc <- gwr.res.df$yhat^2  # back-transform fitted sqrt(kg) to kg

tm_shape(spdf) + tm_polygons(c("gwresid", "localR2", "coefblkg2015"),  n=8, midpoint=NA, ncol=2, palette="-Spectral")+ tm_facets(sync = TRUE, ncol = 3)  + tm_layout(legend.position = c("right","top"))
tm_shape(spdf) + tm_polygons(breaks=seq(0,800,100),c("blkg2016", "yhat.calc"),  n=8, midpoint=NA, ncol=2, palette="-Spectral")+ tm_facets(sync = TRUE, ncol = 3)  + tm_layout(legend.position = c("right","top"))

sum(spdf$sqblkg2016^2)/1000
sum(spdf$yhat^2)/1000


## Bootstrap testing of reliability of estimates
library(boot)
set.seed(4676)
gwrcoef <- function(hpdf,i) gwr.basic(f1, data=hpdf[i,], bw=108, kernel='gaussian', longlat=FALSE)$SDF$sqblkg2015
bootres <- boot(spdf.cent,gwrcoef,100)
spdf$bse2015 <- sqrt(apply(bootres$t,2,var))
qtm(spdf, fill="bse2015", style="watercolor")

## Estimate bias from the bootstrap
spdf$bias2015 <- bootres$t0 - apply(bootres$t,2,mean)
tm_shape(spdf) + tm_polygons(c("bse2015","bias2015"),  n=8, midpoint=NA, ncol=1, palette="-Spectral") + tm_facets(sync = TRUE, ncol = 2)  + tm_layout(legend.position = c("right","top"))

GWR prediction

# Predict back onto the same locations as a check
spdf1 <- spdf.cent
gwr.pred <- gwr.predict(f1, data = spdf.cent, predictdata = spdf1 , bw = 108, kernel = "gaussian", adaptive = F, longlat=FALSE)
gwr.pred

