library(knitr) opts_chunk$set(collapse=TRUE, fig.align='center', tidy=TRUE, tidy.opts=list(blank=TRUE, width.cutoff=40), warning=FALSE,message=FALSE)
The nc.sids
data frame has 100 rows and 21 columns and can be found in the spdep
library.
It contains data given in Cressie (1991, pp. 386-9), Cressie and Read (1985) and Cressie and Chan (1989) on sudden infant deaths in North Carolina for 1974--78 and 1979--84.
The data set also contains the neighbour list given by Cressie
and Chan (1989) omitting self-neighbours (ncCC89.nb
), and the
neighbour list given by Cressie and Read (1985) for contiguities
(ncCR85.nb
).
Data is available on the numbers of cases and on the number of births, both dichotomized by a binary indicator of race.
The data are ordered by county ID number, not alphabetically as in the source tables.
The code below plots the county boundaries along with the observed SMRs.
The expected numbers are based on internal standardization with a single stratum.
\small
library(maptools) nc.sids <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66")) nc.sids2 <- nc.sids # Create a copy, to add to Y <- nc.sids$SID74 E <- nc.sids$BIR74*sum(Y)/sum(nc.sids$BIR74) nc.sids2$SMR74 <- Y/E nc.sids2$EXP74 <- E brks <- seq(0,5,1)
We map the SMRs, and see a number of counties with high relative risks.
\small
spplot(nc.sids2,"SMR74",at=brks, col.regions=grey.colors(5,start=.9,end=.1))
Examine $\kappa$, the overdispersion statistic, and use a Monte Carlo test to examine significance.
\small
library(spdep) kappaval <- function(Y,fitted,df){ sum((Y-fitted)^2/fitted)/df} mod <- glm(Y~1,offset=log(E),family="quasipoisson") kappaest <- kappaval(Y,mod$fitted,mod$df.resid) nMC <- 1000 ncts <- length(E) yMC <- matrix(rpois(n=nMC*ncts,lambda=E), nrow=ncts,ncol=nMC) kappaMC <- NULL for (i in 1:nMC){ modMC <- glm(yMC[,i]~1,offset=log(E),family="quasipoisson") kappaMC[i] <- kappaval(yMC[,i],modMC$fitted,modMC$df.resid) }
\small
hist(kappaMC,xlim=c(min(kappaMC),max(kappaMC,kappaest)), main="",xlab=expression(kappa)) abline(v=kappaest,col="red")
We first fit a non-spatial random effects model: \begin{eqnarray} Y_i | \alpha,V_i &\sim_{iid}&\mbox{Poisson}(E_i\mbox{e}^{\alpha+V_i}),\ V_i | \sigma_v^2 &\sim_{iid}& N(0,\sigma_v^2) \end{eqnarray}
\small
library(INLA) nc.sids$ID <- 1:100 m0 <- inla(SID74~f(ID, model="iid"),family="poisson", E=EXP74, data=as.data.frame(nc.sids), control.predictor=list(compute=TRUE))
Examine the first few "fitted values", summaries of the posterior distribution of $\exp(\alpha+V_i),\qquad i=1,\dots,n.$
\scriptsize
head(m0$summary.fitted.values)
Create two interesting inferential summaries. \small
# We add the posterior mean of the relative risk to our object nc.sids$RRpmean0 <- m0$summary.fitted.values[,1] # Also, a binary indicator of whether posterior # median is greater than 1.5 (an epidemiologically significant # value) nc.sids$RRind0 <- m0$summary.fitted.values[,4] > 1.5
\small
# Display relative risk estimates spplot(nc.sids, "RRpmean0")
\small
# Display indicators of whether 0.5 points above 1.5 spplot(nc.sids, "RRind0")
We now fit a model with non-spatial and spatial random effects.
\small
# Create adjacency matrix and output to INLA graph format poly2nb(nc.sids) %>% nb2INLA(file="vignettes/NC.graph", nb=.) nc.sids$ID2 <- 1:100 m1 <- inla( formula = SID74 ~ 1 + f(ID, model="iid") + f(ID2, model="besag", graph="vignettes/NC.graph"), family="poisson", E=EXP74, data=nc.sids@data, control.predictor=list(compute=TRUE)) nc.sids$RRpmean1 <- m1$summary.fitted.values[,1] nc.sids$RRind1 <- m1$summary.fitted.values[,4] > 1.5
spplot(nc.sids, "RRpmean1")
# Display areas with medians above 1.5, ie those areas # with greater than 50% chance of exceedence of 1.5. spplot(nc.sids, "RRind1")
plot(nc.sids$RRpmean1~nc.sids$RRpmean0, type="n", xlab="Non-spatial model", ylab="Spatial model") text(nc.sids$RRpmean1~nc.sids$RRpmean0) abline(0,1) title("Comparison of Posterior Means")
We now examine the variances of the spatial and non-spatial random effects. Recall that the ICAR model variance has a conditional interpretation. To obtain a rough estimate of the marginal variance we obtain the posterior median of the $U_i$'s and evaluate their variance. From the output below, we conclude that the spatial random effects dominate for the SIDS data so that we conclude there is clustering of cases in neighboring areas.
# Extract spatial random effects U <- m1$summary.random$ID2[5]; sqrt(var(U)) # 0.33 # variance of non-spatial m1$summary.hyperpar
We evaluate Moran's test for spatial autocorrelation using the "W" style weight function: this standardizes the weights so that for each area the weights sum to 1. To obtain a variable with approximately constant variance we form residuals from an intercept only model.
col.W <- nb2listw(ncCR85.nb, style="W", zero.policy=TRUE) quasipmod <- glm( formula = SID74~1, offset=log(EXP74), data=nc.sids, family=quasipoisson()) sidsres <- residuals(quasipmod, type="pearson")
\scriptsize
moran.test(sidsres,col.W)
Moran's test may suggest spatial autocorrelation if there exists a non-constant mean function.
Below we fit a model with Eastings and Northings (of the County seat) as covariates -- both show some association and the significance of the Moran statistic is reduced, though still significant.
\scriptsize
quasipmod2 <- glm(SID74~east+north,offset=log(EXP74), data=nc.sids2,family=quasipoisson()) summary(quasipmod2) sidsres2 <- residuals(quasipmod2,type="pearson") nc.sids2$res <- sidsres2
\small
par(mar=c(.1,.1,.1,.1)) spplot(nc.sids2,"res")
\scriptsize
moran.test(sidsres2,col.W)
We now use Geary's statistic on the detrended residuals, and come to the same conclusion
\scriptsize
geary.test(sidsres2,col.W)
We now use Moran's statistic on the detrended residuals, but with the binary ``B" weight option.
This option has unstandardized weights.
Note the asymmetry in the ``W" weights option in the figure below.
The conclusion, evidence of spatial autocorrelation, is the same as with the standardized weights option.
\scriptsize
col.B <- nb2listw(ncCR85.nb, style="B",zero.policy=TRUE) moran.test(sidsres2,col.B)
We now use Moran's statistic on the detrended residuals, but with the binary ``B" weight option.
This option has unstandardized weights.
Note the asymmetry in the ``W" weights option in the figure below.
The conclusion, evidence of spatial autocorrelation, is the same as with the standardized weights option.
\scriptsize
col.B <- nb2listw(ncCR85.nb, style="B",zero.policy=TRUE) moran.test(sidsres2,col.B)
\scriptsize
library(RColorBrewer) pal <- brewer.pal(9, "Reds") par(mfrow=c(1,2)) z <- t(listw2mat(col.W)) brks <- c(0,0.1,0.143,0.167,0.2,0.5,1) nbr3 <- length(brks)-3 image(1:100,1:100,z[,ncol(z):1],breaks=brks, col=pal[c(1,(9-nbr3):9)], main="W style", axes=FALSE) box() z <- t(listw2mat(col.B)) brks <- c(0,0.1,0.143,0.167,0.2,0.5,1) nbr3 <- length(brks)-3 image(1:100,1:100,z[,ncol(z):1],breaks=brks, col=pal[c(1,(9-nbr3):9)], main="B style", axes=FALSE) box()
\small
library(RColorBrewer) pal <- brewer.pal(9, "Reds") par(mfrow=c(1,2)) z <- t(listw2mat(col.W)) brks <- c(0,0.1,0.143,0.167,0.2,0.5,1) nbr3 <- length(brks)-3 image(1:100,1:100,z[,ncol(z):1],breaks=brks, col=pal[c(1,(9-nbr3):9)], main="W style", axes=FALSE) box() z <- t(listw2mat(col.B)) brks <- c(0,0.1,0.143,0.167,0.2,0.5,1) nbr3 <- length(brks)-3 image(1:100,1:100,z[,ncol(z):1],breaks=brks, col=pal[c(1,(9-nbr3):9)], main="B style", axes=FALSE) box()
Both of the Moran's $I$ and Geary's $c$ methods suggest that there is evidence of clustering in these data.
We implement Openshaw's method using the centroids of the areas in data.
Circles of radius 30 are used and the centers are placed on a grid of size 10.
For multiple radii, multiple calls are required.
The significance level for calling a cluster is 0.002.
\scriptsize
library(spdep) data(nc.sids) sids <- data.frame(Observed=nc.sids$SID74) sids <- cbind(sids,Expected=nc.sids$BIR74*sum(nc.sids$SID74)/ sum(nc.sids$BIR74)) sids <- cbind(sids, x=nc.sids$x, y=nc.sids$y) # GAM library(DCluster) sidsgam <- opgam(data=sids, radius=30, step=10, alpha=.002)
\small
plot(sids$x, sids$y, xlab="Easting", ylab="Northing") # Plot points marked as clusters points(sidsgam$x, sidsgam$y, col="red", pch="*")
Openshaw results.
\scriptsize
sidsgam
\scriptsize
library(SpatialEpi) library(maptools) library(spdep) library(maps) library(ggplot2) library(sp) nc.sids <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],ID="FIPSNO",proj4string= CRS("+proj=longlat +ellps=clrk66")) referencep <- sum(nc.sids$SID74)/sum(nc.sids$BIR74) population <- nc.sids$BIR74 cases <- nc.sids$SID74 E <- nc.sids$BIR74*referencep SMR <- cases/E n <- length(cases)
\scriptsize
getLabelPoint <- function(county) {Polygon(county[c('long', 'lat')])@labpt} df <- map_data('county', 'north carolina') # NC region county data centNC <- by(df, df$subregion, getLabelPoint) # Returns list centNC <- do.call("rbind.data.frame", centNC) # Convert to Data Frame names(centNC) <- c('long', 'lat') # Appropriate Header centroids <- matrix(0, nrow=n, ncol=2) for(i in 1:n) {centroids[i, ] <- c(centNC$lat[i],centNC$long[i])} colnames(centroids) <- c("x", "y") rownames(centroids) <- 1:n
\scriptsize
NCTemp <- map('county','north carolina',fill=TRUE,plot=FALSE) NCIDs <- substr(NCTemp$names,1+nchar("north carolina,"),nchar(NCTemp$names) ) NC <- map2SpatialPolygons(NCTemp,IDs=NCIDs,proj4string=CRS("+proj=longlat")) # Fix currituck county which is 3 islands index <- match(c("currituck:knotts", "currituck:main", "currituck:spit"), NCIDs) currituck <- list() for(i in c(27:29)) currituck <- c(currituck, list(Polygon(NC@polygons[[i]]@Polygons[[1]]@coords)) ) currituck <- Polygons(currituck,ID = "currituck")
\scriptsize
# make new spatial polygons object NC.new <- NC@polygons[ 1:(index[1]-1) ] NC.new <- c(NC.new, currituck) NC.new <- c(NC.new, NC@polygons[ (index[3]+1):length(NC@polygons) ] ) NC.new <- SpatialPolygons(NC.new,proj4string=CRS("+proj=longlat")) NCIDs <- c(NCIDs[ 1:(index[1]-1) ], "currituck", NCIDs[ (index[3]+1):length(NC@polygons) ]) NC <- NC.new # SANITY CHECK: Reorder Spatial Polygons of list to match order of county names <- rep("",100) for(i in 1:length(NC@polygons)) names[i] <- NC@polygons[[i]]@ID identical(names, NCIDs) index <- match(NCIDs,names) NC@polygons <- NC@polygons[index] rm(index) names <- rep("",100) for(i in 1:length(NC@polygons)) names[i] <- NC@polygons[[i]]@ID identical(names, NCIDs)
\scriptsize
k <- 20 alpha.level <- 0.01 geo <- centroids BNresults <- besag_newell(geo,population,cases, expected.cases=NULL,k,alpha.level) BNsig <- length(BNresults$p.values[BNresults$p.values<alpha.level]) cat("No of sig results = ",BNsig,"\n") resmat <- matrix(NA,nrow=BNsig,ncol=100); reslen <- NULL for (i in 1:length(BNresults$clusters)){ reslen[i] <- length(BNresults$clusters[[i]]$location.IDs.included) resmat[i,1:reslen[i]] <- BNresults$clusters[[i]]$location.IDs.included }
\small
par(mfrow=c(3,3),mar=c(.1,.1,.1,.1)) for (i in 1:6){ plot(NC.new) plot(NC.new[resmat[i,c(1:reslen[i])]],col="red",add=T)}
\small
par(mfrow=c(3,3),mar=c(.1,.1,.1,.1)) for (i in 6:10){ plot(NC.new) plot(NC.new[resmat[i,c(1:reslen[i])]],col="red",add=T)}
\scriptsize
# Kulldorff pop.upper.bound <- 0.2 n.simulations <- 999 alpha.level <- 0.05 Kpoisson <- kulldorff(geo,cases,population,expected.cases=NULL, pop.upper.bound, n.simulations, alpha.level, plot=T) Kcluster <- Kpoisson$most.likely.cluster$location.IDs.included
\scriptsize
# Kulldorff pop.upper.bound <- 0.2 n.simulations <- 999 alpha.level <- 0.05 Kpoisson <- kulldorff(geo,cases,population,expected.cases=NULL, pop.upper.bound, n.simulations, alpha.level, plot=T) Kcluster <- Kpoisson$most.likely.cluster$location.IDs.included
\small
plot(NC.new,axes=TRUE) plot(NC.new[Kcluster],add=TRUE,col="red") title("Most Likely Cluster")
Now look at secondary clusters.
Two are significant, and indicated in Figures below
\scriptsize
K2cluster <- Kpoisson$secondary.clusters[[1]]$location.IDs.included plot(NC.new,axes=TRUE) plot(NC.new[K2cluster],add=TRUE,col="red") title("2nd Most Likely Cluster")
\small
plot(NC.new,axes=TRUE) plot(NC.new[K2cluster],add=TRUE,col="red") title("2nd Most Likely Cluster")
\scriptsize
library(spdep) devtools::install_github("rudeboybert/SpatialEpi") library(SpatialEpi) data("nc.sids") # Load NC map and obtain geographic centroids library(maptools) sp.obj <- readShapePoly(system.file("etc/shapes/sids.shp", package="spdep")[1],ID="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66")) centroids <- latlong2grid(coordinates(sp.obj))
\scriptsize
library(maptools) y <- nc.sids$SID74 population <- nc.sids$BIR74 E <- expected(population, y, 1) max.prop <- 0.15 k <- 0.00005 shape <- c(2976.3, 2.31); rate <- c(2977.3, 1.31) J <- 7 pi0 <- 0.95 n.sim.lambda <- 0.5*10^4 n.sim.prior <- 0.5*10^4 n.sim.post <- 0.5*10^5 output <- bayes_cluster(y, E, population, sp.obj, centroids, max.prop, shape, rate, J, pi0, n.sim.lambda, n.sim.prior, n.sim.post)
\small
SMR <- y/E plotmap(SMR, sp.obj,nclr=6,location="bottomleft") plotmap(output$prior.map$high.area, sp.obj,nclr=6,location="bottomleft") plotmap(output$post.map$high.area, sp.obj,nclr=6,location="bottomleft") barplot(output$pj.y, names.arg=0:J, xlab="j", ylab="P(j|y)") plotmap(output$post.map$RR.est.area, sp.obj, log=TRUE,nclr=6,location="bottomleft")
\small
plotmap(SMR, sp.obj,nclr=6,location="bottomleft")
\small
plotmap(output$prior.map$high.area,nclr=6, sp.obj,location="bottomleft")
\small
plotmap(output$post.map$high.area,nclr=6,sp.obj,location="bottomleft")
\small
barplot(output$pj.y, names.arg=0:J,xlab="j", ylab="P(j|y)")
\small
plotmap(output$post.map$RR.est.area,sp.obj,nclr=6, log=TRUE,location="bottomleft")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.