1. Overview of Worked Example {-}

a) Background {-}

There are many ways graphs can be implemented to understand population structure and relate that structure to landscape characteristics (see Dyer and Nason 2004). In this exercise, we will calculate various graph metrics and apply graphs to fit a gravity model.

Gravity models are a type of inferential model that exploits graph characteristics. They include both at-site and between-site landscape data, organized as a graph of nodes and edges, where the nodes and edges carry the landscape characteristics associated with those graph elements.
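In generic form (a textbook formulation, not the exact specification fit below), a gravity model relates the interaction $T_{ij}$ between sites $i$ and $j$ to at-site and between-site characteristics:

$$T_{ij} = k \, \frac{v_i^{\alpha} \, w_j^{\beta}}{d_{ij}^{\gamma}}$$

where $v_i$ and $w_j$ are node (at-site) covariates, $d_{ij}$ is an edge (between-site) covariate such as distance, and $k$, $\alpha$, $\beta$, $\gamma$ are estimated parameters. In this exercise, the model is fit on log-transformed variables as a linear mixed-effects model, with the origin ("from") site as the grouping factor.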

b) Data set {-}

In this exercise, you will use the gravity model framework to build an empirical model of gene flow for the Columbia spotted frog dataset in central Idaho that you have used for several other exercises (Murphy et al. 2010).

2. Setup {-}

Add required packages {-}

This code checks whether all required packages are installed and, if so, loads them. If any are missing, it will return a message identifying the packages that need to be installed. If that happens, install those packages and run the code again.

The code then checks whether your version of the GeNetIt package is up to date (at least 0.1-5) and, if not, installs the latest version from GitHub.

p <- c("raster", "igraph", "sp", "GeNetIt", "spatialEco", "leaflet",
       "sf", "terra", "sfnetworks", "spdep", "dplyr", "tmap", "devtools") 
  if(any(!unlist(lapply(p, requireNamespace, quietly=TRUE)))) { 
    m = which(!unlist(lapply(p, requireNamespace, quietly=TRUE)))
      suppressMessages(invisible(lapply(p[-m], require,    
                       character.only=TRUE)))
    stop("Missing library, please install ", paste(p[m], collapse = " "))
  } else {
    if(packageVersion("GeNetIt") < "0.1-5") {
      remotes::install_github("jeffreyevans/GeNetIt")
    } 
    suppressMessages(invisible(lapply(p, require, character.only=TRUE)))
  }

Prepare work environment {-}

# Get the path to your project folder
Path <- here::here("downloads")

# EPSG code for UTM zone 11N (WGS84)
prj = 32611 

# Some needed helper functions:
# back.transform() applies the log-normal bias correction, exp(y + var(y)/2)
back.transform <- function(y) exp(y + 0.5 * stats::var(y))
# rmse() returns the root mean squared error of predicted (p) vs. observed (o)
rmse = function(p, o){ sqrt(mean((p - o)^2)) }

3. Wetland complex data preparation {-}

In this section, we will read in all wetland locations in the study area and calculate a few graph-based metrics to assign to the wetland sites where data were collected. This allows us to put our samples into the context of the larger wetland system, thus accounting for proximity and juxtaposition.

a) Read in wetlands data {-}

Import file "Wetlands.csv".

wetlands <- read.csv(system.file("extdata/Wetlands.csv", package="LandGenCourse"), 
                    header = TRUE)
head(wetlands)

Make it a spatial object.

wetlands <- st_as_sf(wetlands, coords = c("X", "Y"), 
                     crs = 32611, agr = "constant") 
str(wetlands)

b) Create wetlands graph {-}

Create a Gabriel graph from the wetlands to represent a "realization" of connectivity and spatial arrangement. In a Gabriel graph, two points are connected by an edge if no other point falls within the circle whose diameter is the segment between them.

Derive Gabriel graph

gg <- graph2nb(gabrielneigh(st_coordinates(wetlands)),sym=TRUE)

plot(gg, coords=st_coordinates(wetlands))

Questions:

c) Graph metrics {-}

To calculate graph metrics, we need to do a few steps:

Coerce to sf line object (will be used to create igraph object)

gg <- nb2lines(gg, coords = sf::st_coordinates(wetlands), 
                 proj4string = prj, as_sf=TRUE)

Coerce to an sfnetwork, which inherits from igraph (so igraph functions can be applied to it):

wg <- as_sfnetwork(gg, edges=gg, nodes=wetlands, directed = FALSE,
                  node_key = "SiteName", length_as_weight = TRUE, 
                          edges_as_lines = TRUE)

Calculate weights

w <- wg %>% activate("edges") %>% pull(weight) %>% as.numeric()
w[w <= 0] <- 1   # guard against non-positive edge lengths
w = w / sum(w)   # rescale weights to sum to 1

Calculate the graph metrics betweenness and closeness (using the weights) as well as degree, and add them as attributes (columns) to the wetlands sf object. Betweenness counts the shortest paths that pass through a node, closeness is the inverse of the summed shortest-path distances from a node to all other nodes, and degree is the number of edges attached to a node.

wetlands$betweenness <- igraph::betweenness(wg, directed=FALSE, weights=w)
wetlands$degree <- igraph::degree(wg)
wetlands$closeness <- igraph::closeness(wg, weights=w)
wetlands

d) Plot graph metric {-}

Plot results using the graph edges and wetlands points with the attributes "betweenness", "closeness" and "degree".

Plot betweenness

plot(st_geometry(gg), col="grey")
  plot(wetlands["betweenness"], pch=19,  
       cex=0.75, add=TRUE)
     box()
     title("Wetlands Gabriel graph betweenness")

Plot closeness

plot(st_geometry(gg), col="grey")
plot(wetlands["closeness"], pch=19,  
     cex=0.75, add=TRUE)
box()
title("Wetlands Gabriel graph closeness")

Plot degree

plot(st_geometry(gg), col="grey")
plot(wetlands["degree"], pch=19,  
     cex=0.75, add=TRUE)
box()
title("Wetlands Gabriel graph degree")

Questions: Consider the three figures above and how the graph was constructed.

4. Wetland field-data preparation {-}

In this section we will read the field data and add the node metrics we just calculated.

Using RALU_Site.csv, we read in the site data, add the node metrics we just calculated, and create a spatial object that includes them.

Read in site data

sites <- read.csv(system.file("extdata/RALU_Site.csv", package="LandGenCourse"), 
                  header = TRUE)
sites$SiteID <- as.character(sites$SiteID)

Add the node data.

Note: matching by name is dangerous, as small changes in names can result in non-matches. In this case, the ID fields are not consistent (the data were originally collected at different times for different purposes). However, the site names are standardized in a drop-down list of a database, so they can serve as a matching field. My recommendation is to do this type of operation on a numeric ID field whenever possible.
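As a quick safeguard (not part of the original code), you can list any names that fail to match between the two tables before merging:

setdiff(sites$SiteName, wetlands$SiteName)  # in sites but not in wetlands
setdiff(wetlands$SiteName, sites$SiteName)  # in wetlands but not in sites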

nodestats <- st_drop_geometry(wetlands[,c(3,5:7)])
  nodestats <- nodestats[which(nodestats$SiteName %in% sites$SiteName),] 
sites <- merge(nodestats, sites, by="SiteName")

Convert data frame to sf object.

sites <- st_as_sf(sites, coords = c("X", "Y"), 
                 crs = prj, agr = "constant") 
head(sites)

Question:

5. Saturated Graph {-}

a) Create graph from site locations {-}

To assess connectivity using a gravity model, we need to build a graph from the occupied frog sites. This could be any type of graph, but I generally use a saturated graph (essentially a full genetic distance matrix) or one pruned by some maximum distance.

dist.graph <- knn.graph(sites, row.names = sites$SiteName)
dist.graph <- merge(dist.graph, st_drop_geometry(sites), 
                      by.y="SiteName", by.x="from_ID")
dist.graph <- dist.graph[,-c(11:19)] # drop extra columns

Note: you can create a distance-constrained graph with the max.dist argument (not run):

# dist.graph <- knn.graph(sites, row.names = sites$SiteName, max.dist=5000)

b) Merge the graph with genetic distance {-}

Read in the genetic distance data, make it a matrix, gdist, and then unfold the data into an edge list (from, to, distance).

gdist <- read.csv(system.file("extdata/RALU_Dps.csv", package="LandGenCourse"), header=TRUE) 
  rownames(gdist) <- t(names(gdist))
gdist <- dmatrix.df(as.matrix(gdist)) 
names(gdist) <- c("FROM", "TO", "GDIST") #unfold the file
gdist <- gdist[!gdist$FROM == gdist$TO ,]
gdist[,1] <-sub("X", "", gdist[,1])
gdist[,2] <-sub("X", "", gdist[,2])
gdist <- cbind(from.to=paste(gdist[,1], gdist[,2], sep="."), gdist)
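Optionally (not in the original code), inspect the first rows of the unfolded edge list to see its structure:

head(gdist)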

Transform genetic distance to genetic flow (1-distance)

gdist$GDIST <- flow(gdist$GDIST)

Merge graph with genetic distances, based on from node - to node combination

dist.graph$from.to <- paste(dist.graph$i, dist.graph$j, sep=".")
dist.graph <- merge(dist.graph, gdist, by = "from.to")

Question:

6. Spatial model data preparation {-}

a) Read raster data using 'terra' {-}

xvars <- terra::rast(system.file("extdata/covariates.tif", package="GeNetIt"))

b) Reclassify wetlands {-}

NLCD is the National Land Cover Database. For this system and this vintage of NLCD data, the wetland classes are 11 and 90-95.

# Reclassification matrix (from, to, becomes): 1 = wetland, 0 = other
m <- c(0, 10.8, 0,  10.9, 12.1, 1,  12.9, 89.5, 0,  89.1, 95.1, 1,  95.9, 100, 0)
  reclass <- matrix(m, ncol=3, byrow=TRUE)

wetlnd <- classify(xvars[["nlcd"]], reclass)
  names(wetlnd) <- "wetlnd"
    xvars <- c(xvars, wetlnd)

c) Calculate the proportion of the landscape around sites {-}

Assign the proportion of land cover that is wetland to sites as pwetland. You could create a binary raster for any cover type of interest and calculate this parameter the same way.

Question:

Create function to extract the proportion of wetland.

## method 1 (can return NaN if no cells are extracted)
#  prop.land <- function(x) {
#   length(x[x==1]) / length(x)  
#  }

## method 2 (no division-by-zero problem)
prop.land <- function(x) {
  prop.table(table(factor(x, levels=c(0,1))))[2]
}
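To see what the function returns, here is a quick check on a toy vector (illustration only; three of five cells are wetland):

prop.land(c(0, 0, 1, 1, 1))  # returns 0.6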

Apply the function to extract the proportion of wetland within a buffer (here: 300 m).

b <- st_buffer(sites, 300)
pwetland <- extract(wetlnd, vect(b))
pwetland <- tapply(pwetland[,2], pwetland[,1], prop.land)

Add the proportion of wetland back to the dataframe.

sites$pwetland <- as.numeric(pwetland)
head(sites$pwetland)

Challenge:

Alternatively, you can use the landscapemetrics package (see Week 2) to calculate a broader set of landscape metrics. Here is an example for calculating pland.

Note: as per the help file, the argument y of the function sample_lsm can accept an sf points object (e.g., sites). This returned an error, though, so we pass the point coordinates from sf::st_coordinates(sites) instead.

nlcd_sampled <- landscapemetrics::sample_lsm(landscape = xvars[["wetlnd"]], 
                                             what = "lsm_c_pland",
                                             shape = "circle",
                                             y = sf::st_coordinates(sites), 
                                             size = 300, 
                                             return_raster = FALSE,
                                             plot_id=sites$SiteID)
pwetland <- dplyr::select(dplyr::filter(nlcd_sampled, class == 1, 
                                        metric == "pland"), plot_id, value)  
names(pwetland) <- c("SiteID", "pwetland")
pwetland$pwetland <- pwetland$pwetland/100

head(pwetland)

Note: these values are sorted by SiteID and thus are not in the same order as the sf object sites, which is why the first six values may not be the same.
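If you wanted to attach these values to sites, it would be safer to join by SiteID than to rely on row order. A minimal sketch (not run; note that sites already has a pwetland column from the main workflow, so we rename this one first):

# names(pwetland)[2] <- "pwetland.lsm"
# sites <- merge(sites, pwetland, by = "SiteID", all.x = TRUE)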

d) Add values of rasters to sample sites {-}

This adds potential "at-site" variables while keeping sites as an sf POINT object. We remove raster 6 because knowing which NLCD cover class intersects each point is not very useful information.

stats <- extract(xvars[[-6]], vect(sites))
sites <- st_sf(data.frame(as.data.frame(sites), stats), 
                 geometry=sites$geometry)

e) Add raster covariates to graph edges (lines) {-}

Remove nlcd and wetlnd rasters before calculating statistics, as calculating statistics on categorical data is nonsensical.

Calculating stats - this will take a while!

idx <- which(names(xvars) %in% c("nlcd","wetlnd"))
suppressWarnings(
  stats <- graph.statistics(dist.graph, r = xvars[[-idx]], 
                            buffer= NULL, stats = c("min",         
                            "mean","max", "var", "median")))

Add statistics to graph.

dist.graph <- st_sf(data.frame(as.data.frame(dist.graph), stats), 
                    geometry=dist.graph$geometry)

f) What about categorical variables? {-}

Statistical moments (mean, variance, etc.) are nonsensical for categorical variables. Here we create a function for returning the percent wetland between sites. Then we use it to calculate an additional statistic and add the result to the graph.

Question:

Function to calculate percent wetland between pairs of sites:

wet.pct <- function(x) {
  x <- ifelse(x == 11 | x == 12 | x == 90 | x == 95, 1, 0)
    prop.table(table(factor(x, levels=c(0,1))))[2] 
} 

Calculate statistic and add it to graph.

suppressWarnings(
  wetstats <- graph.statistics(dist.graph, r=xvars[["nlcd"]], 
                               buffer= NULL, stats = c("wet.pct")) )
dist.graph$wet.pct.nlcd <- as.numeric(wetstats[,1]) 

g) Evaluate node and edge correlations {-}

We need to evaluate correlations in the data to avoid multicollinearity in our models (remember the lab from Week 12, where we calculated the variance inflation factor, VIF).

Note that we are not actually going to remove the correlated variables, but just go through a few methods of evaluating them; a commented-out sketch for removing collinear variables is included below for reference. We do have to log-transform the data in order to evaluate the actual model structure.

Create data frame with the node variables to be evaluated.

node.var <- c("degree", "betweenness", "Elev", "Length", "Area", "Perim", 
              "Depth", "pH","Dforest","Drock", "Dshrub", "pwetland", "cti",
              "dd5", "ffp","gsp","pratio","hli","rough27","srr")
s <- st_drop_geometry(sites) %>% select("degree", "betweenness", "Elev", "Length", "Area", "Perim", 
              "Depth", "pH","Dforest","Drock", "Dshrub", "pwetland", "cti", "ffp","gsp")

Log-transform values, set log of negative values or zero (where log is not defined) to 0.00001.

for(i in 1:ncol(s)) 
{
  s[,i] <- ifelse(s[,i] <= 0, 0.00001, log(s[,i]))
}

Site correlations:

p = 0.8 # Set upper limit for collinearity
site.cor <- cor(s, y = NULL, 
                use = "complete.obs", 
                method = "pearson")
diag(site.cor) <- 0                 

cor.idx <- which(site.cor > p | site.cor < -p, arr.ind = TRUE)
cor.names <- vector()
cor.p <- vector()

for(i in 1:nrow(cor.idx)) 
{
     cor.p[i] <- site.cor[cor.idx[i,][1], cor.idx[i,][2]]
  cor.names [i] <- paste(rownames(site.cor)[cor.idx[i,][1]],
                         colnames(site.cor)[cor.idx[i,][2]], sep="_")
}   
data.frame(parm=cor.names, p=cor.p)
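As a complement to the pairwise correlations, you could also compute variance inflation factors directly. A minimal sketch (not part of the original workflow), using VIF = 1 / (1 - R^2), where R^2 comes from regressing each predictor on all of the others:

vif.manual <- sapply(names(s), function(k) {
  r2 <- summary(lm(reformulate(setdiff(names(s), k), response = k), data = s))$r.squared
  1 / (1 - r2)  # variance inflation factor for predictor k
})
round(sort(vif.manual, decreasing = TRUE), 2)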

This returns a list of pairwise correlations that exceed the threshold p in absolute value. Instead of doing this manually, we can use the function collinear to check this. It will suggest which variable of each correlated pair to drop.

node.cor <- spatialEco::collinear(s, p=p)

It also returns a vector of correlated variables:

node.cor
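If you did decide to drop the variables flagged by collinear (we keep them here, as noted above), a one-line sketch would be:

# s.reduced <- s[, !(names(s) %in% node.cor)]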

h) Add node data {-}

Build the node (at-site) level data and add it to the graph, then merge the node data with the edge (between-site) data from the distance graph.

node <- build.node.data(st_drop_geometry(sites), group.ids = "SiteID", 
                        from.parms = names(s))

Merge node and edges for model data.frame

gdata <- merge(st_drop_geometry(dist.graph)[c(1,2,5,11,14,7)], node, 
               by.x="SiteID", by.y="SiteID")
gdata <- merge(gdata, st_drop_geometry(dist.graph)[c(11, 8:10, 15:40)], 
                   by.x="SiteID", by.y="SiteID") 
# log transform matrix
for(i in 5:ncol(gdata)) 
{
  gdata[,i] <- ifelse(gdata[,i] <= 0, 0.00001, log(gdata[,i]))
}

7. Gravity model {-}

a) Develop hypothesis {-}

What type of gravity model do you wish to run (production- or attraction-constrained)? Think about which hypotheses you want to test. Write out model statements that group parameters into hypotheses and run the models. Remember to run a null model that contains only distance.

At-site (node) potential parameters: these are all at-site variables. Remember that we extracted values for all raster variables; think critically about hypotheses rather than using all of these parameters.

NOTE: we are adding elevation here as a covariate. HOWEVER - elevation does not represent ecological processes in and of itself. I strongly encourage using the components (temp, moisture, rainfall, vegetation, accessibility, etc.) directly and not elevation as a surrogate parameter.

Between-site (edge) potential parameters include the edge statistics we calculated along the graph (e.g., the median values of the raster covariates and wet.pct.nlcd).

# null model (under Maximum Likelihood) 
( null <- gravity(y = "GDIST", x = c("length"), d = "length", group = "from_ID", 
                  data = gdata, fit.method = "ML") )

# Fish hypothesis (under Maximum Likelihood) 
( depth <- gravity(y = "GDIST", x = c("length","from.Depth"), d = "length", 
                   group = "from_ID", data = gdata, fit.method = "ML", ln = FALSE) )

# Productivity hypothesis (under Maximum Likelihood) 
#( production <- gravity(y = "GDIST", x = c("length", "from.ffp", "from.hli"), 
#                    d = "length",  group = "from_ID", data = gdata, 
#                    fit.method = "ML", ln = FALSE) )

# Climate hypothesis (under Maximum Likelihood) 
#( climate <- gravity(y = "GDIST", x = c("length", "median.ffp", "median.pratio"), 
#                     d = "length", group = "from_ID", data = gdata, 
#                    fit.method = "ML",  ln = FALSE) )

# Wetlands hypothesis (under Maximum Likelihood) 
( wetlands <- gravity(y = "GDIST", x = c("length", "from.degree", "from.betweenness", "from.pwetland"), 
                      d = "length", group = "from_ID", data = gdata, fit.method = "ML",
                      ln = FALSE) )

# Topography hypothesis (under Maximum Likelihood) 
#( topo <- gravity(y = "GDIST", x = c("length", "median.srr", "median.rough27"), d = "length", 
#                  group = "from_ID", data = gdata, fit.method = "ML",
#                 ln = FALSE) )

# Habitat hypothesis (under Maximum Likelihood) 
( habitat <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp"), 
                     d = "length", group = "from_ID", data = gdata, fit.method = "ML",
                     ln = FALSE) )

# Global model (under Maximum Likelihood) 

#( global <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp", 
#                    "from.Depth", "from.ffp", "from.hli", "from.pratio", "from.degree", 
#                   "from.betweenness", "from.pwetland", "median.srr", "median.rough27"), 
#                   d = "length", group = "from_ID", data = gdata, fit.method = "ML",
#                   ln = FALSE) )
( global <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp", 
                    "from.Depth", "from.ffp",  
                    "from.betweenness", "from.pwetland"), 
                    d = "length", group = "from_ID", data = gdata, fit.method = "ML",
                    ln = FALSE) )

b) Compare competing models {-}

Should you use ML or REML? Create diagnostic plots.

Can you directly compare ML and REML? Why not?

#compare.models(null, depth, production, climate, wetlands, 
#               topo, habitat, global)
compare.models(null, depth, wetlands, 
               habitat, global)
par(mfrow=c(2,3))
   for (i in 1:6) { plot(null, type=i) } 

c) Fit final model(s) and calculate effect size {-}

To calculate effect size, we refit the models with REML.

# productivity fit (under REML)
#h <- c("length", "from.ffp", "from.hli")
#production_fit <- gravity(y = "GDIST", x = h, d = "length", group = "from_ID",
#                      data = gdata, ln=FALSE)

# global fit (under REML)
#g <-  c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp",
#        "from.hli", "from.pratio",  "from.degree", "from.betweenness",  
#        "from.pwetland", "median.srr",  "median.rough27")
g <-  c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp",
        "from.betweenness",  "from.pwetland")
global_fit <- gravity(y = "GDIST", x = g, d = "length", 
                      group = "from_ID", data = gdata, ln=FALSE)

Effect size calculation: global model

gravity.es(global_fit)
par(mfrow=c(2,3))
   for (i in 1:6) { plot(global_fit, type=i) } 

Production model:

#gravity.es(production_fit)
#par(mfrow=c(2,3))
#   for (i in 1:6) { plot(production_fit, type=i) } 

d) Back-predict the global_fit model {-}

Feel free to add other top models.

gd <- back.transform(gdata$GDIST)

# Make individual-level (group) predictions (per slope) and show RMSE 
global.p <- predict(global_fit, y = "GDIST", x = g,  
                    newdata=gdata, groups = gdata$from_ID,
                    back.transform = "simple")
#production.p <- predict(production_fit, y = "GDIST", x = h,  
#                     newdata=gdata, groups = gdata$from_ID,
#                    back.transform = "simple")

cat("RMSE of global", rmse(global.p, gd), "\n")
#cat("RMSE of production", rmse(production.p, gd), "\n")

e) Aggregate estimates and plot {-}

We can aggregate the estimates back to the edges and nodes. An interactive map can be created using the tmap package (see Week 2 Bonus vignette).

Aggregate estimates to graph and node.

global.p <- data.frame(EID = gdata$from.to,
                       NID = gdata$from_ID,  
                       p=global.p)

edge.p <- tapply(global.p$p, global.p$EID, mean)
  dist.graph$global.flow <- edge.p

node.p <- tapply(global.p$p, global.p$NID, mean)
node.var <- tapply(global.p$p, global.p$NID, var)

idx <- which(sites$SiteName %in% names(node.p))
sites$global.flow[idx] <- node.p
sites$global.var[idx] <- node.var

Define the map and store it in object Map.

pal <- colorRampPalette(rev(c("red","orange","blue")), bias=0.15)

Map <- 
  tm_shape(dist.graph) +
  tm_lines("global.flow", palette=pal(10), title.col="Edges: global.flow") +
  tm_shape(sites) +
  tm_symbols(col = "global.flow", size = "global.var", 
               shape = 20, scale = 0.75, palette=pal(10), 
             title.col="Nodes: global.flow", title.size="Nodes: global.var") 

Plot static map: here we tell R to plot the legend outside of the map.

tmap_mode(c("plot", "view")[1])
Map + tm_layout(legend.outside=TRUE, legend.position = c("right", "top"))

Plot interactive map: this may not render e.g. in Bookdown or when you knit your notebook to output formats other than .html.

#tmap_mode(c("plot", "view")[2])
#Map 

