There are many ways graphs can be implemented to understand population structure and relate that structure to landscape characteristics (see Dyer and Nason 2004). In this exercise, we will calculate various graph metrics and apply graphs to fit a gravity model.
Gravity models are a type of inferential model that exploit graph characteristics. Gravity models include both at-site and between-site landscape data. They are a type of graph consisting of nodes and edges. These nodes and edges represent landscape characteristics associated with these graph elements.
In this exercise, you will use the gravity model framework to build an empirical model of gene flow for the Columbia spotted frog dataset in central Idaho that you have used for several other exercises (Murphy et al. 2010).
This code checks of all required packages are installed, and if so, loads them. If any are missing, it will return a message that identifies the packages that need to be installed. If that happens, install the packages and run the code again.
The package then checks if your version of the GeNetIt
package is up-to-date (at least 0.1-5), and if not, installs the latest version.
p <- c("raster", "igraph", "sp", "GeNetIt", "spatialEco", "leaflet", "sf", "terra", "sfnetworks", "spdep", "dplyr", "tmap", "devtools") if(any(!unlist(lapply(p, requireNamespace, quietly=TRUE)))) { m = which(!unlist(lapply(p, requireNamespace, quietly=TRUE))) suppressMessages(invisible(lapply(p[-m], require, character.only=TRUE))) stop("Missing library, please install ", paste(p[m], collapse = " ")) } else { if(packageVersion("GeNetIt") < "0.1-5") { remotes::install_github("jeffreyevans/GeNetIt") } suppressMessages(invisible(lapply(p, require, character.only=TRUE))) }
# Get the path to your project folder Path <- here::here("downloads") # indicates UTM 11 NAD83 projection prj = 32611 # Some needed functions back.transform <- function(y) exp(y + 0.5 * stats::var(y)) rmse = function(p, o){ sqrt(mean((p - o)^2)) }
In this sections, we will read in all wetland locations in the study area and calculate a few graph-based metrics to assign to wetland sites that data was collected at. This allows us to put our samples into the context of the larger wetland system thus, accounting for proximity and juxtaposition.
Import file "Wetlands.csv".
wetlands <- read.csv(system.file("extdata/Wetlands.csv", package="LandGenCourse"), header = TRUE) head(wetlands)
Make it a spatial object.
wetlands <- st_as_sf(wetlands, coords = c("X", "Y"), crs = 32611, agr = "constant") str(wetlands)
Create Gabriel graph from the wetlands to represent a "realization" of connectivity and spatial arrangement.
Derive Gabriel graph
gg <- graph2nb(gabrielneigh(st_coordinates(wetlands)),sym=TRUE) plot(gg, coords=st_coordinates(wetlands))
Questions:
To calculate graph metrics, we need to do a few steps:
Coerce to sf line object (will be used to create igraph object)
gg <- nb2lines(gg, coords = sf::st_coordinates(wetlands), proj4string = prj, as_sf=TRUE)
Coerce to a sfnetwork
, which is an igraph
object:
wg <- as_sfnetwork(gg, edges=gg, nodes=wetlands, directed = FALSE, node_key = "SiteName", length_as_weight = TRUE, edges_as_lines = TRUE)
Calculate weights
w <- wg %>% activate("edges") %>% pull(weight) %>% as.numeric() w[w <= 0] <- 1 w = w / sum(w)
Calculate graph metrics of betweenness and closeness with weights and degree. We'll add these as attributes (columns) to the wetlands sf
object.
wetlands$betweenness <- igraph::betweenness(wg, directed=FALSE, weights=w) wetlands$degree <- igraph::degree(wg) wetlands$closeness <- igraph::closeness(wg, weights=w) wetlands
Plot results using the graph edges and wetlands points with the attributes "betweenness", "closeness" and "degree".
Plot betweenness
plot(st_geometry(gg), col="grey") plot(wetlands["betweenness"], pch=19, cex=0.75, add=TRUE) box() title("Wetlands Gabriel graph betweenness")
Plot closeness
plot(st_geometry(gg), col="grey") plot(wetlands["closeness"], pch=19, cex=0.75, add=TRUE) box() title("Wetlands Gabriel graph closeness")
Plot degree
plot(st_geometry(gg), col="grey") plot(wetlands["degree"], pch=19, cex=0.75, add=TRUE) box() title("Wetlands Gabriel graph degree")
Questions: Consider the three figures above and how the graph was constructed.
In this section we will read the field data and add the node metrics we just calculated.
Using RALU_Site.csv
, we read in the data, add the node data (betweenness and degree), create a spatial object that includes the node data.
Read in site data
sites <- read.csv(system.file("extdata/RALU_Site.csv", package="LandGenCourse"), header = TRUE) sites$SiteID <- as.character(sites$SiteID)
Add the node data.
Note: using names is dangerous as, small changes in names can result in non-matches. In this case, the ID fields are not consistent (data were collected at different times for different purposes originally). However, names are standardized in a drop-down list of a database. So they are a matching field. My recommendation to you all is to do to this type of operation on a numeric field.
nodestats <- st_drop_geometry(wetlands[,c(3,5:7)]) nodestats <- nodestats[which(nodestats$SiteName %in% sites$SiteName),] sites <- merge(nodestats, sites, by="SiteName")
Convert data frame to sf
object.
sites <- st_as_sf(sites, coords = c("X", "Y"), crs = prj, agr = "constant") head(sites)
Question:
sites
? To assess connectivity using a gravity model, we need to build a graph from the occupied frog sites create a graph. This could be any type of graph, but I generally use saturated (essentially a full genetic distance matrix) or pruned by some maximum distance.
dist.graph <- knn.graph(sites, row.names = sites$SiteName) dist.graph <- merge(dist.graph, st_drop_geometry(sites), by.y="SiteName", by.x="from_ID") dist.graph <- dist.graph[,-c(11:19)] # drop extra columns
Note: Can create a distance-constrained graph with max.dist arg (not run)
# dist.graph <- knn.graph(sites, row.names = sites$SiteName, max.dist=5000)
Read in the genetic distance data and make a matrix, gdist, then unfold data
gdist <- read.csv(system.file("extdata/RALU_Dps.csv", package="LandGenCourse"), header=TRUE) rownames(gdist) <- t(names(gdist)) gdist <- dmatrix.df(as.matrix(gdist)) names(gdist) <- c("FROM", "TO", "GDIST") #unfold the file gdist <- gdist[!gdist$FROM == gdist$TO ,] gdist[,1] <-sub("X", "", gdist[,1]) gdist[,2] <-sub("X", "", gdist[,2]) gdist <- cbind(from.to=paste(gdist[,1], gdist[,2], sep="."), gdist)
Transform genetic distance to genetic flow (1-distance)
gdist$GDIST <- flow(gdist$GDIST)
Merge graph with genetic distances, based on from node - to node combination
dist.graph$from.to <- paste(dist.graph$i, dist.graph$j, sep=".") dist.graph <- merge(dist.graph, gdist, by = "from.to")
Question:
xvars <- terra::rast(system.file("extdata/covariates.tif", package="GeNetIt"))
NLCD is land cover data. The wetland classes are 11 and 90-95 (for this system and vintage NLCD data)
m <- c(0,10.8, 0,10.9,12.1,1,12.9,89.5,0, 89.1,95.1,1,95.9,100,0 ) reclass <- matrix(m, ncol=3, byrow=TRUE) wetlnd <- classify(xvars[["nlcd"]], reclass) names(wetlnd) <- "wetlnd" xvars <- c(xvars, wetlnd)
Assign proportion of landcover that is wetland to sites as pwetland
. You could create a binary raster for any cover type of interest and calculate this parameter.
Question:
Create function to extract the proportion of wetland.
## method 1 (can result in Inf if all zero) # prop.land <- function(x) { # length(x[x==1]) / length(x) # } ## method 2 (no divide by zero error) prop.land <- function(x) { prop.table(table(factor(x, levels=c(0,1))))[2] }
Apply the function to extract the proportion of wetland within a buffer (here: 300 m).
b <- st_buffer(sites, 300) pwetland <- extract(wetlnd, vect(b)) pwetland <- tapply(pwetland[,2], pwetland[,1], prop.land)
Add the proportion of wetland back to the dataframe.
sites$pwetland <- as.numeric(pwetland) head(sites$pwetland)
Challenge:
Alternatively, you can use the landscapemetrics
package (see Week 2) to calculate a broader set of landscape metrics. Here is an example for calculating pland
.
Note: as per help file, the argument y
of the function sample_lsm
can accept an sf
points object (e.g., sites
). This returned an error though, hence we first convert sites
to an sp
object, using the function as_Spatial
.
nlcd_sampled <- landscapemetrics::sample_lsm(landscape = xvars[["wetlnd"]], what = "lsm_c_pland", shape = "circle", y = sf::st_coordinates(sites), size = 300, return_raster = FALSE, plot_id=sites$SiteID) pwetland <- dplyr::select(dplyr::filter(nlcd_sampled, class == 1, metric == "pland"), plot_id, value) names(pwetland) <- c("SiteID", "pwetland") pwetland$pwetland <- pwetland$pwetland/100 head(pwetland)
Note: these values are sorted by SiteID
and are not in the same order as the sf
object sites
, that's why the first six values may not be the same.
This adds potential "at site" variables, keep as sf POINT class object. We are removing raster 6 as knowing the cover class from NLCD that intersects the same points is not very useful information.
stats <- extract(xvars[[-6]], vect(sites)) sites <- st_sf(data.frame(as.data.frame(sites), stats), geometry=sites$geometry)
Remove nlcd
and wetlnd
rasters before calculating statistics, as calculating statistics on categorical data is nonsensical.
Calculating stats - this will take a while!
idx <- which(names(xvars) %in% c("nlcd","wetlnd")) suppressWarnings( stats <- graph.statistics(dist.graph, r = xvars[[-idx]], buffer= NULL, stats = c("min", "mean","max", "var", "median")))
Add statistics to graph.
dist.graph <- st_sf(data.frame(as.data.frame(dist.graph), stats), geometry=dist.graph$geometry)
Statistical moments (mean, variance, etc.) are nonsensical for categorical variables. Here we create a function for returning the percent wetland between sites. Then we use it to calculate an additional statistic and add the result to the graph.
Question:
Function to calculate percent wetland between pairs of sites:
wet.pct <- function(x) { x <- ifelse(x == 11 | x == 12 | x == 90 | x == 95, 1, 0) prop.table(table(factor(x, levels=c(0,1))))[2] }
Calculate statistic and add it to graph.
suppressWarnings( wetstats <- graph.statistics(dist.graph, r=xvars[["nlcd"]], buffer= NULL, stats = c("wet.pct")) ) dist.graph$wet.pct.nlcd <- as.numeric(wetstats[,1])
We need to evaluate correlations in the data to avoid overdispersion in our models (remember lab from Week 12, where we calculated the variance inflation factor, VIF).
Note, we are not going to actually remove the correlated variables but, just go through a few methods of evaluating them. The code to remove colinear variables is commented out for reference. We do have to log transform the data as to evaluate the actual model structure.
Create data frame with the node variables to be evaluated.
node.var <- c("degree", "betweenness", "Elev", "Length", "Area", "Perim", "Depth", "pH","Dforest","Drock", "Dshrub", "pwetland", "cti", "dd5", "ffp","gsp","pratio","hli","rough27","srr") s <- st_drop_geometry(sites) %>% select("degree", "betweenness", "Elev", "Length", "Area", "Perim", "Depth", "pH","Dforest","Drock", "Dshrub", "pwetland", "cti", "ffp","gsp")
Log-transform values, set log of negative values or zero (where log is not defined) to 0.00001.
for(i in 1:ncol(s)) { s[,i] <- ifelse(s[,i] <= 0, 0.00001, log(s[,i])) }
Site correlations:
p = 0.8 # Set upper limit of for collinearity site.cor <- cor(s, y = NULL, use = "complete.obs", method = "pearson") diag(site.cor) <- 0 cor.idx <- which(site.cor > p | site.cor < -p, arr.ind = TRUE) cor.names <- vector() cor.p <- vector() for(i in 1:nrow(cor.idx)) { cor.p[i] <- site.cor[cor.idx[i,][1], cor.idx[i,][2]] cor.names [i] <- paste(rownames(site.cor)[cor.idx[i,][1]], colnames(site.cor)[cor.idx[i,][2]], sep="_") } data.frame(parm=cor.names, p=cor.p)
This returns a list of pairwise correlations that are than the threshold p. Instead of doing this manually, we can use the function collinear to check this. It will make a suggestions which variable to drop.
node.cor <- spatialEco::collinear(s, p=p)
It also returns a vector of correlated variables:
node.cor
Build and add node (at-site) level data to graph, then merge edge (distance graph) and edge (between-site) data.
node <- build.node.data(st_drop_geometry(sites), group.ids = "SiteID", from.parms = names(s))
Merge node and edges for model data.frame
gdata <- merge(st_drop_geometry(dist.graph)[c(1,2,5,11,14,7)], node, by.x="SiteID", by.y="SiteID") gdata <- merge(gdata, st_drop_geometry(dist.graph)[c(11, 8:10, 15:40)], by.x="SiteID", by.y="SiteID") # log transform matrix for(i in 5:ncol(gdata)) { gdata[,i] <- ifelse(gdata[,i] <= 0, 0.00001, log(gdata[,i])) }
What type of gravity model do you wish to run (production or attraction constraint)? Think about what hypotheses that you want to test. Write out model statements that group parameters into hypotheses and run models. Remember to run a NULL that is just distance.
At-site (node) potential parameters. These are all at-site variables. Remember that we pulled all raster variables. We want to critically think about hypotheses and not use all of these parameters.
degree
- graph degreebetweenness
- graph betweenessElev
- elevation (see comments below)Length
- geographic distanceArea
- wetland area (field)Perim
- wetland perimeter (field)Depth
- wetland depth (field)- highly correlated with predatory fish presence/abundancepH
- wetland pH (field)Dforest
- distance to forest (field)Drock
- distance to rock (field)Dshrub
- distance to shrub (field)pwetland
- proportion of wetland in X buffer (calculated above)cti
- compound topographic wetness index - steady-state measure of wetness based on topography (raster data)dd5
- degree days >5 C (sum of temp) - (raster data)ffp
- frost free period (raster data)gsp
- growing season precipitation (raster data)pratio
- ratio of growing season precip to annual precip (raster data) - can indicate amount of snow to rainhli
- heat load index - topographic measure of exposure, related to productivity (ice-off and primary productivity) in this system (raster data)rough27
- unscale topographic variation at a 27 X 27 (cells) window size (raster data)ssr
- measure of topographic variation at a 27X27 (cells) windo size - for this system pulling out ridgelines (raster data)NOTE: we are adding elevation here as a covariate. HOWEVER - elevation does not represent ecological processes in and of itself. I strongly encourage using the components (temp, moisture, rainfall, vegetation, accessibility, etc.) directly and not elevation as a surrogate parameter.
Between site (edge) potential parameters include:
cti
, dd5
, ffp
, gsp
, pratio
, hli
, rough27
, ssr
(min, mean, max, var, median for each)wet.pct.nlcd
- percent of cells that are wetland class# null model (under Maximum Likelihood) ( null <- gravity(y = "GDIST", x = c("length"), d = "length", group = "from_ID", data = gdata, fit.method = "ML") ) # Fish hypothesis (under Maximum Likelihood) ( depth <- gravity(y = "GDIST", x = c("length","from.Depth"), d = "length", group = "from_ID", data = gdata, fit.method = "ML", ln = FALSE) ) # Productivity hypothesis (under Maximum Likelihood) #( production <- gravity(y = "GDIST", x = c("length", "from.ffp", "from.hli"), # d = "length", group = "from_ID", data = gdata, # fit.method = "ML", ln = FALSE) ) # Climate hypothesis (under Maximum Likelihood) #( climate <- gravity(y = "GDIST", x = c("length", "median.ffp", "median.pratio"), # d = "length", group = "from_ID", data = gdata, # fit.method = "ML", ln = FALSE) ) # Wetlands hypothesis (under Maximum Likelihood) ( wetlands <- gravity(y = "GDIST", x = c("length", "from.degree", "from.betweenness", "from.pwetland"), d = "length", group = "from_ID", data = gdata, fit.method = "ML", ln = FALSE) ) # Topography hypothesis (under Maximum Likelihood) #( topo <- gravity(y = "GDIST", x = c("length", "median.srr", "median.rough27"), d = "length", # group = "from_ID", data = gdata, fit.method = "ML", # ln = FALSE) ) # Habitat hypothesis (under Maximum Likelihood) ( habitat <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp"), d = "length", group = "from_ID", data = gdata, fit.method = "ML", ln = FALSE, method="ML") ) # Global model (under Maximum Likelihood) #( global <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp", # "from.Depth", "from.ffp", "from.hli", "from.pratio", "from.degree", # "from.betweenness", "from.pwetland", "median.srr", "median.rough27"), # d = "length", group = "from_ID", data = gdata, fit.method = "ML", # ln = FALSE) ) ( global <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp", "from.betweenness", "from.pwetland"), d = "length", group = "from_ID", data = gdata, fit.method = "ML", ln = FALSE) )
Should you use ML or REML? Create diagnostic plots.
Can you directly compare ML and REML? Why not?
#compare.models(null, depth, production, climate, wetlands, # topo, habitat, global) compare.models(null, depth, wetlands, habitat, global)
par(mfrow=c(2,3)) for (i in 1:6) { plot(null, type=i) }
To calculate effect size, we refit the models with REML.
# productivity fit (under REML) #h <- c("length", "from.ffp", "from.hli") #production_fit <- gravity(y = "GDIST", x = h, d = "length", group = "from_ID", # data = gdata, ln=FALSE) # global fit (under REML) #g <- c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp", # "from.hli", "from.pratio", "from.degree", "from.betweenness", # "from.pwetland", "median.srr", "median.rough27") g <- c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp", "from.betweenness", "from.pwetland") global_fit <- gravity(y = "GDIST", x = g, d = "length", group = "from_ID", data = gdata, ln=FALSE)
Effect size calculation: global model
gravity.es(global_fit) par(mfrow=c(2,3)) for (i in 1:6) { plot(global_fit, type=i) }
Production model:
#gravity.es(production_fit) #par(mfrow=c(2,3)) # for (i in 1:6) { plot(production_fit, type=i) }
Feel free to add other top models.
gd <- back.transform(gdata$GDIST) # Make individual-level (group) predictions (per slope) and show RMSE global.p <- predict(global_fit, y = "GDIST", x = g, newdata=gdata, groups = gdata$from_ID, back.transform = "simple") #production.p <- predict(production_fit, y = "GDIST", x = h, # newdata=gdata, groups = gdata$from_ID, # back.transform = "simple") cat("RMSE of global", rmse(global.p, gd), "\n") #cat("RMSE of production", rmse(production.p, gd), "\n")
We can aggregrate estimates back to the edges and nodes. An interactive map can be created using the tmap
package (see Week 2 Bonus vignette).
Aggregate estimates to graph and node.
global.p <- data.frame(EID = gdata$from.to, NID = gdata$from_ID, p=global.p) edge.p <- tapply(global.p$p, global.p$EID, mean) dist.graph$global.flow <- edge.p node.p <- tapply(global.p$p, global.p$NID, mean) node.var <- tapply(global.p$p, global.p$NID, var) idx <- which(sites$SiteName %in% names(node.p)) sites$global.flow[idx] <- node.p sites$global.var[idx] <- node.var
Define the map and store it in object Map
.
pal <- colorRampPalette(rev(c("red","orange","blue")), bias=0.15) Map <- tm_shape(dist.graph) + tm_lines("global.flow", palette=pal(10), title.col="Edges: global.flow") + tm_shape(sites) + tm_symbols(col = "global.flow", size = "global.var", shape = 20, scale = 0.75, palette=pal(10), title.col="Nodes: global.flow", title.size="Nodes: global.var")
Plot static map: here we tell R to plot the legend outside of the map.
tmap_mode(c("plot", "view")[1]) Map + tm_layout(legend.outside=TRUE, legend.position = c("right", "top"))
Plot interactive map: this may not render e.g. in Bookdown or when you knit your notebook to output formats other than .html
.
#tmap_mode(c("plot", "view")[2]) #Map
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.