In hhwagner1/LandGenCourse: Interface for course "Landscape Genetic Data Analysis with R"

1. Overview of Worked Example {-}

a) Goals {-}

This worked example shows how to:

Calculate genetic distance at population level.
Perform a Mantel test to test for IBD.
Create Mantel correlograms for the genetic data.
Calculate and test Moran's I for population-level genetic diversity data.

b) Data set {-}

This is a larger version of the frog data set used in Weeks 1 - 3.

Here, we will analyze microsatellite data from Funk et al. (2005) and Murphy et al. (2010) for 418 individuals of Colombia spotted frogs (Rana luteiventris) from 30 populations, together with site-level spatial coordinates. Please see the introductory video on the DGS course website, R lab page.

ralu_loci_allpops.csv: Data frame with populations and genetic data (406 rows x 9 columns). Included as external file in package LandGenCourse.
Frogs_diversity_allpops.csv: Table with genetic diversity variables similar to Week 3.
ralu_coords_allpops.csv: Spatial coordinates (both in UTM and LongLat format), and basin.

c) Required R packages {-}

Install some packages (not on CRAN) that are needed for this worked example.

if(!requireNamespace("EcoGenetics", quietly = TRUE)) 
  remotes::install_github("leandroroser/EcoGenetics-devel")
if(!requireNamespace("GeNetIt", quietly = TRUE)) remotes::install_github("jeffreyevans/GeNetIt")
if(!requireNamespace("popgraph", quietly = TRUE))
{
  install.packages(c("RgoogleMaps", "geosphere", "proto", "sampling", 
                      "seqinr", "spacetime", "spdep"), dependencies=TRUE)
  remotes::install_github("dyerlab/popgraph")
}
if(!requireNamespace("gstudio", quietly = TRUE)) remotes::install_github("dyerlab/gstudio")

library(LandGenCourse)
library(GeNetIt)
library(dplyr)
library(EcoGenetics)
#require(adegenet)
#require(tibble)
#require(gstudio)
#require(hierfstat)
#require(PopGenReport)
#require(mmod)
#require(spdep)

2. Data import and manipulation {-}

a) Create an ecogen object {-}

Import the site data that we used before in Week 2.

Frogs.coords <- read.csv(system.file("extdata", "ralu_coords_allpops.csv", 
                                     package = "LandGenCourse"))

In Week 3, we calculated population-level genetic data as a table Frogs.diversity. Here we import a table with the same diversity measures for all 29 sites from a system file in LandGenCourse.

Frogs.diversity <- read.csv(system.file("extdata", "Frogs_diversity_allpops.csv", 
                                     package = "LandGenCourse"))

Import the genetic data for 29 sites:

Frogs.loci <- read.csv(system.file("extdata", "ralu_loci_allpops.csv", 
                                     package = "LandGenCourse"))

We start building an ecogen object by assigning the genetic data to the G slot, and the structure (i.e., hierarchical sampling information, here site names) to the S slot. For the genetic data, we need to specify the type of data and coding.

Note the tweak using data.frame when specifying data.frame(ralu.loci[,1:2]). This is necessary to import the multiple columns correctly and with their original names.

Frogs.ecogen <- ecogen(G = Frogs.loci[,-c(1:2)], ploidy = 2, 
                       type = "codominant", sep = ":",
                       S = data.frame(Frogs.loci[,1:2]))
Frogs.ecogen

The summary confirms that we now have data in the S and G slots. In addition, EcoGenetics created a table of absolute frequencies (i.e. counts) of alleles for each individual in slot A.

b) Aggregate to 'ecopop' {-}

Most of our analysis for this lab will be at the population level. The function ecogen2ecopop aggregates from individual to population-level data.

Frogs.ecopop <- ecogen2ecopop(Frogs.ecogen, hier = "SiteName")
Frogs.ecopop

Instead of 413 individuals, we now have data for 30 populations. Slot AF contains the population-level absolute frequencies (counts) of alleles.

b) Add site-level data {-}

Before importing the genetic diversity, spatial coordinates and site data into the ecopop object, we need to match the rows and extract the data for the sampled populations.

Unfortunately, the datasets use different versions of the site name, both were include in the file ralu_loci_allpops.csv and thus in the S slot of Frogs.ecogen.

Here, we use group_by and summarize to get the unique set of site names in both versions. We save it as a data frame Subset.
Then we join the genetic diversity data. Because the two data frames share a common variable, we dont' need to specify the argument by. R will confirm what ID variable it used to join.

Subset <- ecoslot.S(Frogs.ecogen) %>% group_by(SiteName, Pop) %>% summarize()
Subset <- left_join(Subset, Frogs.diversity)

R tells us that it used the shared ID variable Pop to join the data.

Now we can join the site data. Note: with as.data.frame, we combine the @data and @coords slots of the SpatialPointsDataFrame ralu.site to a single data frame.

Subset <- left_join(Subset, as.data.frame(Frogs.coords))

This time, R used the shared ID variable SiteName to join the data.

Now we have all site-level data in the data frame Subset that has the same row names as Frogs.ecopop. We check the names of the variables to decide which ones to put where. Then we assign them to their respective slots (@C for the custom data, here genetic diversity, @E for the environmental data, and @XY for the coordinates).

The argument pop specifies the matching ID variable in the @S slot. When we aggregated from Frogs.ecogen to Frogs.ecopop, unfortunately EcoGenetics renamed the variable SiteName to pop. The argument pop_levels identifies the corresponding ID variable in Subset.

names(Subset)
Frogs.ecopop <- EcoGenetics::eco.fill_ecogen_with_df(Frogs.ecopop,
             pop ="pop", pop_levels = Subset$SiteName, 
             C = Subset[,3:6],
             XY = Subset[,7:10])
Frogs.ecopop

c) Export to 'adegenet' and 'gstudio' {-}

'EcoGenetics' provides convenient functions for converting genetic data to and from other packages.

Import into 'genind' object (package 'adegenet'): there is a dedicated function, but we need to separately declare the variable that represents the populations and write it into the @pop slot of the 'genind' object.

Frogs.genind <- EcoGenetics::ecogen2genind(Frogs.ecogen)
Frogs.genind@pop <- ecoslot.S(Frogs.ecogen)$SiteName

For calculating population-level genetic distances, we aggregate the individual-level data to a genpop object with population-level allele frequencies.

Frogs.genpop <- adegenet::genind2genpop(Frogs.genind)
Frogs.genpop

Note: Alternatively, we could directly import the ecopop object into a genpop object (adegenet) with EcoGenetics::ecopop2genpop(Frogs.ecopop). However, this can create warnings later on when calculating genetic distances.

The object 'Frogs.genpop' has 30 rows, each representing a population.

We will also use some functions from the package gstudio, hence we import the individual-level genetic data into gstudio. This should be easy with the function EcoGenetics::ecogen2gstudio. However, there seems to be an issue. The following chunk contains code adapted from within the ecogen2gstudio function, tweaked to work for our data. (You can view the original function with: findMethods(ecogen2gstudio).)

#Frogs.gstudio <- EcoGenetics::ecogen2gstudio(Frogs.ecogen, type="codominant")

  dat <- eco.convert(Frogs.ecogen@G, "matrix", sep.in = ":", sep.out = ":")
  dat <- as.data.frame(dat, stringsAsFactors = FALSE)
  for (i in 1:ncol(dat)) {
     class(dat[, i]) <- "locus"
  }
  dat[is.na(dat)] <- gstudio::locus(NA)
  colnames(dat) <- colnames(Frogs.ecogen@G)

Frogs.gstudio <- data.frame(ecoslot.S(Frogs.ecogen), dat)
tibble::as_tibble(Frogs.gstudio)

3. Calculate genetic distances {-}

Here, we'll calculate a number of different measures of genetic distance, using functions from several packages. Adding the package name to each distance matrix name helps keeping track of methods used. Normally you would not calculate all of them for your own data, and some are redundant, as we will see.

Note: Some functions provide an option linearized = TRUE to linearize distances d by calculating d/(1-d). This should result in more linear relationships when plotted or correlated against geographic distance.Here we don't linearize, we can do so later manually.

a) Genetic distances calculated from genind object {-}

Pairwise Fst with package hierfstat:

GD.pop.PairwiseFst.hierfstat <- as.dist(hierfstat::pairwise.neifst(hierfstat::genind2hierfstat(Frogs.genind)))

Proportion of shared alleles with package 'PopGenReport':

GD.pop.propShared <- PopGenReport::pairwise.propShared(Frogs.genind)

Several distance matrices with package 'adegenet':

GD.pop.Nei <- adegenet::dist.genpop(Frogs.genpop, method=1)   
GD.pop.Edwards <- adegenet::dist.genpop(Frogs.genpop, method=2)
GD.pop.Reynolds <- adegenet::dist.genpop(Frogs.genpop, method=3)  
GD.pop.Rogers <- adegenet::dist.genpop(Frogs.genpop, method=4)  
GD.pop.Provesti <- adegenet::dist.genpop(Frogs.genpop, method=5)

Additional distance matrices with package 'mmod':

GD.pop.Joost <- mmod::pairwise_D(Frogs.genind, linearized = FALSE)
GD.pop.Hedrick <- mmod::pairwise_Gst_Hedrick(Frogs.genind, linearized = FALSE)
GD.pop.NeiGst <- mmod::pairwise_Gst_Nei(Frogs.genind, linearized = FALSE)

b) More distance matrices with 'gstudio' {-}

GD.pop.Euclidean.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Euclidean", stratum="SiteName")
GD.pop.cGD.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "cGD", stratum="SiteName")
GD.pop.Nei.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Nei", stratum="SiteName")
GD.pop.Dps.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Dps", stratum="SiteName")
GD.pop.Jaccard.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Jaccard", stratum="SiteName")

c) Assemble distance matrices {-}

We'll store the population-level genetic distances in a list 'GD.pop'.

Note: a few measures return similarities (scaled between 0 and 1) instead of distances. For instance, 'proporition of shared alleles' is 1 if the alleles are identical, and zero of no alleles are shared. We convert these values to distances by subtracting them from 1.

GD.pop <- list(pairwiseFst.hierfstat = GD.pop.PairwiseFst.hierfstat,
               propShared.PopGenReport = 1 - GD.pop.propShared,
               Nei.adegenet = GD.pop.Nei,
               Edwards.adegenet = GD.pop.Edwards,
               Reynolds.adegenet = GD.pop.Reynolds,
               Rogers.adegenet = GD.pop.Rogers,
               Provesti.adegenet = GD.pop.Provesti,
               Joost.mmod = GD.pop.Joost,
               Hedrick.mmod = GD.pop.Hedrick,
               Nei.mmod = GD.pop.NeiGst,
               Euclidean.gstudio = as.dist(GD.pop.Euclidean.gstudio),
               cGD.gstudio = as.dist(GD.pop.cGD.gstudio),
               Nei.gstudio = as.dist(GD.pop.Nei.gstudio),
               Dps.gstudio = as.dist(1 - GD.pop.Dps.gstudio),
               Jaccard.gstudio = as.dist(1 - GD.pop.Jaccard.gstudio))
round(cor(sapply(GD.pop, function(ls) as.vector(ls))),2)[,1:2]

Consider the correlations printed above (only the first two columns of the correlation matrix are shown).

Correlations are high in general, except for conditional genetic distance (cGD) (see Week 13).
There are some duplicate measures (with correlation = 1).

Note: the following functions calculate distance matrices at the individual level:

PopGenReport::gd.smouse()
adegenet::propShared()
gstudio::genetic_distance(mode = "AMOVA")

d) Export genetic distance matrices {-}

Optional: You can use save to save an R object to your file system, and load to read it in again. Note: the default setting is that save will overwrite existing files with the same name.

The code is commented out with #. To run it, remove the #. The first part creates a folder output in your project folder if it does not yet exist. The function save writes the list GD.pop into a file "GD.pop.RData", and the function load imports it again.

#require(here)
#if(!dir.exists(paste0(here::here(),"/output"))) dir.create(paste0(here::here(),"/output"))

#save(GD.pop, file = paste0(here::here(),"/output/GD.pop.RData"))
#load(paste0(here::here(),"/output/GD.pop.RData"))

4. Perform a Mantel test to test for IBD {-}

First, we calculate geographic (Euclidean) distances Dgeo with the dist function, using the spatial coordinates that we imported from ralu.site. These are UTM coordinates and thus metric.

Dgeo <- as.vector(dist(ecoslot.XY(Frogs.ecopop)[,1:2]))

a) Visually check linearity {-}

Before we quantify the linear relationship between genetic and geographic distances, let's check visually whether the relationship is indeed linear. To start, we will define genetic distance Dgen based on proportion of shared alleles.

par(mar=c(4,4,0,0))

Dgen <- as.vector(GD.pop$propShared.PopGenReport)
dens <- MASS::kde2d(Dgeo, Dgen, n=300)
myPal <- colorRampPalette(c("white","blue","gold","orange","red"))
plot(Dgeo, Dgen, pch=20, cex=0.5,  
    xlab="Geographic Distance", ylab="Genetic Distance")
image(dens, col=transp(myPal(300), 0.7), add=TRUE)
abline(lm(Dgen ~ Dgeo))
lines(loess.smooth(Dgeo, Dgen), col="red")

There is clearly an increase of genetic distance with geographic distance. However, the red line, which is a smoothed local mean, indicates that the relationship is not linear.

Let's take the natural logarithm of geographic distance:

par(mar=c(4,4,0,0))
dens <- MASS::kde2d(log(Dgeo), Dgen, n=300)
plot(log(Dgeo), Dgen, pch=20, cex=0.5,  
    xlab="Geographic Distance", ylab="Genetic Distance")
image(dens, col=transp(myPal(300), 0.7), add=TRUE)
abline(lm(Dgen ~ log(Dgeo)))
lines(loess.smooth(log(Dgeo), Dgen), col="red")

Questions:

Is the relationship between Dgen and Dgeo more linear than before the transformation?
What happened to the spread of points along the x axis (geographic distance)?
What do the units of the x-axis represent, after the transformation?

b) Perform Mantel test {-}

Next, we perform a Mantel test with the function mantel from the vegan package.

We define Dgen and Dgeo anew, as we need them in 'dist' format this time, not as vectors.

Dgen <- GD.pop$propShared.PopGenReport
Dgeo <- dist(ecoslot.XY(Frogs.ecopop)[,1:2])

IBD <- vegan::mantel(Dgen,Dgeo, method="pearson")
IBD

What happens if we take the log of geographic distance, which we saw above helps linearize the relationship for these data?

IBD <- vegan::mantel(Dgen,log(Dgeo), method="pearson")
IBD

The statistical significance (p-value) didn't really change, but the strength of the Mantel correlation increased from r = 0.62 to r = 0.68.

Instead of transforming variables, we could use Spearman rank correlation to quantify the strength of the association. Rank correlations can be used to quantify and test the strength of curvilinear relationships.

IBD <- vegan::mantel(Dgen,Dgeo, method="spearman")
IBD

For this measure of genetic diversity, the Mantel correlation was significant, quite strong, and the relationship not linear, hence the transformed data or the rank correlation performed better. What about the other measures?

Here we use lapply to apply the function mantel to each genetic distance matrix in GD.pop. Then we use sapply to extract two values for each distance matrix: statistic is the Mantel r statistic (here: Pearson linear correlation), and signif is the p-value. We can find these names with the function attributes (see above).

attributes(IBD)

Mantel.test <- lapply(GD.pop, function(x) vegan::mantel(x,Dgeo, method="pearson"))
data.frame(Mantel.r = sapply(Mantel.test, function(x) x$statistic),
           p.value = sapply(Mantel.test, function(x) x$signif))

The nature of the result did not depend on the measure of genetic diversity used. Let's repeat the analysis with log(Dgeo).

Mantel.test <- lapply(GD.pop, function(x) ade4::mantel.randtest(x,log(Dgeo)))
data.frame(Mantel.r = sapply(Mantel.test, function(x) x$obs),
           p.value = sapply(Mantel.test, function(x) x$pvalue))

Questions: - Did the transformation lead to higher Mantel correlations for all measures of genetic distance used here? - Why was it higher?

5. Create Mantel correlogram for genetic data {-}

Let's look at the relationship between genetic distance and geographic distance in a different way, with a Mantel correlogram. Note that this method does not make the assumption of linearity.

a) Create a first Mantel correlogram {-}

Here, we'll create a population-level Mantel correlogram with the proportion of shared alleles.

Note: The function eco.cormantel has an option latlon=TRUE that takes care of the distance calculation from lat-lon coordinates. To use this option, the coordinates must be in a matrix or data frame with the longitude in the first column and the latitude in the second. Here, we can set we can set latlon=FALSE because the spatial coordinates are in UTM projection.

The biological hypothesis of isolation-by-distance postulates that genetic distance increases with geographic distance. Hence it makes sense to use a one-sided alternative. Somewhat counter-intutitively, we use 'alternative="less"' to test for positive spatial autocorrelation.

corm <- EcoGenetics::eco.cormantel(M = GD.pop$propShared.PopGenReport, 
        XY = ecoslot.XY(Frogs.ecopop)[,1:2],  nsim = 199, latlon=FALSE, 
        alternative="less", method = "pearson")
corm

The table shows:

Breaks of distance lag 'd': here in meters (default for lag definition: Sturges rule)
Mean distance 'd.mean': mean distance of pairs in each lag.
obs: observed value of the statistic
Expected value 'exp': expected if there is no autocorrelation.
P-value 'p.val': default uses a two-sided permutation test with sequential Holm-Bonferroni adjustement of p-values.
Number of pairs 'cardinal': number of unique pairs per lag.

The result corm is an object of class eco.correlog (package: EcoGenetics). A safe way to access thet table is ecoslot.OUT(corm).

Let's plot the correlogram:

EcoGenetics::eco.plotCorrelog(corm)

You can hover over individual points of the correlogram to see the test statistic and mean distance.

b) Vary distance class definition {-}

Under IBD, at least the first distance lag should show positive spatial autocorrelation. Here, it is the first two classes, as indicated by the red symbols.

To what degree does this result depend on the following:

The distance lag definition?
The measure of genetic distance?
Non-linear relationship between genetic and geographic distances?

There are several options of the eco.cormantel function to modify the definition of distance classes:

int: distance interval in the units of XY
smin: minimum class distance in the units of XY
smax: maximum class distance in the units of XY
nclass: number of classes
seqvec: vector with breaks in the unites of XY
size: number of individuals per class
bin: rule for constructing intervals if no other parameters provided (default: Sturge's rule)

The easiest ones to modify are either nclass or size. Here we use size to specify that there should be at least 50 or 100 pairs in each distance class. (Note: for a reliable analysis, this should be 100 or more).

corm.50 <- EcoGenetics::eco.cormantel(M = GD.pop$propShared.PopGenReport, 
           XY = ecoslot.XY(Frogs.ecopop)[,1:2],  nsim = 199, latlon=FALSE,
           alternative="less", size=50)
EcoGenetics::ecoslot.OUT(corm.50)

corm.100 <- EcoGenetics::eco.cormantel(M = GD.pop$propShared.PopGenReport, 
           XY = ecoslot.XY(Frogs.ecopop)[,1:2],  nsim = 199, latlon=FALSE,
           alternative="less", size=100)
EcoGenetics::ecoslot.OUT(corm.100)

EcoGenetics::eco.plotCorrelog(corm.50)
EcoGenetics::eco.plotCorrelog(corm.100)

Let's compare the observed Mantel r statistic, p-value, number of pairs in the first distance class and their mean distance, as well as the definition of the first lag interval. We can get all of this by extracting the first line from each object.

The lag intervals are stored only in the row names, and we need to extract those separately and add them as a colum.

Lag1.def <- data.frame(rbind(Sturge = EcoGenetics::ecoslot.OUT(corm)[[1]][1,],
      size.50 = EcoGenetics::ecoslot.OUT(corm.50)[[1]][1,],
      size.100 = EcoGenetics::ecoslot.OUT(corm.100)[[1]][1,]))
Lag1.def$bin <- c(row.names(EcoGenetics::ecoslot.OUT(corm)[[1]])[1],
              row.names(EcoGenetics::ecoslot.OUT(corm.50)[[1]])[1],
              row.names(EcoGenetics::ecoslot.OUT(corm.100)[[1]])[1])
Lag1.def

All three distance class definitions (Sturge's rule, size = 50, size = 100) resulted in statistically significant p-values.
The Mantel correlation in the first distance lag was strongest for size.100.
For the first distance class, Sturge's rule resulted in 45 pairs and a mean distance similar to size.50 (for this specific dataset).

Overall, Sturge's rule to define distance classes seems to be a good compromise. What is the trade-off, i.e., what happens if distance lags are defined too narrowly or too widely?

It can be really helpful to plot the distribution of distances among the pairs and compare it to the distance intervals:

par(mfrow=c(3,1))

hist(Dgeo, nclass=50, main="Sturge's rule", axes=F, xlab="", ylab="")
for(i in 1:length(EcoGenetics::ecoslot.BREAKS(corm))){
  lines(rep(EcoGenetics::ecoslot.BREAKS(corm)[i], 2), c(0,50), col="blue")}

hist(Dgeo, nclass=50, main = "50 pairs per lag", axes=F)
for(i in 1:length(EcoGenetics::ecoslot.BREAKS(corm.50))){
  lines(rep(EcoGenetics::ecoslot.BREAKS(corm.50)[i], 2), c(0,50), col="blue")}

hist(Dgeo, nclass=50, main = "100 pairs per lag", axes=F)
for(i in 1:length(EcoGenetics::ecoslot.BREAKS(corm.100))){
  lines(rep(EcoGenetics::ecoslot.BREAKS(corm.100)[i], 2), c(0,50), col="blue")}

Question: Compare what happens at larger distances. Do you think Sturge's rule does a good job for these as well?

Unlike a Mantel test, where all pairs are considered, in geostatistics we typically interpret only values for distances up to a certain threshold, e.g. half the maximum distance, for two reasons:

There are few pairs in of these bins, making estimates highly variable.
Not all pairs contribute (equally) to the largest distance classes (those in the center of the study area are not involved in very large distances).

c) Alternative measures of genetic distances {-}

Which measure of genetic distance would provide the strongest Mantel correlation in the first distance class for this data set?

Here we will cycle through all genetic distance matrices in GD.pop and calculate a Mantel correlogram with Sturge's rule (not linearized, method="pearson"). This may take a while.

Note: the code that calculates corm.GD.pop is included here twice, first commented out, then with the option include=FALSE. This avoids printing out a lot of unnecessary output while still showing the (commented out) code in the .html version of the file.

#corm.GD.pop <- lapply(GD.pop, function(x) EcoGenetics::eco.cormantel(M = x, 
#                      XY = ecoslot.XY(Frogs.ecopop)[,1:2],  nsim = 199, latlon=FALSE,
#                      alternative="less"))

corm.GD.pop <- lapply(GD.pop, function(x) EcoGenetics::eco.cormantel(M = x, 
                      XY = ecoslot.XY(Frogs.ecopop)[,1:2],  nsim = 199, latlon=FALSE,
                      alternative="less"))

Next, we extract for each genetic distance matrix the observed value of the Mantel correlation for the first distance class and its p-value.

t(sapply(corm.GD.pop, function(x) EcoGenetics::ecoslot.OUT(x)[[1]][1,c(2,4)]))

Compare the p-values: for this dataset, all genetic distance measures resulted in significant spatial autocorrelation (indicating IBD).

Let's plot the Mantel correlogram for Nei.adegenet. Statistically significant lags are shown in a different color than non-significant ones.

EcoGenetics::eco.plotCorrelog(corm.GD.pop$Nei.adegenet)

Questions:

Starting with the first distance class, how many consecutive distance classes showed significant spatial autocorrelation?
At what distance (range), roughly, did the positive autocorrelation disappear?

6. Specify spatial weights and calculate Moran's I

In this part, we'll quantify and test Moran's I for the genetic diversity data as calculated in Week 3 lab.

Note: Above, we used a distance lag approach from geostatistics, here we use spatial neighbours and weights (neighbor topology). Either approach could be used with either type of data.

For a detailed tutorial on defining spatial neighbors and weights, see: https://cran.r-project.org/web/packages/adespatial/vignettes/tutorial.html#irregular-samplings

a) Defining spatial neighbors

The function chooseCN (package: adegenet) provides an interface for choosing a connection network, i.e., for defining spatial neighbors. The underlying functions are defined in package spdep (for defining spatial dependences). It can return the following graph types:

Delaunay triangulation (type 1)
Gabriel graph (type 2)
Relative neighbours (type 3)
Minimum spanning tree (type 4)
Neighbourhood by distance (type 5)
K nearests neighbours (type 6)
Inverse distances (type 7)

Here we use types 1 - 6 to define neighbors in different ways. Then we plot each graph in geographic space. Lines indicate pairs of sites classified as neighbors.

Note: this function expects metric spatial coordinates (e.g., UTM).

nb.del <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], 
          result.type = "nb", plot.nb = FALSE, type = 1)
nb.gab <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], 
          result.type = "nb", plot.nb = FALSE, type = 2)
nb.rel <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], 
          result.type = "nb", plot.nb = FALSE, type = 3)
nb.mst <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], 
          result.type = "nb", plot.nb = FALSE, type = 4)
nb.nbd <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], 
          result.type = "nb", plot.nb = FALSE, type = 5, d1=100, d2=15000)
nb.4nn <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], 
          result.type = "nb", plot.nb = FALSE, type = 6, k = 4)

par(mfrow=c(2,3), mai=c(0.1,0.1,0.1, 0.1))
plot(nb.del, coords=ecoslot.XY(Frogs.ecopop)); title(main="Delaunay")
plot(nb.gab, coords=ecoslot.XY(Frogs.ecopop)); title(main="Gabriel")
plot(nb.rel, coords=ecoslot.XY(Frogs.ecopop)); title(main= "Rel. neighbors")
plot(nb.mst, coords=ecoslot.XY(Frogs.ecopop)); title(main= "Min spanning tree")
plot(nb.nbd, coords=ecoslot.XY(Frogs.ecopop)); title(main = "Neighbor distance")
plot(nb.4nn, coords=ecoslot.XY(Frogs.ecopop)); title(main = "4 nearest neighbors")
par(mfrow=c(1,1))

For spatial statistics, spatial neighbors are used to calculate a local mean. We want each site to have multiple neighbors, but they should be nearby only. Gabriel graph (type = 2) is often a good option, and we'll use it for the rest of this worked example.

b) Defining spatial weights

By default, chooseCN returns row-standardized weights, so that for each site, the weights of its neighbors sum to 1. This means that a local mean can be calculated as a weighted mean of the other sites (non-neighboring sites have a weight of 0).

With the function nb2mat we can convert the neighbor object to a matrix of spatial weights. Let's look at the first five lines and columns:

spdep::nb2mat(nb.gab)[1:5,1:5]

Each row contains the weights for the neighbors of one site. We see that the third site is a neighbor of the second site and vice versa. However, the weights are not the same. It seems that site 2 has six neighbors, so each has a weight of 0.167, whereas site 3 has four neighbors, each with a weight of 0.25.

c) Calculating and testing Moran's I

Let's see whether rarefied allelic richness, Ar, shows spatial autocorrelation among these sites.

spdep::moran.test(ecoslot.C(Frogs.ecopop)$Ar, spdep::nb2listw(nb.gab),
                  alternative="greater")

The test statistic is 0.61, and the p-value for a one-sided alternative "greater" (i.e., positive spatial autocorrelation) is p < 0.0001. Thus, Ar showed strong and statistially significant spatial autocorrelation.

Let's do this for all genetic diversity variables and extract the value of the Moran I statistics (for first neighbors) and its p-value.

Frogs.moran <- lapply(ecoslot.C(Frogs.ecopop), 
                       function(x) spdep::moran.test(x,
                       spdep::nb2listw(nb.gab), alternative="two.sided"))
round(data.frame(obs = sapply(Frogs.moran, function(x) as.vector(x$estimate[1])),  
p.value = sapply(Frogs.moran, function(x) x$p.value)),3)

Questions:

Which measure of genetic diversity showed the strongest autocorrelation?
Sample size n had a negative value of Moran's I (obs). What does this mean?
Did all four variables show statistically significant spatial autocorrelation?

7. R Exercise Week 5

Task: Assess fine-scale spatial genetic structure in Pulsatilla vulgaris within a single patch, A25: Test for IBD with Mantel rank correlation, and use a Mantel correlogram to assess the range of spatial autocorrelation.

Hints:

a) Load packages: You may want to load the packages dplyr, EcoGenetics and adegenet. Alternatively, you can use :: to call functions from packages.

b) Import data, extract adults from A25.

- Use the code below to import the data into gstudio. 
- Inspect a few rows of the data.
- Filter the dataset to retain adults (OffID == 0) from site A25 (Population == "A25").

Pulsatilla.gstudio <- gstudio::read_population(path=system.file("extdata",
                            "pulsatilla_genotypes.csv", 
                            package = "LandGenCourse"), 
                    type="column", locus.columns=c(6:19), 
                    phased=FALSE, sep=",", header=TRUE)

c) Plot locations of individuals: use plot with the argument asp = 1 to plot the locations of the sampled individuals from site A25 to scale.

d) Convert to ecogen and genind: use EcoGenetics::gstudio2ecogen to convert to an ecogen object. From there, use EcoGenetics::ecogen2genind to convert to a genind object.

e) Calculate genetic and Euclidean distance:

- Use `adegenet::propShared` to calculate the proportion of shared alleles at the individual level. (Do not convert to genpop). 
- Convert to a distance measure by calculating `1 - propShared`.
- Check object class. If it is not `dist`, use `as.dist` to convert to a `dist` object.
- Use `dist` to calculate Euclidean distance between individuals.

f) Mantel test:

- Adapt code from section 4.a to plot pairwise genetic distance against Euclidean distance. 
- Do you notice something unusial in the plot? Why are there so few different values of genetic distance?
- Do you think there is spatial autocorrelation? If so, up to what distance?
- Adapt code from section 4.b to test for IBD with a Mantel test, using Spearman rank correlation.

g) Mantel correlogram: adapt code from section 5.a to create and plot a Mantel correlogram. Interpret the results.

Questions: What is the range of spatial autocorrelation in P. vulgaris in site A25?

Based on a plot of genetic distance against Euclidean distance?
Based on where the Mantel correlogram reaches 0?
Based on statistical significance tests for the Mantel correlogram (with default settings: one-sided alternative "less", Holm's adjustment)?

LandGenCourse::detachAllPackages()

hhwagner1/LandGenCourse documentation built on Feb. 17, 2024, 4:42 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

hhwagner1/LandGenCourse
Interface for course "Landscape Genetic Data Analysis with R"

In hhwagner1/LandGenCourse: Interface for course "Landscape Genetic Data Analysis with R"

1. Overview of Worked Example {-}

a) Goals {-}

b) Data set {-}

c) Required R packages {-}

2. Data import and manipulation {-}

a) Create an ecogen object {-}

b) Aggregate to 'ecopop' {-}

b) Add site-level data {-}

c) Export to 'adegenet' and 'gstudio' {-}

3. Calculate genetic distances {-}

a) Genetic distances calculated from genind object {-}

b) More distance matrices with 'gstudio' {-}

c) Assemble distance matrices {-}

d) Export genetic distance matrices {-}

4. Perform a Mantel test to test for IBD {-}

a) Visually check linearity {-}

b) Perform Mantel test {-}

5. Create Mantel correlogram for genetic data {-}

a) Create a first Mantel correlogram {-}

b) Vary distance class definition {-}

c) Alternative measures of genetic distances {-}

6. Specify spatial weights and calculate Moran's I

a) Defining spatial neighbors

b) Defining spatial weights

c) Calculating and testing Moran's I

7. R Exercise Week 5

R Package Documentation

Browse R Packages

We want your feedback!

hhwagner1/LandGenCourse Interface for course "Landscape Genetic Data Analysis with R"

In hhwagner1/LandGenCourse: Interface for course "Landscape Genetic Data Analysis with R"

1. Overview of Worked Example {-}

a) Goals {-}

b) Data set {-}

c) Required R packages {-}

2. Data import and manipulation {-}

a) Create an ecogen object {-}

b) Aggregate to 'ecopop' {-}

b) Add site-level data {-}

c) Export to 'adegenet' and 'gstudio' {-}

3. Calculate genetic distances {-}

a) Genetic distances calculated from genind object {-}

b) More distance matrices with 'gstudio' {-}

c) Assemble distance matrices {-}

d) Export genetic distance matrices {-}

4. Perform a Mantel test to test for IBD {-}

a) Visually check linearity {-}

b) Perform Mantel test {-}

5. Create Mantel correlogram for genetic data {-}

a) Create a first Mantel correlogram {-}

b) Vary distance class definition {-}

c) Alternative measures of genetic distances {-}

6. Specify spatial weights and calculate Moran's I

a) Defining spatial neighbors

b) Defining spatial weights

c) Calculating and testing Moran's I

7. R Exercise Week 5

R Package Documentation

Browse R Packages

We want your feedback!

hhwagner1/LandGenCourse
Interface for course "Landscape Genetic Data Analysis with R"