This worked example shows how to:
This is a larger version of the frog data set used in Weeks 1 - 3.
Here, we will analyze microsatellite data from Funk et al. (2005) and Murphy et al. (2010) for 418 individuals of Colombia spotted frogs (Rana luteiventris) from 30 populations, together with site-level spatial coordinates. Please see the introductory video on the DGS course website, R lab page.
LandGenCourse
.Install some packages (not on CRAN) that are needed for this worked example.
if(!requireNamespace("EcoGenetics", quietly = TRUE)) remotes::install_github("leandroroser/EcoGenetics-devel") if(!requireNamespace("GeNetIt", quietly = TRUE)) remotes::install_github("jeffreyevans/GeNetIt") if(!requireNamespace("popgraph", quietly = TRUE)) { install.packages(c("RgoogleMaps", "geosphere", "proto", "sampling", "seqinr", "spacetime", "spdep"), dependencies=TRUE) remotes::install_github("dyerlab/popgraph") } if(!requireNamespace("gstudio", quietly = TRUE)) remotes::install_github("dyerlab/gstudio")
library(LandGenCourse) library(GeNetIt) library(dplyr) library(EcoGenetics) #require(adegenet) #require(tibble) #require(gstudio) #require(hierfstat) #require(PopGenReport) #require(mmod) #require(spdep)
Import the site data that we used before in Week 2.
Frogs.coords <- read.csv(system.file("extdata", "ralu_coords_allpops.csv", package = "LandGenCourse"))
In Week 3, we calculated population-level genetic data as a table Frogs.diversity
. Here we import a table with the same diversity measures for all 29 sites from a system file in LandGenCourse
.
Frogs.diversity <- read.csv(system.file("extdata", "Frogs_diversity_allpops.csv", package = "LandGenCourse"))
Import the genetic data for 29 sites:
Frogs.loci <- read.csv(system.file("extdata", "ralu_loci_allpops.csv", package = "LandGenCourse"))
We start building an ecogen
object by assigning the genetic data to the G
slot, and the structure (i.e., hierarchical sampling information, here site names) to the S
slot. For the genetic data, we need to specify the type of data and coding.
Note the tweak using data.frame
when specifying data.frame(ralu.loci[,1:2])
. This is necessary to import the multiple columns correctly and with their original names.
Frogs.ecogen <- ecogen(G = Frogs.loci[,-c(1:2)], ploidy = 2, type = "codominant", sep = ":", S = data.frame(Frogs.loci[,1:2])) Frogs.ecogen
The summary confirms that we now have data in the S
and G
slots. In addition, EcoGenetics created a table of absolute frequencies (i.e. counts) of alleles for each individual in slot A
.
Most of our analysis for this lab will be at the population level. The function ecogen2ecopop
aggregates from individual to population-level data.
Frogs.ecopop <- ecogen2ecopop(Frogs.ecogen, hier = "SiteName") Frogs.ecopop
Instead of 413 individuals, we now have data for 30 populations. Slot AF
contains the population-level absolute frequencies (counts) of alleles.
Before importing the genetic diversity, spatial coordinates and site data into the ecopop
object, we need to match the rows and extract the data for the sampled populations.
Unfortunately, the datasets use different versions of the site name, both were include in the file ralu_loci_allpops.csv
and thus in the S
slot of Frogs.ecogen
.
group_by
and summarize
to get the unique set of site names in both versions. We save it as a data frame Subset.by
. R will confirm what ID variable it used to join. Subset <- ecoslot.S(Frogs.ecogen) %>% group_by(SiteName, Pop) %>% summarize() Subset <- left_join(Subset, Frogs.diversity)
R tells us that it used the shared ID variable Pop
to join the data.
Now we can join the site data. Note: with as.data.frame
, we combine the @data
and @coords
slots of the SpatialPointsDataFrame ralu.site
to a single data frame.
Subset <- left_join(Subset, as.data.frame(Frogs.coords))
This time, R used the shared ID variable SiteName
to join the data.
Now we have all site-level data in the data frame Subset
that has the same row names as Frogs.ecopop
. We check the names of the variables to decide which ones to put where. Then we assign them to their respective slots (@C
for the custom data, here genetic diversity, @E
for the environmental data, and @XY
for the coordinates).
The argument pop specifies the matching ID variable in the @S
slot. When we aggregated from Frogs.ecogen
to Frogs.ecopop
, unfortunately EcoGenetics renamed the variable SiteName
to pop
. The argument pop_levels
identifies the corresponding ID variable in Subset
.
names(Subset) Frogs.ecopop <- EcoGenetics::eco.fill_ecogen_with_df(Frogs.ecopop, pop ="pop", pop_levels = Subset$SiteName, C = Subset[,3:6], XY = Subset[,7:10]) Frogs.ecopop
'EcoGenetics' provides convenient functions for converting genetic data to and from other packages.
Import into 'genind' object (package 'adegenet'): there is a dedicated function, but we need to separately declare the variable that represents the populations and write it into the @pop
slot of the 'genind' object.
Frogs.genind <- EcoGenetics::ecogen2genind(Frogs.ecogen) Frogs.genind@pop <- ecoslot.S(Frogs.ecogen)$SiteName
For calculating population-level genetic distances, we aggregate the individual-level data to a genpop
object with population-level allele frequencies.
Frogs.genpop <- adegenet::genind2genpop(Frogs.genind) Frogs.genpop
Note: Alternatively, we could directly import the ecopop
object into a genpop
object (adegenet
) with EcoGenetics::ecopop2genpop(Frogs.ecopop)
. However, this can create warnings later on when calculating genetic distances.
The object 'Frogs.genpop' has 30 rows, each representing a population.
We will also use some functions from the package gstudio
, hence we import the individual-level genetic data into gstudio
. This should be easy with the function EcoGenetics::ecogen2gstudio
. However, there seems to be an issue. The following chunk contains code adapted from within the ecogen2gstudio
function, tweaked to work for our data. (You can view the original function with: findMethods(ecogen2gstudio)
.)
#Frogs.gstudio <- EcoGenetics::ecogen2gstudio(Frogs.ecogen, type="codominant") dat <- eco.convert(Frogs.ecogen@G, "matrix", sep.in = ":", sep.out = ":") dat <- as.data.frame(dat, stringsAsFactors = FALSE) for (i in 1:ncol(dat)) { class(dat[, i]) <- "locus" } dat[is.na(dat)] <- gstudio::locus(NA) colnames(dat) <- colnames(Frogs.ecogen@G) Frogs.gstudio <- data.frame(ecoslot.S(Frogs.ecogen), dat) tibble::as_tibble(Frogs.gstudio)
Here, we'll calculate a number of different measures of genetic distance, using functions from several packages. Adding the package name to each distance matrix name helps keeping track of methods used. Normally you would not calculate all of them for your own data, and some are redundant, as we will see.
Note: Some functions provide an option linearized = TRUE
to linearize distances d
by calculating d/(1-d)
. This should result in more linear relationships when plotted or correlated against geographic distance.Here we don't linearize, we can do so later manually.
Pairwise Fst with package hierfstat
:
GD.pop.PairwiseFst.hierfstat <- as.dist(hierfstat::pairwise.neifst(hierfstat::genind2hierfstat(Frogs.genind)))
Proportion of shared alleles with package 'PopGenReport':
GD.pop.propShared <- PopGenReport::pairwise.propShared(Frogs.genind)
Several distance matrices with package 'adegenet':
GD.pop.Nei <- adegenet::dist.genpop(Frogs.genpop, method=1) GD.pop.Edwards <- adegenet::dist.genpop(Frogs.genpop, method=2) GD.pop.Reynolds <- adegenet::dist.genpop(Frogs.genpop, method=3) GD.pop.Rogers <- adegenet::dist.genpop(Frogs.genpop, method=4) GD.pop.Provesti <- adegenet::dist.genpop(Frogs.genpop, method=5)
Additional distance matrices with package 'mmod':
GD.pop.Joost <- mmod::pairwise_D(Frogs.genind, linearized = FALSE) GD.pop.Hedrick <- mmod::pairwise_Gst_Hedrick(Frogs.genind, linearized = FALSE) GD.pop.NeiGst <- mmod::pairwise_Gst_Nei(Frogs.genind, linearized = FALSE)
GD.pop.Euclidean.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Euclidean", stratum="SiteName") GD.pop.cGD.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "cGD", stratum="SiteName") GD.pop.Nei.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Nei", stratum="SiteName") GD.pop.Dps.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Dps", stratum="SiteName") GD.pop.Jaccard.gstudio <-gstudio::genetic_distance(Frogs.gstudio, mode = "Jaccard", stratum="SiteName")
We'll store the population-level genetic distances in a list 'GD.pop'.
Note: a few measures return similarities (scaled between 0 and 1) instead of distances. For instance, 'proporition of shared alleles' is 1 if the alleles are identical, and zero of no alleles are shared. We convert these values to distances by subtracting them from 1.
GD.pop <- list(pairwiseFst.hierfstat = GD.pop.PairwiseFst.hierfstat, propShared.PopGenReport = 1 - GD.pop.propShared, Nei.adegenet = GD.pop.Nei, Edwards.adegenet = GD.pop.Edwards, Reynolds.adegenet = GD.pop.Reynolds, Rogers.adegenet = GD.pop.Rogers, Provesti.adegenet = GD.pop.Provesti, Joost.mmod = GD.pop.Joost, Hedrick.mmod = GD.pop.Hedrick, Nei.mmod = GD.pop.NeiGst, Euclidean.gstudio = as.dist(GD.pop.Euclidean.gstudio), cGD.gstudio = as.dist(GD.pop.cGD.gstudio), Nei.gstudio = as.dist(GD.pop.Nei.gstudio), Dps.gstudio = as.dist(1 - GD.pop.Dps.gstudio), Jaccard.gstudio = as.dist(1 - GD.pop.Jaccard.gstudio)) round(cor(sapply(GD.pop, function(ls) as.vector(ls))),2)[,1:2]
Consider the correlations printed above (only the first two columns of the correlation matrix are shown).
Note: the following functions calculate distance matrices at the individual level:
Optional: You can use save
to save an R object to your file system, and load
to read it in again. Note: the default setting is that save
will overwrite existing files with the same name.
The code is commented out with #
. To run it, remove the #
. The first part creates a folder output
in your project folder if it does not yet exist. The function save
writes the list GD.pop
into a file "GD.pop.RData", and the function load
imports it again.
#require(here) #if(!dir.exists(paste0(here::here(),"/output"))) dir.create(paste0(here::here(),"/output")) #save(GD.pop, file = paste0(here::here(),"/output/GD.pop.RData")) #load(paste0(here::here(),"/output/GD.pop.RData"))
First, we calculate geographic (Euclidean) distances Dgeo
with the dist
function, using the spatial coordinates that we imported from ralu.site. These are UTM coordinates and thus metric.
Dgeo <- as.vector(dist(ecoslot.XY(Frogs.ecopop)[,1:2]))
Before we quantify the linear relationship between genetic and geographic distances, let's check visually whether the relationship is indeed linear. To start, we will define genetic distance Dgen
based on proportion of shared alleles.
par(mar=c(4,4,0,0)) Dgen <- as.vector(GD.pop$propShared.PopGenReport) dens <- MASS::kde2d(Dgeo, Dgen, n=300) myPal <- colorRampPalette(c("white","blue","gold","orange","red")) plot(Dgeo, Dgen, pch=20, cex=0.5, xlab="Geographic Distance", ylab="Genetic Distance") image(dens, col=transp(myPal(300), 0.7), add=TRUE) abline(lm(Dgen ~ Dgeo)) lines(loess.smooth(Dgeo, Dgen), col="red")
There is clearly an increase of genetic distance with geographic distance. However, the red line, which is a smoothed local mean, indicates that the relationship is not linear.
Let's take the natural logarithm of geographic distance:
par(mar=c(4,4,0,0)) dens <- MASS::kde2d(log(Dgeo), Dgen, n=300) plot(log(Dgeo), Dgen, pch=20, cex=0.5, xlab="Geographic Distance", ylab="Genetic Distance") image(dens, col=transp(myPal(300), 0.7), add=TRUE) abline(lm(Dgen ~ log(Dgeo))) lines(loess.smooth(log(Dgeo), Dgen), col="red")
Questions:
Next, we perform a Mantel test with the function mantel
from the vegan
package.
We define Dgen and Dgeo anew, as we need them in 'dist' format this time, not as vectors.
Dgen <- GD.pop$propShared.PopGenReport Dgeo <- dist(ecoslot.XY(Frogs.ecopop)[,1:2]) IBD <- vegan::mantel(Dgen,Dgeo, method="pearson") IBD
What happens if we take the log of geographic distance, which we saw above helps linearize the relationship for these data?
IBD <- vegan::mantel(Dgen,log(Dgeo), method="pearson") IBD
The statistical significance (p-value) didn't really change, but the strength of the Mantel correlation increased from r = 0.62 to r = 0.68.
Instead of transforming variables, we could use Spearman rank correlation to quantify the strength of the association. Rank correlations can be used to quantify and test the strength of curvilinear relationships.
IBD <- vegan::mantel(Dgen,Dgeo, method="spearman") IBD
For this measure of genetic diversity, the Mantel correlation was significant, quite strong, and the relationship not linear, hence the transformed data or the rank correlation performed better. What about the other measures?
Here we use lapply
to apply the function mantel
to each genetic distance matrix in GD.pop
. Then we use sapply
to extract two values for each distance matrix: statistic
is the Mantel r statistic (here: Pearson linear correlation), and signif
is the p-value. We can find these names with the function attributes
(see above).
attributes(IBD)
Mantel.test <- lapply(GD.pop, function(x) vegan::mantel(x,Dgeo, method="pearson")) data.frame(Mantel.r = sapply(Mantel.test, function(x) x$statistic), p.value = sapply(Mantel.test, function(x) x$signif))
The nature of the result did not depend on the measure of genetic diversity used. Let's repeat the analysis with log(Dgeo).
Mantel.test <- lapply(GD.pop, function(x) ade4::mantel.randtest(x,log(Dgeo))) data.frame(Mantel.r = sapply(Mantel.test, function(x) x$obs), p.value = sapply(Mantel.test, function(x) x$pvalue))
Questions: - Did the transformation lead to higher Mantel correlations for all measures of genetic distance used here? - Why was it higher?
Let's look at the relationship between genetic distance and geographic distance in a different way, with a Mantel correlogram. Note that this method does not make the assumption of linearity.
Here, we'll create a population-level Mantel correlogram with the proportion of shared alleles.
Note: The function eco.cormantel
has an option latlon=TRUE
that takes care of the distance calculation from lat-lon coordinates. To use this option, the coordinates must be in a matrix or data frame with the longitude in the first column and the latitude in the second. Here, we can set we can set latlon=FALSE
because the spatial coordinates are in UTM projection.
The biological hypothesis of isolation-by-distance postulates that genetic distance increases with geographic distance. Hence it makes sense to use a one-sided alternative. Somewhat counter-intutitively, we use 'alternative="less"' to test for positive spatial autocorrelation.
corm <- EcoGenetics::eco.cormantel(M = GD.pop$propShared.PopGenReport, XY = ecoslot.XY(Frogs.ecopop)[,1:2], nsim = 199, latlon=FALSE, alternative="less", method = "pearson") corm
The table shows:
The result corm
is an object of class eco.correlog
(package: EcoGenetics). A safe way to access thet table is ecoslot.OUT(corm)
.
Let's plot the correlogram:
EcoGenetics::eco.plotCorrelog(corm)
You can hover over individual points of the correlogram to see the test statistic and mean distance.
Under IBD, at least the first distance lag should show positive spatial autocorrelation. Here, it is the first two classes, as indicated by the red symbols.
To what degree does this result depend on the following:
There are several options of the eco.cormantel
function to modify the definition of distance classes:
int
: distance interval in the units of XYsmin
: minimum class distance in the units of XYsmax
: maximum class distance in the units of XYnclass
: number of classesseqvec
: vector with breaks in the unites of XYsize
: number of individuals per classbin
: rule for constructing intervals if no other parameters provided (default: Sturge's rule)The easiest ones to modify are either nclass
or size
. Here we use size
to specify that there should be at least 50 or 100 pairs in each distance class. (Note: for a reliable analysis, this should be 100 or more).
corm.50 <- EcoGenetics::eco.cormantel(M = GD.pop$propShared.PopGenReport, XY = ecoslot.XY(Frogs.ecopop)[,1:2], nsim = 199, latlon=FALSE, alternative="less", size=50) EcoGenetics::ecoslot.OUT(corm.50) corm.100 <- EcoGenetics::eco.cormantel(M = GD.pop$propShared.PopGenReport, XY = ecoslot.XY(Frogs.ecopop)[,1:2], nsim = 199, latlon=FALSE, alternative="less", size=100) EcoGenetics::ecoslot.OUT(corm.100) EcoGenetics::eco.plotCorrelog(corm.50) EcoGenetics::eco.plotCorrelog(corm.100)
Let's compare the observed Mantel r statistic, p-value, number of pairs in the first distance class and their mean distance, as well as the definition of the first lag interval. We can get all of this by extracting the first line from each object.
The lag intervals are stored only in the row names, and we need to extract those separately and add them as a colum.
Lag1.def <- data.frame(rbind(Sturge = EcoGenetics::ecoslot.OUT(corm)[[1]][1,], size.50 = EcoGenetics::ecoslot.OUT(corm.50)[[1]][1,], size.100 = EcoGenetics::ecoslot.OUT(corm.100)[[1]][1,])) Lag1.def$bin <- c(row.names(EcoGenetics::ecoslot.OUT(corm)[[1]])[1], row.names(EcoGenetics::ecoslot.OUT(corm.50)[[1]])[1], row.names(EcoGenetics::ecoslot.OUT(corm.100)[[1]])[1]) Lag1.def
Overall, Sturge's rule to define distance classes seems to be a good compromise. What is the trade-off, i.e., what happens if distance lags are defined too narrowly or too widely?
It can be really helpful to plot the distribution of distances among the pairs and compare it to the distance intervals:
par(mfrow=c(3,1)) hist(Dgeo, nclass=50, main="Sturge's rule", axes=F, xlab="", ylab="") for(i in 1:length(EcoGenetics::ecoslot.BREAKS(corm))){ lines(rep(EcoGenetics::ecoslot.BREAKS(corm)[i], 2), c(0,50), col="blue")} hist(Dgeo, nclass=50, main = "50 pairs per lag", axes=F) for(i in 1:length(EcoGenetics::ecoslot.BREAKS(corm.50))){ lines(rep(EcoGenetics::ecoslot.BREAKS(corm.50)[i], 2), c(0,50), col="blue")} hist(Dgeo, nclass=50, main = "100 pairs per lag", axes=F) for(i in 1:length(EcoGenetics::ecoslot.BREAKS(corm.100))){ lines(rep(EcoGenetics::ecoslot.BREAKS(corm.100)[i], 2), c(0,50), col="blue")}
Question: Compare what happens at larger distances. Do you think Sturge's rule does a good job for these as well?
Unlike a Mantel test, where all pairs are considered, in geostatistics we typically interpret only values for distances up to a certain threshold, e.g. half the maximum distance, for two reasons:
Which measure of genetic distance would provide the strongest Mantel correlation in the first distance class for this data set?
Here we will cycle through all genetic distance matrices in GD.pop
and calculate a Mantel correlogram with Sturge's rule (not linearized, method="pearson"). This may take a while.
Note: the code that calculates corm.GD.pop
is included here twice, first commented out, then with the option include=FALSE
. This avoids printing out a lot of unnecessary output while still showing the (commented out) code in the .html version of the file.
#corm.GD.pop <- lapply(GD.pop, function(x) EcoGenetics::eco.cormantel(M = x, # XY = ecoslot.XY(Frogs.ecopop)[,1:2], nsim = 199, latlon=FALSE, # alternative="less"))
corm.GD.pop <- lapply(GD.pop, function(x) EcoGenetics::eco.cormantel(M = x, XY = ecoslot.XY(Frogs.ecopop)[,1:2], nsim = 199, latlon=FALSE, alternative="less"))
Next, we extract for each genetic distance matrix the observed value of the Mantel correlation for the first distance class and its p-value.
t(sapply(corm.GD.pop, function(x) EcoGenetics::ecoslot.OUT(x)[[1]][1,c(2,4)]))
Compare the p-values: for this dataset, all genetic distance measures resulted in significant spatial autocorrelation (indicating IBD).
Let's plot the Mantel correlogram for Nei.adegenet
. Statistically significant lags are shown in a different color than non-significant ones.
EcoGenetics::eco.plotCorrelog(corm.GD.pop$Nei.adegenet)
Questions:
In this part, we'll quantify and test Moran's I for the genetic diversity data as calculated in Week 3 lab.
Note: Above, we used a distance lag approach from geostatistics, here we use spatial neighbours and weights (neighbor topology). Either approach could be used with either type of data.
For a detailed tutorial on defining spatial neighbors and weights, see: https://cran.r-project.org/web/packages/adespatial/vignettes/tutorial.html#irregular-samplings
The function chooseCN
(package: adegenet
) provides an interface for choosing a connection network, i.e., for defining spatial neighbors. The underlying functions are defined in package spdep
(for defining spatial dependences). It can return the following graph types:
Here we use types 1 - 6 to define neighbors in different ways. Then we plot each graph in geographic space. Lines indicate pairs of sites classified as neighbors.
Note: this function expects metric spatial coordinates (e.g., UTM).
nb.del <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], result.type = "nb", plot.nb = FALSE, type = 1) nb.gab <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], result.type = "nb", plot.nb = FALSE, type = 2) nb.rel <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], result.type = "nb", plot.nb = FALSE, type = 3) nb.mst <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], result.type = "nb", plot.nb = FALSE, type = 4) nb.nbd <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], result.type = "nb", plot.nb = FALSE, type = 5, d1=100, d2=15000) nb.4nn <- adegenet::chooseCN(xy = ecoslot.XY(Frogs.ecopop)[,1:2], result.type = "nb", plot.nb = FALSE, type = 6, k = 4) par(mfrow=c(2,3), mai=c(0.1,0.1,0.1, 0.1)) plot(nb.del, coords=ecoslot.XY(Frogs.ecopop)); title(main="Delaunay") plot(nb.gab, coords=ecoslot.XY(Frogs.ecopop)); title(main="Gabriel") plot(nb.rel, coords=ecoslot.XY(Frogs.ecopop)); title(main= "Rel. neighbors") plot(nb.mst, coords=ecoslot.XY(Frogs.ecopop)); title(main= "Min spanning tree") plot(nb.nbd, coords=ecoslot.XY(Frogs.ecopop)); title(main = "Neighbor distance") plot(nb.4nn, coords=ecoslot.XY(Frogs.ecopop)); title(main = "4 nearest neighbors") par(mfrow=c(1,1))
For spatial statistics, spatial neighbors are used to calculate a local mean. We want each site to have multiple neighbors, but they should be nearby only. Gabriel graph (type = 2) is often a good option, and we'll use it for the rest of this worked example.
By default, chooseCN
returns row-standardized weights, so that for each site, the weights of its neighbors sum to 1. This means that a local mean can be calculated as a weighted mean of the other sites (non-neighboring sites have a weight of 0).
With the function nb2mat
we can convert the neighbor object to a matrix of spatial weights. Let's look at the first five lines and columns:
spdep::nb2mat(nb.gab)[1:5,1:5]
Each row contains the weights for the neighbors of one site. We see that the third site is a neighbor of the second site and vice versa. However, the weights are not the same. It seems that site 2 has six neighbors, so each has a weight of 0.167, whereas site 3 has four neighbors, each with a weight of 0.25.
Let's see whether rarefied allelic richness, Ar, shows spatial autocorrelation among these sites.
spdep::moran.test(ecoslot.C(Frogs.ecopop)$Ar, spdep::nb2listw(nb.gab), alternative="greater")
The test statistic is 0.61, and the p-value for a one-sided alternative "greater" (i.e., positive spatial autocorrelation) is p < 0.0001. Thus, Ar showed strong and statistially significant spatial autocorrelation.
Let's do this for all genetic diversity variables and extract the value of the Moran I statistics (for first neighbors) and its p-value.
Frogs.moran <- lapply(ecoslot.C(Frogs.ecopop), function(x) spdep::moran.test(x, spdep::nb2listw(nb.gab), alternative="two.sided")) round(data.frame(obs = sapply(Frogs.moran, function(x) as.vector(x$estimate[1])), p.value = sapply(Frogs.moran, function(x) x$p.value)),3)
Questions:
Task: Assess fine-scale spatial genetic structure in Pulsatilla vulgaris within a single patch, A25: Test for IBD with Mantel rank correlation, and use a Mantel correlogram to assess the range of spatial autocorrelation.
Hints:
a) Load packages: You may want to load the packages dplyr
, EcoGenetics
and adegenet
. Alternatively, you can use ::
to call functions from packages.
b) Import data, extract adults from A25.
- Use the code below to import the data into gstudio. - Inspect a few rows of the data. - Filter the dataset to retain adults (OffID == 0) from site A25 (Population == "A25").
Pulsatilla.gstudio <- gstudio::read_population(path=system.file("extdata", "pulsatilla_genotypes.csv", package = "LandGenCourse"), type="column", locus.columns=c(6:19), phased=FALSE, sep=",", header=TRUE)
c) Plot locations of individuals: use plot
with the argument asp = 1
to plot the locations of the sampled individuals from site A25 to scale.
d) Convert to ecogen and genind: use EcoGenetics::gstudio2ecogen
to convert to an ecogen
object. From there, use EcoGenetics::ecogen2genind
to convert to a genind
object.
e) Calculate genetic and Euclidean distance:
- Use `adegenet::propShared` to calculate the proportion of shared alleles at the individual level. (Do not convert to genpop). - Convert to a distance measure by calculating `1 - propShared`. - Check object class. If it is not `dist`, use `as.dist` to convert to a `dist` object. - Use `dist` to calculate Euclidean distance between individuals.
f) Mantel test:
- Adapt code from section 4.a to plot pairwise genetic distance against Euclidean distance. - Do you notice something unusial in the plot? Why are there so few different values of genetic distance? - Do you think there is spatial autocorrelation? If so, up to what distance? - Adapt code from section 4.b to test for IBD with a Mantel test, using Spearman rank correlation.
g) Mantel correlogram: adapt code from section 5.a to create and plot a Mantel correlogram. Interpret the results.
Questions: What is the range of spatial autocorrelation in P. vulgaris in site A25?
LandGenCourse::detachAllPackages()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.