knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
We will compare the performance of PopGenHelpR
with other R packages available on CRAN. Below, we list the packages we will compare PopGenHelpR
with and the statistics in each comparison.
F~st~ and Nei's D
Jost's D
Expected and Observed Heterozygosity
First we will load the packages and the data in PopGenHelpR
. The data comes from Farleigh et al. (2021). We also load vcfR
to convert between data formats (Knaus & Grunwald, 2017).
# Load the packages library(PopGenHelpR) library(adegenet) library(hierfstat) library(StAMPP) library(mmod) library(vcfR) # Load the data data("HornedLizard_VCF") data("HornedLizard_Pop")
We will compare PopGenHelpR
and StAMPP
. Both packages use the formulas from Weir and Cockerham (1984) ane Nei (1972) to calculate F~ST~ and Nei's D, respectively but we want to make sure that our estimates are consistent across packages.
First, we need to format the data for StAMPP
# StAMPP uses genlight objects Glight <- vcfR2genlight(HornedLizard_VCF) # Set the population information and ploidy Glight@pop <- as.factor(HornedLizard_Pop$Population) ploidy(Glight) <- 2
Now we can calculate our statistics. Let's start with F~ST~.
PGH_fst <- Differentiation(dat = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Fst") Stmp_fst <- stamppFst(Glight, nboots = 0)
Let's inspect the results.
PGH_fst$Fst Stmp_fst # Is there a difference between the two? Fst_comparison <- PGH_fst$Fst-Stmp_fst summary(Fst_comparison)
Now we move onto Nei's D. We can use the same genlight that we created for the F~ST~ calculations. We will calculate Nei's D for both population's and individual's.
PGH_ND <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "NeisD") # StAMPP population Nei's D Stmp_popND <- stamppNeisD(Glight) # StAMPP individual Nei's D Stmp_indND <- stamppNeisD(Glight, pop = FALSE)
Compare the results like we did with F~ST~. Note that PopGenHelpR
only reports result on the lower triangular element so we will set the upper triangular element of the Stmp_popND
and Stmp_indND
objects to NA.
# Population comparison PGH_ND$NeisD_pop Stmp_popND # Set StAMPP upper diagnoals to NA Stmp_popND[upper.tri(Stmp_popND)] <- NA Stmp_indND[upper.tri(Stmp_indND)] <- NA popND_comparison <- PGH_ND$NeisD_pop-Stmp_popND summary(popND_comparison) # Get the mean difference mean(popND_comparison, na.rm = T) # Individual comparison, uncomment if you want to see it #PGH_ND$NeisD_ind #Stmp_indND indND_comparison <- PGH_ND$NeisD_ind - Stmp_indND mean(indND_comparison, na.rm = T)
We see that the difference is very small and is because of rounding. Let's move onto Jost's D comparison with mmod
.
We will compare PopGenHelpR
and mmod
. Both packages use the formulas from Jost (2008). mmod
uses genind objects so we have to do some format conversion first.
Genind <- vcfR2genind(HornedLizard_VCF) Genind@pop <- as.factor(HornedLizard_Pop$Population) ploidy(Genind) <- 2 # Calculate Jost's D PGH_JD <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "JostsD") mmod_JD <- pairwise_D(Genind) PGH_JD$JostsD mmod_JD # Compare differences mathematically PGH_JD$JostsD[2:3,1] - mmod_JD[1:2] PGH_JD$JostsD[2,2] - mmod_JD[3]
Estimate's are very similar between PopGenHelpR
and mmod
, we will move onto heterozygosity.
We will compare PopGenHelpR
, hierfstat
, and adegenet
. Again, the packages use the same formula's, so we expect similar if not identical results. hierfstat
uses it's own format, so we will convert the data before calculations. Luckily we can convert the genind object from Jost's D comparisons.
Hstat <- genind2hierfstat(Genind) ### Calculate heterozygosities # Expected PGH_He <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "He") # Observed PGH_Ho <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Ho") Hstat_hets <- basic.stats(Hstat) Hstat_Ho <- colMeans(Hstat_hets$Ho) He_adnet <- Hs(Genind) PGH_He$He_perpop$Expected.Heterozygosity-He_adnet PGH_Ho$Ho_perpop$Observed.Heterozygosity-Hstat_Ho
We again see very small differences between the estimates.
Please reach out to Keaka Farleigh (farleik@miamioh.edu) if you have any questions, and please see the references and acknowledgments below.
Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.
Goudet, J. (2005). hierfstat, a package for R to compute and test hierarchical F‐statistics. Molecular ecology notes, 5(1), 184-186.
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular ecology, 17(18), 4015-4026.
Knaus, B. J., & Grünwald, N. J. (2017). vcfr: a package to manipulate and visualize variant call format data in R. Molecular ecology resources, 17(1), 44-53.
Nei, M. (1972). Genetic distance between populations. The American Naturalist, 106(949), 283-292.
Pembleton, L. W., Cogan, N. O., & Forster, J. W. (2013). St AMPP: An R package for calculation of genetic differentiation and structure of mixed‐ploidy level populations. Molecular ecology resources, 13(5), 946-952.
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. evolution, 1358-1370.
Winter, D., Green, P., Kamvar, Z., & Gosselin, T. (2017). mmod: modern measures of population differentiation (Version 1.3.3).
We thank the authors of hierfstat
, mmod
, StAMPP
, and all of the package dependencies. They provided inspiration for PopGenHelpR
and their commitment to open science made it possible to develop and benchmark our package.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.