Benchmarking PopGenHelpR with adegenet, hierfstat, mmod, and StAMPP
In PopGenHelpR: Streamline Population Genomic and Genetic Analyses

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Purpose

We will compare the performance of PopGenHelpR with other R packages available on CRAN. Below, we list the packages we will compare PopGenHelpR with and the statistics in each comparison.

F~st~ and Nei's D

StAMPP (Pembleton et al., 2013)

Jost's D

mmod (Winter et al., 2017)

Expected and Observed Heterozygosity

hierfstat (Goudet, 2005)
adegenet (Jombart, 2008)

Let's Begin

First we will load the packages and the data in PopGenHelpR. The data comes from Farleigh et al. (2021). We also load vcfR to convert between data formats (Knaus & Grunwald, 2017).

# Load the packages
library(PopGenHelpR)
library(adegenet)
library(hierfstat)
library(StAMPP)
library(mmod)
library(vcfR)

# Load the data 
data("HornedLizard_VCF")
data("HornedLizard_Pop")

F~ST~ and Nei's D Comparison

We will compare PopGenHelpR and StAMPP. Both packages use the formulas from Weir and Cockerham (1984) ane Nei (1972) to calculate F~ST~ and Nei's D, respectively but we want to make sure that our estimates are consistent across packages.

First, we need to format the data for StAMPP

# StAMPP uses genlight objects
Glight <- vcfR2genlight(HornedLizard_VCF)

# Set the population information and ploidy
Glight@pop <- as.factor(HornedLizard_Pop$Population)
ploidy(Glight) <- 2

Now we can calculate our statistics. Let's start with F~ST~.

PGH_fst <- Differentiation(dat = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Fst")

Stmp_fst <- stamppFst(Glight, nboots = 0)

Let's inspect the results.

PGH_fst$Fst
Stmp_fst

# Is there a difference between the two?
Fst_comparison <- PGH_fst$Fst-Stmp_fst

summary(Fst_comparison)

Now we move onto Nei's D. We can use the same genlight that we created for the F~ST~ calculations. We will calculate Nei's D for both population's and individual's.

PGH_ND <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "NeisD")

# StAMPP population Nei's D
Stmp_popND <- stamppNeisD(Glight)

# StAMPP individual Nei's D
Stmp_indND <- stamppNeisD(Glight, pop = FALSE)

Compare the results like we did with F~ST~. Note that PopGenHelpR only reports result on the lower triangular element so we will set the upper triangular element of the Stmp_popND and Stmp_indND objects to NA.

# Population comparison
PGH_ND$NeisD_pop
Stmp_popND

# Set StAMPP upper diagnoals to NA
Stmp_popND[upper.tri(Stmp_popND)] <- NA
Stmp_indND[upper.tri(Stmp_indND)] <- NA

popND_comparison <- PGH_ND$NeisD_pop-Stmp_popND
summary(popND_comparison)

# Get the mean difference
mean(popND_comparison, na.rm = T)

# Individual comparison, uncomment if you want to see it
#PGH_ND$NeisD_ind
#Stmp_indND

indND_comparison <- PGH_ND$NeisD_ind - Stmp_indND
mean(indND_comparison, na.rm = T)

We see that the difference is very small and is because of rounding. Let's move onto Jost's D comparison with mmod.

Jost's D Comparison

We will compare PopGenHelpR and mmod. Both packages use the formulas from Jost (2008). mmod uses genind objects so we have to do some format conversion first.

Genind <- vcfR2genind(HornedLizard_VCF)
Genind@pop <- as.factor(HornedLizard_Pop$Population)
ploidy(Genind) <- 2

# Calculate Jost's D
PGH_JD <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "JostsD")
mmod_JD <- pairwise_D(Genind)

PGH_JD$JostsD
mmod_JD

# Compare differences mathematically
PGH_JD$JostsD[2:3,1] - mmod_JD[1:2]
PGH_JD$JostsD[2,2] - mmod_JD[3]

Estimate's are very similar between PopGenHelpR and mmod, we will move onto heterozygosity.

Expected and Observed Heterozygosity Comparison

We will compare PopGenHelpR, hierfstat, and adegenet. Again, the packages use the same formula's, so we expect similar if not identical results. hierfstat uses it's own format, so we will convert the data before calculations. Luckily we can convert the genind object from Jost's D comparisons.

Hstat <- genind2hierfstat(Genind)

### Calculate heterozygosities
# Expected
PGH_He <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "He")
# Observed
PGH_Ho <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Ho")

Hstat_hets <- basic.stats(Hstat)
Hstat_Ho <- colMeans(Hstat_hets$Ho)

He_adnet <- Hs(Genind)

PGH_He$He_perpop$Expected.Heterozygosity-He_adnet
PGH_Ho$Ho_perpop$Observed.Heterozygosity-Hstat_Ho

We again see very small differences between the estimates.

Please reach out to Keaka Farleigh (farleik@miamioh.edu) if you have any questions, and please see the references and acknowledgments below.

References

Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.

Goudet, J. (2005). hierfstat, a package for R to compute and test hierarchical F‐statistics. Molecular ecology notes, 5(1), 184-186.

Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular ecology, 17(18), 4015-4026.

Knaus, B. J., & Grünwald, N. J. (2017). vcfr: a package to manipulate and visualize variant call format data in R. Molecular ecology resources, 17(1), 44-53.

Nei, M. (1972). Genetic distance between populations. The American Naturalist, 106(949), 283-292.

Pembleton, L. W., Cogan, N. O., & Forster, J. W. (2013). St AMPP: An R package for calculation of genetic differentiation and structure of mixed‐ploidy level populations. Molecular ecology resources, 13(5), 946-952.

Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. evolution, 1358-1370.

Winter, D., Green, P., Kamvar, Z., & Gosselin, T. (2017). mmod: modern measures of population differentiation (Version 1.3.3).

Acknowledgements

We thank the authors of hierfstat, mmod, StAMPP, and all of the package dependencies. They provided inspiration for PopGenHelpR and their commitment to open science made it possible to develop and benchmark our package.