knitr::opts_chunk$set( echo = TRUE, fig.align = "center", fig.height = 7, fig.width = 7, collapse = TRUE, comment = "#>" )
A major issue in population genetics is the accurate detection of the impact of natural selection along the genome. In this regard, several tests have been developed in the last years, being the McDonald and Kreitman Test (Standard MKT, McDonald & Kreitman 1991) the most used method. Several modifications to the Standard MKT have been applied over the last years, leading to a battery of MKT-derived methods.
The Integrative McDonald and Kreitman Test (iMKT) package allows computing the McDonald and Kreitman test on polymorphism and divergence genomic data provided by the user or automatically downloaded from PopFly (Hervas et al. 2017) or PopHuman (Casillas et al. 2018).
It includes five MKT derived tests: Standard MKT, FWW correction (Fay et al. 2001), DGRP correction (Mackay et al. 2012), asymptotic MK (Messer & Petrov 2013), and the novel iMKT approach; which allow inferring the rate of adaptive evolution ($\alpha$), as well as the fraction of strongly deleterious ($d$), weakly deleterious ($b$), and neutral ($f$) sites, using user custom input data.
In this document, we show how to install the package and how to analyze the sample data using different MKT-derived methods. Finally, we briefly discuss the results obtained with each of the available tests. This vignette contains three different sections: \begin{itemize} \item Loading the package and checking test data \item Performing MKT analyses \item Conclusion: comparison of the different MKT estimates \end{itemize}
First of all, install (if this is not done yet) and load the package. Notice that iMKT package includes two sample dataframes named myDafData and myDivergenceData which are the ones used in this tutorial and correspond to the chromosome arm 2R of the North American Raleigh population (RAL, North Carolina, $n=205$), using D. simulans as outgroup species to estimate divergence metrics. Therefore, it is possible to replicate the vignettes in order to better understand all the package functionalities.
## Load package # install.packages("devtools") # devtools::install_github("sergihervas/iMKT") library(iMKT) ## Sample daf data head(myDafData) ## Sample divergence data myDivergenceData
The iMKT package includes several functions, classified as follows:
\begin{itemize}
\item
Calculation of MKT-derived methods
\begin{itemize}
\item
standardMKT(): Standard MKT
\item
FWW(): FWW correction
\item
DGRP(): DGRP correction
\item
asymptoticMKT(): Asymptotic MKT
\item
iMKT(): integrative MKT
\item
completeMKT(): perform all previous tests
\end{itemize}
\item iMKT using PopFly and PopHuman data \begin{itemize} \item loadPopFly(): load PopFlyData \item loadPopHuman(): load PopHumanData \item PopFlyAnalisys(): perform any test using PopFlyData \item PopHumanAnalysis(): perform any test using PopHumanData \end{itemize}
\item Miscelanious \begin{itemize} \item checkInput(): check data before performing analyses \item themePublication(): output plots and tables styling \end{itemize} \end{itemize}
Each function has an associated help page with its description, details about its parameters, usage, examples and so on. Rembember you can access it writting ? and the function name (or ?library::function) in your console (example: ?iMKT::standardMKT).
This vignette focuses on the first category of functions: "Calculation of MKT-derived methods". Specifically, it contains examples of each function using the sample data described above. For details about functions from "iMKT using PopFly and PopHuman data" category check the corresponding vignette. The functions from the third category are used within other functions and do not produce analyses output.
The diverse functions from this category have two common input parameters which are required to perform the corresponding test:
\begin{itemize} \item daf: data frame containing DAF, Pi and P0 values (myDafData) \item divergence: data frame containing divergent and analyzed sites for selected (i) and neutral (0) classes (myDivergenceData) \end{itemize}
The output of each function always contains the corresponding alpha estimate, together with specific values and details of the selected methodology.
Brief theoretical description about MKT
The standardMKT() function uses daf and divergence input parameters and returns as output a list containing:
\begin{itemize} \item alpha symbol: estimate of alpha using the standard MKT \item Fisher exact test P-value: p-value obtained using the Fisher exact test on a 2x2 contingency table (MKT table) \item MKT table: table containing the number of polymorphic and divergent sites for neutral and selected classes. \item Divergence metrics: table containing estimates of Ka, Ks, omega, omegaA, omegaD. \end{itemize}
standardMKT(daf=myDafData, divergence=myDivergenceData)
Alpha estimates can be biased by the segregation of slighlty deleterious substitutions. One method to partially controll this effect is to remove low frequency polymorphisms from the analysis, as proposed by Fay et al. (2001). Using this correction, all polymorphic sites from both neutral and selected classes which have a derived allele frequency lower than the pre-defined cutoff are removed for the calculation of alpha.
The FWW() function uses daf and divergence input parameters, along with a default list of cutoffs (0, 0.05, 0.1) and returns as output a list containing:
\begin{itemize} \item Results: alpha estimates (and their associated Fisher exact test P-value) for each cutoff. \item Divergence metrics: global metrics (Ka, Ks, omega) and estimates by cutoff (omegaA, omegaD) \item MKT tables: tables containing the number of polymorphic and divergent sites for neutral and selected classes for each cutoff. \end{itemize}
FWW(daf=myDafData, divergence=myDivergenceData)
By default the argument listCutoffs uses a list of cutoffs with the following values: 0, 0.05, 0.1. Moreover, the function has an optional argument, plot, which is set to FALSE by default. This parameters can be customized, like in the following example, where we use a list of 4 cutoffs (0.05, 0.15, 0.25, 0.35) and set the plot argument to TRUE.
The output in this case contains a Graph which shows the adaptation value (alpha) obtained using each cutoff.
FWW(daf=myDafData, divergence=myDivergenceData, listCutoffs=c(0.05, 0.15,0.25,0.35), plot=TRUE)
To take adaptive and slightly deleterious mutation mutually into account, Pn , the count of segregating sites in the non-synonymous class, should be separated into the number of neutral variants and the number of weakly deleterious variants, Pn = Pn(neutral) + Pn(weakly del.). If both numbers are estimated, adaptive and weakly deleterious selection can be evaluated independently.
Consider a pair of 2×2 contingency tables. The first one corresponds to the standard MKT table with the theoretical counts of segregating sites and divergent sites for each cell.
The second table contains the count of Pn and Ps for two-frequency categories: below and over a threshold cutoff.
Add brief explanation about 2nd table!
To estimate alpha from the standard MKT table correcting by the segregation of weakly deleterious variants, we have to substitute the Pn by the expected number of neutral segregating sites, Pn(neutral). The correct estimate of alpha is then alpha = 1 - (Pn (neutral)/Ps)(Ds/Dn).
The DGRP() function behaves similar to the FWW() function. It takes the same input argument and returns the same output but containing also estimates on the fractions of negative selection (d: strongly deleterious, f: neutral and b: weakly deleterious).
DGRP(daf=myDafData, divergence=myDivergenceData)
Again, by default the argument listCutoffs uses a list of cutoffs with the following values: 0, 0.05, 0.1, and the argument plot is set to FALSE. This parameters can be customized, like in the following example, where we use a list of 4 cutoffs (0.05, 0.15, 0.25, 0.35) and set the plot argument to TRUE.
The output in this case contains two Graphs which show the adaptation value (alpha) and the negative selection fractions obtained using each cutoff.
DGRP(daf=myDafData, divergence=myDivergenceData, listCutoffs=c(0.05, 0.15,0.25,0.35), plot=TRUE)
Petrov reference + explanation
This function is adapted from the code developed in "Haller BC, Messer PW. asymptoticMK: A Web-Based Tool for the Asymptotic McDonald-Kreitman Test. G3 (Bethesda). 2017 May 5;7(5):1569-1575", stored in: http://github.com/MesserLab/asymptoticMK. The main adaptation we did is that the function presented here only fits an exponential model, removing the linear fitting performed initially in the cases where it was not possible to fit an asymptotic curve.
The asymptoticMKT() function uses the common daf and divergence parameters along with two arguments which define the lower and higher limit for the asymptotic alpha fit (xlow and xhigh). These two optional parameters are set to 0 and 1 by default, although it is recommended to use a higher limit of 0.9 in order to remove possible biases due to polarization error.
The function's output is a table with: the model type (exponential) along with the fitted function values (a, b, c), the asymptotic alpha estimate with its corresponding lower and higher confidence interval values, and the original alpha estimate (using the standard MKT methodology and the polymorphic sites within the xlow and xhigh cutoffs).
asymptoticMKT(daf=myDafData, divergence=myDivergenceData, xlow=0, xhigh=0.9)
The integrative MKT combines the approach developed by Messer & Petrov (2013) to estimate the fraction of adaptive substitutions ($\alpha$) and an adaptation of the theoretical framework established by Mackay et al. (2012) to quantify the fraction of putatively selected sites that are under purifying selection pressures.
The iMKT() function takes the default input parameters (daf and divergence), the xlow and xhigh arguments (presented in the asymptoticMKT() function) and it also has the optional argument plot, set as FALSE by default. However, in this example we use plot=TRUE to display the graphical results.
The output of the function contains: \begin{itemize} \item Asymptotic MK table: table corresponding to the asymptoticMKT() function output. \item Fractions of sites: negative selection fractions (d: strongly deleterious, f: neutral and b: weakly deleterious). \item Graphs: 3 plots showing: (A) the distribution of alleles frequencies for neutral and selected sites, (B) the asymptotic alpha estimate with xlow, xhigh, original alpha and asymptotic alpha marks, and (C) the negative selection fractions. \end{itemize}
iMKT(daf=myDafData, divergence=myDivergenceData, xlow=0, xhigh=0.9, plot=TRUE)
The following table includes the estimates of the rate of adaptive evolution ($\alpha$) obtained using the same data and each of the MKT methods included in the iMKT package. Results for FWW and DGRP corrections were produced using two different cut-offs (0.05 and 0.1).
results <- data.frame("Standard"=0.2365, "FWW_0.05"=0.5409, "FWW_0.1"=0.5798, "DGRP_0.05"=0.4249, "DGRP_0.1"=0.4126, "asymptotic_iMKT"=0.6259) knitr::kable(results, align="c") rm(results)
We observe that the Standard MKT tends to underestimate the true rate of adaptation, probably due to the presence of weakly deleterious mutations which reduce the power of the test to detect adaptive evolution.
Besides, we observe an increase of α when using any of the other methods Comment results and strengths and weakness of each method. Specifically, FWW correction seems to perform better than DGRP. Finally, the asymptotic method allows to estimate the largest rate of adaptive evolution. As we are analyzing a complete chromosome arm with a large number of segregating sites (in all DAF categories), the asymptotic test is able to remove almost all weakly deleterious mutations from the analysis and thus, we obtain the α estimates closest to the true level of adaptation.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.