```r
knitr::opts_chunk$set(eval = TRUE, cache = TRUE, echo = TRUE, collapse = TRUE,
                      tidy.opts = list(width.cutoff = 75), tidy = TRUE,
                      fig.width = 6, fig.height = 5, out.width = '70%',
                      out.height = '70%', dpi = 150, fig.align = 'center',
                      fig.ncol = 1, prompt = FALSE)
```
Please copy and paste the code below into an R script file in RStudio (File > New File > R Script). See the [installing the qtlpoly package] section if you have not installed the package yet.
```r
## Load package and data
library(qtlpoly)
data("genoprob4x"); length(genoprob4x)
data("pheno4x"); head(pheno4x)

## Read data to an object
data <- read_data(ploidy = 4, geno.prob = genoprob4x, pheno = pheno4x, step = 1)
print(data, detailed = TRUE)

## Detect QTL. One needs to run the score-based resampling method before proceeding.
## Change pheno.col = c(1:3) and increase n.clusters if you believe your computer can handle it
sig.fwd <- 0.0011493379 # 20% genome-wide significance level
sig.bwd <- 0.0002284465 # 5% genome-wide significance level
remim.mod <- remim(data = data, pheno.col = 1, w.size = 15, sig.fwd = sig.fwd,
                   sig.bwd = sig.bwd, d.sint = 1.5, n.clusters = 1)
print(remim.mod)
plot_profile(data = data, model = remim.mod, grid = TRUE)
plot_sint(data = data, model = remim.mod)

## Fit final QTL model
fitted.mod <- fit_model(data = data, model = remim.mod)
summary(fitted.mod)
plot_qtl(data = data, model = remim.mod, fitted = fitted.mod)

## Estimate allele effects
est.effects <- qtl_effects(ploidy = 4, fitted = fitted.mod)
plot(est.effects, p1 = "Atlantic", p2 = "B1829-5")

## Predict QTL-based breeding values
y.hat <- breeding_values(data = data, fitted = fitted.mod)
plot(y.hat)

## Run FEIM for comparison. One needs to run permutations before proceeding
sig.lod <- c(5.68, 5.78, 5.60) # 5% genome-wide significance level
feim.mod <- feim(data = data, w.size = 15, sig.lod = sig.lod)
print(feim.mod)
plot_profile(data = data, model = feim.mod, grid = TRUE)
```
The main goal of quantitative trait loci (QTL) mapping studies is to investigate the genetic architecture of traits of interest. Single- and multiple-QTL models have been available for inbred, diploid mapping populations for quite some time (see @DaCostaeSilva2010 for a comprehensive review). However, these methods have only recently become available for outbred, polyploid mapping populations.
The R package qtlpoly (v. 0.2.1) is under-development software for mapping multiple QTL in full-sib families of outcrossing, autopolyploid species [@Pereira2020]. It is based on the partition of the phenotypic variance ($\sigma^2_p$) into variances due to QTL ($\sigma^2_q$) plus the residual variance ($\sigma^2_e$), as follows:
$$\sigma^2_p = \sum_{q=1}^Q \boldsymbol{G}^{(i,i')}_q \sigma^2_q + \sigma^2_e$$
where $\boldsymbol{G}^{(i,i')}_q$ is the additive relationship between full-sibs $i$ and $i'$, whose computation is based on the genotype conditional probabilities of QTL $q$. See how to estimate these genotype probabilities using mappoly in @Mollinari2020.
Because we only need to estimate one parameter per QTL (the variance component associated with it), it is relatively easy to search for additional QTL and add them to the variance component model without ending up with an overparameterized model. A multiple-QTL model is known to have increased power when compared to a single-QTL model, with the ability to detect minor or separate linked QTL [@Pereira2020].
Variance components associated with putative QTL ($\sigma^2_q$) are tested using score statistics from the R package varComp (v. 0.2-0) [@Qu2013]. Final models are fitted using residual maximum likelihood (REML) from the R package sommer (v. 3.6) [@Covarrubias-Pazaran2016]. Plots for visualizing the results are based on ggplot2 (v. 3.1.0) [@Wickham2016].
This tutorial was first developed for the Polyploid Tools Training Workshop (December 12-15, 2021), and was tested in `r R.version.string` running on either Ubuntu 18.04.1 LTS (64-bit) or Windows 10.
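If you want to confirm what your own setup looks like before starting, the base R calls below (no assumptions beyond a standard R installation) report your R version and attached packages:

```r
## Check the R version and session details of your own setup
R.version.string
sessionInfo()
```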
Installing the qtlpoly package

In order to run this tutorial, you will need to install the qtlpoly package, which is available on GitHub. You can install all needed packages within R using the commands below:
install.packages("devtools") devtools::install_url("https://cran.r-project.org/src/contrib/Archive/SPA3G/SPA3G_1.0.tar.gz") devtools::install_url("https://cran.r-project.org/src/contrib/Archive/varComp/varComp_0.2-0.tar.gz") devtools::install_version("sommer", version = "3.6", repos = "http://cran.us.r-project.org", upgrade = FALSE) devtools::install_github("guilherme-pereira/qtlpoly", upgrade = FALSE)
If you have installed qtlpoly before, please make sure to run the last line of the installation process again so that you have all the latest updates. Then use the function library() -- or require() -- to load the package:
```r
devtools::install_github("guilherme-pereira/qtlpoly", upgrade = FALSE)
library(qtlpoly)
```
A cross between two potato (Solanum tuberosum, 2n = 4x = 48) cultivars, 'Atlantic' $\times$ 'B1829-5', resulted in 156 full-sibs. The population has been phenotyped (4-year evaluation) and genotyped (8k SNP array), and analyses have been performed to call SNP dosage, build a genetic map and map QTL [@Pereira2020a].
For brevity's sake, we have selected three phenotypes (foliage maturity evaluated in years 2007, 2008 and 2014, i.e. FM07, FM08 and FM14) and three linkage groups (LGs, namely 1, 5 and 7) for this demo. Foliage maturity is measured as the "area under the curve" of foliage color along the plant cycle. All analyses can be found at this GitHub page, though.
Use the function data() to preload the genotype probabilities computed in LGs 1, 5 and 7 (now called 1, 2 and 3, respectively) as well as the phenotypic data:
data("genoprob4x") data("pheno4x")
The genoprob4x object contains the genotype probabilities along the LGs for each individual, reflecting all recombination events. For example, individuals 1 and 2 show the following patterns of genotype probabilities (every other individual has its own pattern):
```r
par(mfrow = c(3, 2), pty = "s", mar = c(1, 1, 1, 1))
for(i in 1:3) {
  image(t(genoprob4x[[i]]$probs[,,1]), axes = FALSE, ylab = paste("LG", i), main = "Ind. 1")
  image(t(genoprob4x[[i]]$probs[,,2]), axes = FALSE, ylab = paste("LG", i), main = "Ind. 2")
}
```
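If you would like to inspect the object numerically rather than graphically, the base R calls below examine the probability array of the first LG. Judging from the indexing above, the third dimension corresponds to individuals, but the exact internal layout is an assumption here, so check the dimensions on your own object:

```r
## Inspect the genotype probability array of LG 1 (layout assumed from the
## indexing used above, with individuals in the third dimension)
dim(genoprob4x[[1]]$probs)
str(genoprob4x[[1]]$probs)
```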
The pheno4x object contains the phenotypic values of foliage maturity for the years 2007, 2008 and 2014:
```r
library(ggplot2)
ggplot() +
  geom_boxplot(data = stack(as.data.frame(pheno4x)), aes(x = ind, y = values, color = ind)) +
  xlab("Trait") +
  theme(legend.position = "none")
```
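Beyond the boxplots, simple numeric summaries can be useful. The sketch below only assumes that pheno4x can be coerced to a data frame, as already done in the ggplot2 call above:

```r
## Numeric summaries and missing-data counts per trait
summary(as.data.frame(pheno4x))
colSums(is.na(pheno4x))
```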
In a region where a QTL exists, the more alleles a pair of individuals share, the more their phenotypes will look alike. In fact, this is the basis of QTL detection.
The function read_data() reads both the 'genoprob' and 'pheno' objects, together with other relevant information such as the ploidy level (ploidy = 4) and the step size (step = 1) at which tests will be performed. A print() function provides detailed information on the data:
```r
data <- read_data(ploidy = 4, geno.prob = genoprob4x, pheno = pheno4x, step = 1)
print(data, detailed = TRUE)
```
An algorithm proposed for QTL model selection [@Kao1999], based on rounds of forward search and backward elimination, is available within the software; a conceptual outline is sketched below.
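The outline below is only a conceptual sketch of these rounds, written as R comments; it is not the remim() source code, and the exact criteria and stopping rules are described in @Pereira2020:

```r
## Conceptual outline of the model selection rounds (not the actual remim() code):
## 1. Forward search: conditional on the current model, scan all positions and add
##    the most significant putative QTL if its score-test P-value < sig.fwd.
## 2. Repeat step 1 until no additional position reaches sig.fwd.
## 3. Backward elimination: re-test each QTL in the joint model and drop any QTL
##    whose P-value > sig.bwd.
## 4. Iterate forward search and backward elimination until the set of QTL
##    stabilizes, then profile the positions of the remaining QTL.
```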
Rather than guessing pointwise significance levels for declaring QTL, you can use the score-based resampling method to assess genome-wide significance levels [@Zou2004]. This method is relatively intensive, because it involves computing score statistics for every position in the map, repeated 1,000 times (resampling):
```r
## Simulate the null model and compute score statistics (time-consuming)
data.sim <- simulate_qtl(data = data, mu = 0, h2.qtl = NULL, var.error = 1,
                         n.sim = 1000, missing = TRUE, seed = 123)
score.null <- null_model(data = data.sim$results, n.clusters = 6, plot = NULL)
## Precomputed results were loaded here to save time
load("D:/potato/HEREDITY/score_null.RData")
## Take the smallest P-value (largest score statistic) from each resample
min.pvl <- numeric(length(score.null$results))
for(p in 1:length(score.null$results)) {
  min.pvl[p] <- score.null$results[[p]]$pval[which.max(score.null$results[[p]]$stat)]
}
## Genome-wide 20% and 5% significance levels
quantile(sort(min.pvl), c(0.20, 0.05))
```
Therefore, we ran it in advance and learned that genome-wide significance levels of $\alpha = 0.20$ and $\alpha = 0.05$ correspond to pointwise $P < 0.0011493379$ and $P < 0.0002284465$, respectively, which will be used next. Remember that you are supposed to run the resampling method for all 12 linkage groups together (in our potato example).
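For reference, the sketch below simply records how those empirical quantiles become the pointwise thresholds passed to remim() in the next step (the numeric values are the ones quoted above, not recomputed here):

```r
## Pointwise thresholds corresponding to the genome-wide significance levels
## estimated by the score-based resampling (values quoted above)
sig.fwd <- 0.0011493379  # alpha = 0.20, i.e. quantile(sort(min.pvl), 0.20)
sig.bwd <- 0.0002284465  # alpha = 0.05, i.e. quantile(sort(min.pvl), 0.05)
```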
Now we use the remim() function to build our multiple-QTL model. One should provide the data object, the window size (w.size = 15), the value $d$ that is subtracted from the peak's $-\log_{10}(P)$ to compute the support interval (d.sint = 1.5 for ~95\%), and the number of cores to be used (n.clusters = 6).
In case you have computed the 'score.null' object, both the forward search (sig.fwd = 0.20) and backward elimination (sig.bwd = 0.05) thresholds will reflect the desired genome-wide significance levels ($\alpha$):
remim.mod <- remim(data = data, w.size = 15, score.null = score.null, sig.fwd = 0.20, sig.bwd = 0.05, d.sint = 1.5, n.clusters = 6)
Otherwise, you can simply provide the pointwise significance levels computed earlier with the resampling method, which is the option we use here:
remim.mod <- remim(data = data, w.size = 15, sig.fwd = 0.0011493379, sig.bwd = 0.0002284465, d.sint = 1.5, n.clusters = 6)
Use print() to show a summary table for each trait:
print(remim.mod)
Since support intervals have been calculated, you can print them as well by specifying the sint argument accordingly:
print(remim.mod, sint = "lower")
print(remim.mod, sint = "upper")
Given the final profiled models, you can plot either individual or joint $LOP$ profiles using the function plot_profile(). Triangles show where the mapped QTL peaks are located:
```r
plot_profile(data = data, model = remim.mod, grid = TRUE)
plot_profile(data = data, model = remim.mod, grid = FALSE)
```
The argument grid organizes the multiple plots as a grid if TRUE, or superimposes the profiles if FALSE.
You can visualize the QTL distributed along the linkage map, together with their support intervals, using the function plot_sint():
plot_sint(data = data, model = remim.mod)
Once the final models have been defined, one may use REML to estimate their parameters with the function fit_model():
fitted.mod <- fit_model(data = data, model = remim.mod)
A summary() function shows the parameter estimates together with the QTL heritabilities computed from the variance component estimates:
summary(fitted.mod)
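To make the heritability figures in the summary easier to read, here is a toy numeric illustration with made-up variance components (not values taken from fitted.mod), assuming each QTL heritability is its variance component divided by the total phenotypic variance:

```r
## Toy illustration with hypothetical variance components (not from fitted.mod)
var.q <- c(2.0, 1.0)                  # hypothetical sigma^2_q for two QTL
var.e <- 7.0                          # hypothetical residual variance sigma^2_e
h2.q  <- var.q / (sum(var.q) + var.e) # per-QTL heritabilities
round(h2.q, 2)                        # 0.20 and 0.10
```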
Another visualization is provided by the function plot_qtl():
plot_qtl(data = data, model = remim.mod, fitted = fitted.mod)
Dots are located at the respective LG positions of the QTL peaks. The size of each dot corresponds to its QTL heritability, and the color corresponds to the $P$-value, which helps to identify the most and least significant QTL.
The additive effects contributing to the overall mean, for each allele individually as well as for their combinations within each parent, may be computed using the function qtl_effects() as follows:
est.effects <- qtl_effects(ploidy = 4, fitted = fitted.mod)
A plot() function allows the user to visualize these contributions graphically:
plot(est.effects, p1 = "Atlantic", p2 = "B1829-5")
For each QTL, one will be able to see which alleles contribute the most to increasing or decreasing the phenotypic mean. For example, for the QTL on LG 2, allele b from 'Atlantic' and alleles f and h from 'B1829-5' contribute to increasing the area under the curve and, thus, to decreasing maturity time.
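As a toy illustration of how these additive effects combine (with made-up numbers, not values extracted from est.effects), the expected contribution of an offspring at a QTL is the sum of the effects of the four alleles it inherited, two from each tetraploid parent:

```r
## Hypothetical allele effects for a single QTL (not taken from est.effects)
p1.eff <- c(a = -0.8, b = 1.2, c = -0.1, d = -0.3)  # 'Atlantic' alleles a-d
p2.eff <- c(e = 0.2, f = 0.9, g = -0.5, h = 0.6)    # 'B1829-5' alleles e-h
## Expected contribution of an offspring that inherited alleles b, d and f, h
sum(p1.eff[c("b", "d")], p2.eff[c("f", "h")])
```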
Finally, with the estimates from the final models in hand, one can use them to predict the QTL-based breeding values as follows:
y.hat <- breeding_values(data = data, fitted = fitted.mod)
A plot() function shows the distribution of the genotypic values in the population:
plot(y.hat)
This may also be interesting for populations whose individuals have been genotyped but not phenotyped, when you still want to consider them while selecting the best genotypes.
A previous fixed-effect interval mapping (here named FEIM) model has been proposed as a first approach to map QTL in autopolyploid species [@Hackett2014]. It consists of a single-QTL model, where every position is tested according to the model:
$$Y = \mu_C + \sum_{i=2}^{m} \alpha_i X_i + \sum_{i=m+2}^{2m} \alpha_i X_i$$
where $\mu_C$ is the intercept, $\alpha_i$ and $X_i$ are the main effects and indicator variables for allele $i$, respectively, and $m$ is the ploidy level. The constraints $\alpha_{1} = 0$ and $\alpha_{m+1} = 0$ are imposed to satisfy the conditions $\sum_{i=1}^m X_i = m/2$ and $\sum_{i=m+1}^{2m} X_i = m/2$; because of these constraints, $\mu_C$ is a constant that is hard to interpret. Notice that the higher the ploidy level, the more effects have to be estimated, i.e. tetraploid models have six main effects, hexaploid models have 10 effects, and octoploid models have 14 effects (i.e. $2m-2$).
While in the REMIM model we test variance components, here the interest is in knowing whether the average allele effects are different from zero (the null hypothesis), using likelihood-ratio tests (LRT). Commonly, the tests are presented as the "logarithm of the odds" (LOD score), where $LOD = \frac{LRT}{2 \times \log_e(10)}$.
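To make the single-position test concrete, here is a hedged, self-contained sketch with simulated data; it is not the feim() implementation, and the allele indicator matrix is purely hypothetical (real indicators obey the per-parent constraints described above):

```r
## Conceptual single-position FEIM-style test for a tetraploid (m = 4), using
## simulated data; not the feim() implementation
set.seed(1)
n <- 156                                    # number of full-sibs, as in the example
X <- matrix(rbinom(n * 8, 1, 0.5), n, 8)    # hypothetical allele indicators (a-d, e-h)
colnames(X) <- letters[1:8]
Y <- rnorm(n)                               # hypothetical phenotype
full <- lm(Y ~ X[, c(2:4, 6:8)])            # alleles b-d and f-h (a and e set to zero)
null <- lm(Y ~ 1)
LRT  <- as.numeric(2 * (logLik(full) - logLik(null)))
LOD  <- LRT / (2 * log(10))                 # LOD score as defined above
LOD
```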
In order to evaluate significance (declare a QTL), empirical LOD thresholds are computed for each trait using permutations as proposed by @Churchill1994a.
Using the same 'data' object from the REMIM analyses, one can first compute the thresholds for all or specific traits. The permutations (n.sim = 1000) can be parallelized by defining the number of cores (n.clusters = 6):
load("D:/potato/HEREDITY/feim.RData") perm$results <- perm$results[c(25:27)] names(perm$results) <- c("FM07", "FM08", "FM14") perm <- permutations(data = data, n.sim = 1000, n.clusters = 6) print(perm)
Since it takes some time to run all the permutations, we will use the results previously obtained for $\alpha = 0.05$ (i.e., the 95\% quantile) in the subsequent FEIM analyses. Remember that you are supposed to run the permutation method for all 12 linkage groups together (in our potato example).
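The toy sketch below (simulated numbers only, no qtlpoly objects) just illustrates the permutation idea behind those thresholds: record the genome-wide maximum test statistic for each permuted data set and take the 95% quantile as the $\alpha = 0.05$ threshold:

```r
## Toy illustration of a permutation-based threshold: the 95% quantile of the
## per-permutation genome-wide maximum statistic (simulated stand-in values)
set.seed(2)
max.stat <- replicate(1000, max(rchisq(300, df = 6)))  # stand-in for max LRT per permutation
quantile(max.stat / (2 * log(10)), 0.95)               # on the LOD scale
```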
The feim() function tests every position at the step size specified in 'data' -- here, every 1 cM. Besides the sig.lod vector containing the thresholds for each trait, one needs to provide a window size (e.g., w.size = 15), which is used to select QTL peaks within the same linkage group that are at least the given window size apart:
feim.mod <- feim(data = data, w.size = 15, sig.lod = c(5.68, 5.78, 5.60))
A print() function shows detailed information on the detected QTL:
print(feim.mod)
Remember that, in this case, one should not sum the adjusted $R^2$ values from the same trait, as each was obtained from a separate single-QTL model.
Finally, one may want to plot the profiles and compare with the profiles from REMIM:
plot_profile(data = data, model = feim.mod, grid = TRUE)
Notice that, in comparison with the REMIM results, the QTL on linkage groups 1 and 3 for FM08 are missing.
This package has been developed as part of the Genomic Tools for Sweetpotato Improvement (GT4SP) project, funded by the Bill \& Melinda Gates Foundation.