knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(predCrossVar)
The predCrossVar package contains a complete set of functions for the prediction of additive and dominance genetic variances and co-variances among full-siblings based on parents, following [@lehermeier2017; @lehermeier2017a]. predCrossVar enables the prediction of genetic variance on a multi-trait selection index. Built for diploid organisms with phased, chromosome- or linkage-group ordered biallelic marker data, and a centimorgan-scale genetic map.
Summarize formula for predicting genetic variance in crosses. Thanks to those whose work I have learned from and built upon.
Lehermeier
Allier UCPC
Neyhart
Bijma et al.
Lynch & Walsh 1998
Also in silico as in PopVar
Wolfe et al. Genomic mate selection MS
At this early development stage, package is unforgiving. There are not many arguments with defaults. Input formats are pretty specific, but well defined.
Especially in the case of many markers and using parallel processing, memory consumption can be potentially very large and needs to be considered. Hopefully there is sig. room for improvement in this area.
I simulated 100 diploid individuals with 2 chromosomes, 10 QTL each, using the AlphaSimR library. Two correlated traits with additive and dominance effects. Main use is to exemplify the input formats and usage of predCrossVar functions.
To predict cross variance, 3 types of input are required:
We start with the included example simulation data. 100 individuals, 2 chromosomes with 10 QTL/chr.
The usual starting place for recombination frequency is a centimorgan-scale genetic map.
data(genmap) str(genmap)
# utility function to compute matrix of pairwise recombination frequencies. recombFreqMat<-genmap2recombfreq(genmap,nChr = 2)
dim(recombFreqMat)
In practice, these matrices get big as they have dimension $N_{snp} \times N_{snp}$. It is recommended to precompute and store the recombFreqMat
.
In predicting genetic variance in the $F_1$ cross, the recombFreqMat
the matrix $1-(2 \times recombFreqMat)$ is computed. To avoid having to do that separately for each family being predicted, I designed the prediction functions to require $1-(2 \times recombFreqMat)$ as input, to save a great deal of computation. Therefore pre-compute it and store that on disk. Probably this can be improved on!
recombFreqMat<-1-(2*recombFreqMat) recombFreqMat[1:5,1:5]
The example dataset haplos contains 2 chrom. with 10 loci each (columns) for 100 individuals (200 rows).
data("haploMat") dim(haploMat)
haploMat[1:5,1:10]
The example data come in the form of an list of row-matrices, which is useful for multivariate analyses we will get into later. There is also an example DomEffectsList available to load later.
data("AddEffectsList") str(AddEffectsList)
The first set of functions we will use are simplest. They only handle single traits. No covariance predictions.
To start, predict a single cross between parents "1" and "2"
# For the univariate case, input effects should be a column matrix. addEffectsT1<-t(AddEffectsList$Trait1) addEffectsT2<-t(AddEffectsList$Trait2) dim(addEffectsT1)
# Trait 1 predCrossVarA(sireID = "1", damID = "2", addEffects = addEffectsT1, haploMat = haploMat, recombFreqMat = recombFreqMat)
# Trait 2 predCrossVarA(sireID = "1", damID = "2", addEffects = addEffectsT2, haploMat = haploMat, recombFreqMat = recombFreqMat)
Multiple families can be predicted, with an option for parallel processing across families with the runCrossVarPredsA()
(additive-only) and runCrossVarPredsAD()
(additive-plus-dominance) functions.
For example, we will pick three parents of interest.
There is a helper function crosses2predict()
to make a list of all pairwise matings given an input vecot of parent IDs. Includes self-crosses and the upper-triangle of a mating matrix (i.e. no reciprocal crosses).
ped2predict<-crosses2predict(parents = c("1","2","11")) ped2predict
# Trait 1 - serial (DEFAULT, ncores = 1) runCrossVarPredsA(ped = ped2predict, addEffects = addEffectsT1, haploMat = haploMat, recombFreqMat = recombFreqMat)
# Trait 1 - parallel (ncores = 6) runCrossVarPredsA(ped = ped2predict, addEffects = addEffectsT1, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6)
# Trait 2 - parallel (ncores = 6) runCrossVarPredsA(ped = ped2predict, addEffects = addEffectsT2, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6)
data("DomEffectsList") domEffectsT1<-t(DomEffectsList$Trait1) domEffectsT2<-t(DomEffectsList$Trait2)
# Trait 1 runCrossVarPredsAD(ped = ped2predict, addEffects = addEffectsT1, domEffects = domEffectsT1, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6)
# Trait 2 runCrossVarPredsAD(ped = ped2predict, addEffects = addEffectsT1, domEffects = domEffectsT1, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6)
(including covariances)
predOneCrossVarA()
-> predCrossVarsA()
-> runMtCrossVarPredsA()
runMtCrossVarPredsA()
and runMtCrossVarPredsAD()
should supersede in function all the others.
NOTES:
Arguments names are pretty opinionated and long, although not without good intention. Might be worth simplifying.
predOneCrossVarA()
: postMeanAlleleSubEffects=
could just be addEffectsLists=
mtpred<-runMtCrossVarPredsAD(CrossesToPredict = ped2predict, AddEffectList = AddEffectsList, DomEffectList = DomEffectsList, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6)
The results are a nested list structure. Let's unpack them...
library(tidyverse); library(magrittr)
mtpred
The list element "varcovars"
mtpred$varcovars %>% tidyr::unnest_wider(varcomps)
Unpack varcovars to get at the variance and covariance predictions for additive and dominances.
Variances are in column VPM. Output is in long-form.
mtpred$varcovars %>% tidyr::unnest_wider(varcomps) %>% dplyr::select(-totcomputetime) %>% tidyr::unnest(predictedfamvars) %>% tidyr::unnest(predVars) %>% head
Cut out the extra columns and spread the additive-vs-dominance across columns, just to tidy up
mtpred$varcovars %>% tidyr::unnest_wider(varcomps) %>% dplyr::select(-totcomputetime) %>% tidyr::unnest(predictedfamvars) %>% tidyr::unnest(predVars) %>% dplyr::select(-PMV,-Nsegsnps,-totcomputetime) %>% spread(VarComp,VPM) %>% rmarkdown::paged_table()
Note that variance results Trait1==Trait2
should match the single-trait results and co-variance predictions are now included.
Lastly, let's quickly verify that the additive-only function works and that the runMt___
functions won't errory if we give just one cross / trait.
mtpred_A<-runMtCrossVarPredsA(CrossesToPredict = ped2predict, AlleleSubEffectList = AddEffectsList, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6)
mtpred_A$varcovars %>% tidyr::unnest_wider(varcomps) %>% dplyr::select(-totcomputetime) %>% tidyr::unnest(predictedfamvars) %>% tidyr::unnest(predVars) %>% rmarkdown::paged_table()
st_addefflist<-AddEffectsList st_addefflist$Trait2<-NULL stpred_sc<-runMtCrossVarPredsA(CrossesToPredict = ped2predict[1,], AlleleSubEffectList = st_addefflist, haploMat = haploMat, recombFreqMat = recombFreqMat, ncores = 6) stpred_sc$varcovars %>% tidyr::unnest_wider(varcomps) %>% dplyr::select(-totcomputetime) %>% tidyr::unnest(predictedfamvars) %>% tidyr::unnest(predVars) %>% rmarkdown::paged_table()
data("AddEffectsList") str(AddEffectsList)
List of allele substitution effects.
Each list element is a matrix. One matrix per trait.
Each element of the list is named with a string identifying the trait and the colnames
of each matrix are labeled with snpIDs.
Dimensions of each matrix of marker effects
predType=="VPM"
should be used and each matrix of the AlleleSubEffectList
will have only one row.predType=="PMV"
when computing the predicted variance across a sample of marker effects, e.g. the thinned MCMC samples, usually stored on disk.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.