In wolfemd/predCrossVar: Prediction of family genomic variances and covariances

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(predCrossVar)

Intro

The predCrossVar package contains a complete set of functions for the prediction of additive and dominance genetic variances and co-variances among full-siblings based on parents, following [@lehermeier2017; @lehermeier2017a]. predCrossVar enables the prediction of genetic variance on a multi-trait selection index. Built for diploid organisms with phased, chromosome- or linkage-group ordered biallelic marker data, and a centimorgan-scale genetic map.

Predicting genetic variance

Summarize formula for predicting genetic variance in crosses. Thanks to those whose work I have learned from and built upon.

Lehermeier
Allier UCPC
Neyhart
Bijma et al.
Lynch & Walsh 1998
Also in silico as in PopVar
Wolfe et al. Genomic mate selection MS

Notice: Package dev. status

At this early development stage, package is unforgiving. There are not many arguments with defaults. Input formats are pretty specific, but well defined.

Especially in the case of many markers and using parallel processing, memory consumption can be potentially very large and needs to be considered. Hopefully there is sig. room for improvement in this area.

Example data

I simulated 100 diploid individuals with 2 chromosomes, 10 QTL each, using the AlphaSimR library. Two correlated traits with additive and dominance effects. Main use is to exemplify the input formats and usage of predCrossVar functions.

Inputs

To predict cross variance, 3 types of input are required:

Phased Parental Haplotypes
Genetic Map (or pairwise-recombination frequencies)
Marker Effects

We start with the included example simulation data. 100 individuals, 2 chromosomes with 10 QTL/chr.

Genetic map

The usual starting place for recombination frequency is a centimorgan-scale genetic map.

data(genmap)
str(genmap)

# utility function to compute matrix of pairwise recombination frequencies. 
recombFreqMat<-genmap2recombfreq(genmap,nChr = 2)

dim(recombFreqMat)

In practice, these matrices get big as they have dimension $N_{snp} \times N_{snp}$. It is recommended to precompute and store the recombFreqMat.

In predicting genetic variance in the $F_1$ cross, the recombFreqMat the matrix $1-(2 \times recombFreqMat)$ is computed. To avoid having to do that separately for each family being predicted, I designed the prediction functions to require $1-(2 \times recombFreqMat)$ as input, to save a great deal of computation. Therefore pre-compute it and store that on disk. Probably this can be improved on!

recombFreqMat<-1-(2*recombFreqMat)
recombFreqMat[1:5,1:5]

Phased parental haplotypes

The example dataset haplos contains 2 chrom. with 10 loci each (columns) for 100 individuals (200 rows).

data("haploMat")
dim(haploMat)

haploMat[1:5,1:10]

Marker effects

The example data come in the form of an list of row-matrices, which is useful for multivariate analyses we will get into later. There is also an example DomEffectsList available to load later.

data("AddEffectsList")
str(AddEffectsList)

Single-trait predictions

The first set of functions we will use are simplest. They only handle single traits. No covariance predictions.

Predict one cross

To start, predict a single cross between parents "1" and "2"

# For the univariate case, input effects should be a column matrix.
addEffectsT1<-t(AddEffectsList$Trait1)
addEffectsT2<-t(AddEffectsList$Trait2)
dim(addEffectsT1)

# Trait 1
predCrossVarA(sireID = "1", damID = "2", 
              addEffects = addEffectsT1,
              haploMat = haploMat,
              recombFreqMat = recombFreqMat)

# Trait 2
predCrossVarA(sireID = "1", damID = "2", 
              addEffects = addEffectsT2,
              haploMat = haploMat,
              recombFreqMat = recombFreqMat)

Predict several crosses

Multiple families can be predicted, with an option for parallel processing across families with the runCrossVarPredsA() (additive-only) and runCrossVarPredsAD() (additive-plus-dominance) functions.

For example, we will pick three parents of interest.

There is a helper function crosses2predict() to make a list of all pairwise matings given an input vecot of parent IDs. Includes self-crosses and the upper-triangle of a mating matrix (i.e. no reciprocal crosses).

ped2predict<-crosses2predict(parents = c("1","2","11"))
ped2predict

# Trait 1 - serial (DEFAULT, ncores = 1)
runCrossVarPredsA(ped = ped2predict,
                  addEffects = addEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat)

Multi-core across families

# Trait 1 - parallel (ncores = 6)
runCrossVarPredsA(ped = ped2predict,
                  addEffects = addEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)

# Trait 2 - parallel (ncores = 6)
runCrossVarPredsA(ped = ped2predict,
                  addEffects = addEffectsT2,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)

Additive-plus-dominance

data("DomEffectsList")
domEffectsT1<-t(DomEffectsList$Trait1)
domEffectsT2<-t(DomEffectsList$Trait2)

# Trait 1
runCrossVarPredsAD(ped = ped2predict,
                  addEffects = addEffectsT1,
                  domEffects = domEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)

# Trait 2
runCrossVarPredsAD(ped = ped2predict,
                  addEffects = addEffectsT1,
                  domEffects = domEffectsT1,
                  haploMat = haploMat,
                  recombFreqMat = recombFreqMat,
                  ncores = 6)

Multiple-trait predictions

(including covariances)

predOneCrossVarA() -> predCrossVarsA() -> runMtCrossVarPredsA()

runMtCrossVarPredsA() and runMtCrossVarPredsAD() should supersede in function all the others.

NOTES:

Arguments names are pretty opinionated and long, although not without good intention. Might be worth simplifying.
- e.g. predOneCrossVarA(): postMeanAlleleSubEffects= could just be addEffectsLists=

mtpred<-runMtCrossVarPredsAD(CrossesToPredict = ped2predict,
                             AddEffectList = AddEffectsList,
                             DomEffectList = DomEffectsList,
                             haploMat = haploMat,
                             recombFreqMat = recombFreqMat, 
                             ncores = 6)

The results are a nested list structure. Let's unpack them...

library(tidyverse); library(magrittr)

mtpred

The list element "varcovars"

mtpred$varcovars %>% 
  tidyr::unnest_wider(varcomps)

Unpack varcovars to get at the variance and covariance predictions for additive and dominances.

Variances are in column VPM. Output is in long-form.

mtpred$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% head

Cut out the extra columns and spread the additive-vs-dominance across columns, just to tidy up

mtpred$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% 
  dplyr::select(-PMV,-Nsegsnps,-totcomputetime) %>% 
  spread(VarComp,VPM) %>% rmarkdown::paged_table()

Note that variance results Trait1==Trait2 should match the single-trait results and co-variance predictions are now included.

Lastly, let's quickly verify that the additive-only function works and that the runMt___ functions won't errory if we give just one cross / trait.

mtpred_A<-runMtCrossVarPredsA(CrossesToPredict = ped2predict,
                              AlleleSubEffectList = AddEffectsList,
                              haploMat = haploMat,
                              recombFreqMat = recombFreqMat, 
                              ncores = 6)

mtpred_A$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% 
  rmarkdown::paged_table()

st_addefflist<-AddEffectsList
st_addefflist$Trait2<-NULL
stpred_sc<-runMtCrossVarPredsA(CrossesToPredict = ped2predict[1,],
                               AlleleSubEffectList = st_addefflist,
                               haploMat = haploMat,
                               recombFreqMat = recombFreqMat,
                               ncores = 6)
stpred_sc$varcovars %>% 
  tidyr::unnest_wider(varcomps) %>% 
  dplyr::select(-totcomputetime) %>% 
  tidyr::unnest(predictedfamvars) %>% 
  tidyr::unnest(predVars) %>% 
  rmarkdown::paged_table()

A note on marker effects input

data("AddEffectsList")
str(AddEffectsList)

List of allele substitution effects.
Each list element is a matrix. One matrix per trait.
Each element of the list is named with a string identifying the trait and the colnames of each matrix are labeled with snpIDs.
Dimensions of each matrix of marker effects
- $1 \times N_{snp}$: Many users will have marker effects estimated with REML or only want to use the posterior mean marker effects from a Bayesian marker-regression. For these users, the argument predType=="VPM" should be used and each matrix of the AlleleSubEffectList will have only one row.
- $\frac{N_{iter} - N_{burnIn}}{N_{thin}} \times N_{snp}$: When predicting the Posterior Mean Variance (PMV), use argument predType=="PMV" when computing the predicted variance across a sample of marker effects, e.g. the thinned MCMC samples, usually stored on disk.

References

wolfemd/predCrossVar documentation built on Dec. 21, 2020, 10:14 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

wolfemd/predCrossVar
Prediction of family genomic variances and covariances

In wolfemd/predCrossVar: Prediction of family genomic variances and covariances

Intro

Predicting genetic variance

Notice: Package dev. status

Example data

Inputs

Genetic map

Phased parental haplotypes

Marker effects

Single-trait predictions

Predict one cross

Predict several crosses

Multi-core across families

Additive-plus-dominance

Multiple-trait predictions

A note on marker effects input

References

R Package Documentation

Browse R Packages

We want your feedback!

wolfemd/predCrossVar Prediction of family genomic variances and covariances

In wolfemd/predCrossVar: Prediction of family genomic variances and covariances

Intro

Predicting genetic variance

Notice: Package dev. status

Example data

Inputs

Genetic map

Phased parental haplotypes

Marker effects

Single-trait predictions

Predict one cross

Predict several crosses

Multi-core across families

Additive-plus-dominance

Multiple-trait predictions

A note on marker effects input

References

R Package Documentation

Browse R Packages

We want your feedback!

wolfemd/predCrossVar
Prediction of family genomic variances and covariances