README.md

Genomic Mate Selection in R

Welcome to the R package that provides support for genomic-enabled prediction and mate selection for plant and animal breeding: library(genomicMateSelectR).

The primary functions of genomicMateSelectR predict the means and variances in performance among the progeny of crosses, based on parent data, in order to support the selection of mates in a breeding program. The package supports diploid organisms with phased, chromosome- or linkage-group-ordered biallelic marker data and a centimorgan-scale genetic map. Additional functions automate cross-validation estimation of prediction accuracy, and more.
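To make the idea of a predicted cross mean concrete, here is a minimal base-R sketch under an additive-only model, where the expected progeny mean is the average of the two parents' genomic breeding values. It does not call the package; the toy dosages, marker effects, object names and the `sireID`/`damID` column names are all assumptions made for illustration.

```r
# Toy allele dosages (0/1/2) for three hypothetical parents at four markers
dosages <- rbind(P1 = c(2, 1, 0, 1),
                 P2 = c(0, 1, 2, 1),
                 P3 = c(1, 2, 1, 0))

# Assumed additive marker effects (in practice these come from a fitted model)
add_effects <- c(0.4, -0.1, 0.3, 0.2)

# Genomic breeding value of each parent: dosage-weighted sum of marker effects
gebv <- drop(dosages %*% add_effects)

# Candidate crosses to evaluate (hypothetical column names)
crosses <- data.frame(sireID = c("P1", "P1", "P2"),
                      damID  = c("P2", "P3", "P3"),
                      stringsAsFactors = FALSE)

# Additive-model cross mean: the mid-parent value
crosses$predMean <- unname((gebv[crosses$sireID] + gebv[crosses$damID]) / 2)

# Rank crosses from best to worst predicted mean
crosses[order(crosses$predMean, decreasing = TRUE), ]
```

Ranking on the mean alone ignores how much the progeny of each cross are expected to vary, which is why the package also predicts within-cross variances.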

The package includes what might be too many extra functions. Indeed, it spans the entire pipeline used for genomic mate selection in the NextGen Cassava Breeding programs, starting from data downloads from Cassavabase. It includes functions to automate cross-validation procedures, compute accuracy on a selection index, and make predictions of individual and cross performance on a multi-trait selection index. The prediction models include additive-effects only ("A"), additive-plus-dominance ("AD") and a directional dominance model ("DirDom"). Functions used in cleaning and curating breeding pipeline field data, plus handling and imputing genomic data, are also included, but most users are not likely to find these of interest. Not everything has been equally tested or documented at this stage.

Installation

You can install the genomicMateSelectR package from my GitHub with:

devtools::install_github("wolfemd/genomicMateSelectR", ref = 'master') 

Get Started

CHECK OUT THE NEW VIGNETTES!

  1. Getting started predicting crosses
  2. Genomic (cross) predictions with non-additive effects

More to come!

Feature highlights

Core function overview

| Genomic mate selection functions | Top level functions to predict the performance of individual genotypes and the usefulness of potential crosses. |
|----------------------------------|------------------------------------------------------------------------------------------------------------------|
| runGenomicPredictions() | Run GBLUP model using sommer::mmer, potentially on multiple traits. Returns genomic BLUPs (GEBV and GETGV). If requested, returns backsolved marker effects (equivalent to ridge regression / SNP-BLUP). |
| predictCrosses() | Predict, potentially for multiple traits, the means, variances and trait-trait covariances in a set of user-requested crosses-to-evaluate. Output enables easy ranking of potential crosses. Potentially computes the usefulness criteria, $UC_{parent}$ and $UC_{variety}$. Provides users the option to predict cross usefulness (means and variances) on a linear multi-trait selection index, taking into account trait-trait covariances within each cross, using a set of user-supplied weights. Utilizes the functions predCrossVars() and predCrossMeans() under-the-hood. This function is designed to work with runGenomicPredictions(), taking SNP-effects matrices output from that function. |
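The selection-index and usefulness calculations described for predictCrosses() can be illustrated with a few lines of base R. This is a sketch of the standard textbook form (index mean w'μ, index variance w'Σw, usefulness = mean + standardized selection intensity × standard deviation), not the package's internal code; the trait names, predicted values, weights and the intensity of 2 are invented for the example.

```r
traits <- c("yield", "drymatter")        # hypothetical trait names

# Assumed predictions for a single cross: trait means and the
# within-cross trait-trait (co)variance matrix
mu    <- c(yield = 1.2, drymatter = 0.5)
Sigma <- matrix(c( 0.30, -0.05,
                  -0.05,  0.10),
                nrow = 2, byrow = TRUE,
                dimnames = list(traits, traits))

w           <- c(yield = 1, drymatter = 2)  # user-supplied index weights (assumed)
std_sel_int <- 2                            # standardized selection intensity (assumed)

# Mean and variance of the linear selection index within the cross.
# The variance uses the full covariance matrix, so a negative trait-trait
# covariance reduces the index variance.
si_mean <- sum(w * mu[names(w)])
si_var  <- drop(t(w) %*% Sigma[names(w), names(w)] %*% w)

# Usefulness: expected mean of the selected fraction of progeny
uc <- si_mean + std_sel_int * sqrt(si_var)
uc
```

Two crosses with the same index mean can therefore differ in usefulness when one is predicted to segregate more on the index.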

| Cross-validation functions | Functions to automate cross-validation |
|----------------------------|----------------------------------------|
| runParentWiseCrossVal() | Assess the accuracy of predicted previously unobserved crosses. |
| runCrossVal() | Run k-fold cross-validation and assess the accuracy of predicted previously unobserved genotypes (individuals) based on the available training data. |
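For readers unfamiliar with k-fold cross-validation of genomic predictions, the following base-R sketch shows the general scheme for individuals: split genotypes into k folds, train on k-1 folds, predict the held-out fold, and report the correlation between predicted and observed values. It is only an illustration. The data are simulated, the helper `ridge_fit()` is hypothetical, and a fixed-penalty ridge regression stands in for the GBLUP model the package fits.

```r
set.seed(42)
n <- 60; m <- 20; k <- 5                      # individuals, markers, folds

# Simulated marker dosages and phenotypes (purely illustrative)
Z <- matrix(sample(0:2, n * m, replace = TRUE), n, m)
y <- drop(Z %*% rnorm(m, sd = 0.3)) + rnorm(n)

# Randomly assign each individual to one of k equally sized folds
fold <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: ridge regression on centered markers
ridge_fit <- function(Z, y, lambda = 1) {
  mu <- mean(y)
  zc <- colMeans(Z)
  Zc <- sweep(Z, 2, zc)
  b  <- solve(crossprod(Zc) + lambda * diag(ncol(Z)), crossprod(Zc, y - mu))
  list(mu = mu, zc = zc, b = drop(b))
}

# Train on k-1 folds, predict the held-out fold, record the accuracy
accuracy <- sapply(seq_len(k), function(f) {
  test <- fold == f
  fit  <- ridge_fit(Z[!test, , drop = FALSE], y[!test])
  pred <- fit$mu + drop(sweep(Z[test, , drop = FALSE], 2, fit$zc) %*% fit$b)
  cor(pred, y[test])
})

mean(accuracy)   # average prediction accuracy across folds
```

Assessing the accuracy of predictions for crosses needs a different design, because the held-out units are families rather than individuals.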

| Cross-prediction functions | Functions that predict cross means and variances |
|----------------------------|--------------------------------------------------|
| predCrossVars() | Predict cross variances and covariances |
| predCrossMeans() | Predict cross means |
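The reason cross-variance prediction needs phased haplotypes and a genetic map can be seen in a small base-R sketch of the additive case. For a gamete from a diploid parent with haplotypes h1 and h2, the covariance between loci j and k is (1 - 2c_jk)/4 × (h1_j - h2_j)(h1_k - h2_k), where c_jk is the recombination frequency; the two gametes forming a progeny are independent, so their covariance matrices add. The code below is an illustrative derivation, not the package's implementation. The haplotypes, marker effects and map positions are made up, `gamete_cov()` is a hypothetical helper, and the Haldane mapping function is an assumption.

```r
# Phased haplotypes (rows) of two hypothetical parents at three loci (0/1 alleles)
haps_p1 <- rbind(c(1, 0, 1),
                 c(0, 1, 1))
haps_p2 <- rbind(c(1, 1, 0),
                 c(1, 0, 0))

add_effects <- c(0.5, -0.2, 0.3)   # assumed additive marker effects
pos_cM      <- c(0, 10, 25)        # assumed map positions on one chromosome

# Recombination frequencies from map distances (Haldane mapping function).
# Loci on different chromosomes would instead get c = 0.5.
dist_M <- abs(outer(pos_cM, pos_cM, "-")) / 100
c_mat  <- 0.5 * (1 - exp(-2 * dist_M))

# Covariance of allele states among the gametes of one parent
gamete_cov <- function(haps, c_mat) {
  d <- haps[1, ] - haps[2, ]        # non-zero only where the parent is heterozygous
  (1 - 2 * c_mat) / 4 * tcrossprod(d)
}

# Progeny dosage = gamete from parent 1 + gamete from parent 2 (independent)
D <- gamete_cov(haps_p1, c_mat) + gamete_cov(haps_p2, c_mat)

# Predicted additive genetic variance among progeny of the cross
cross_var <- drop(t(add_effects) %*% D %*% add_effects)
cross_var
```

Only loci where at least one parent is heterozygous contribute, and linked loci contribute covariance terms whose sign depends on the parental phase.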

NextGen Cassava GS pipeline functions

In addition, a host of functions were developed to support processing both the field trial data stored on Cassavabase and the genotyping/genomics data (imputation, file conversions).

There are two "families" of functions distinguished in the Reference as "cassavabase_pheno_pipeline" and "imputation_functions".

Relationship to predCrossVar

library(genomicMateSelectR) descends from and extends the predCrossVar R package. library(predCrossVar) was built alongside an initial study, which showed promising results regarding the prediction of genetic variance in cassava crosses. Subsequent to that initial study, the code was completed and improved. The considerable subsequent analyses leading to the functions collected in library(genomicMateSelectR) are completely documented here.



wolfemd/genomicMateSelectR documentation built on July 1, 2022, 10:42 p.m.