knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In crop and animal breeding programs, selection is rarely performed in a single stage. Breeders usually evaluate multiple traits across different stages of testing, successively discarding inferior genotypes and advancing superior ones. This multistage approach requires selection indices to account for the changes in variances and covariances induced by selection at prior stages.
Chapter 9 of the selection.index package focuses on the mathematical formulation and practical application of Multistage Linear Selection Indices. We introduce indices that properly adjust for prior selection using the Cochran (1951) and Cunningham (1975) method, calculating corrected covariance matrices to appropriately predict subsequent selection responses. This vignette will demonstrate six multistage indices utilizing phenotypic and genomic estimated breeding values (GEBVs).
We will use the synthetic maize phenotypic and genotypic datasets (maize_pheno and maize_geno) to illustrate these complex functions. Let us first prepare the covariance matrices.
For our examples, we will evaluate 3 quantitative traits from the dataset: Yield, PlantHeight, and DaysToMaturity. We assume that Stage 1 selection evaluates the first two traits (Yield, PlantHeight), and Stage 2 evaluates all 3 traits (adding DaysToMaturity).
library(selection.index) # Estimate phenotypic and genotypic covariance matrices for the 3 traits # The traits are Yield, PlantHeight, DaysToMaturity traits <- c("Yield", "PlantHeight", "DaysToMaturity") pmat <- phen_varcov(maize_pheno[, traits], maize_pheno$Environment, maize_pheno$Genotype) gmat <- gen_varcov(maize_pheno[, traits], maize_pheno$Environment, maize_pheno$Genotype) # Matrix limits for Stage 1 (Traits 1 to 2) P1 <- pmat[1:2, 1:2] G1 <- gmat[1:2, 1:2] # Complete Matrices for Stage 2 P <- pmat C <- gmat # Economic weights for the 3 traits weights <- c(10, -5, -2)
The Multistage Linear Phenotypic Selection Index accounts for changes in phenotypic ($\mathbf{P}$) and genotypic ($\mathbf{C}$) covariance matrices due to previous selection cycles.
At Stage 1, the index coefficients are computed just as in the standard Smith-Hazel index: $$ \mathbf{b}_1 = \mathbf{P}_1^{-1}\mathbf{G}_1\mathbf{w}_1 $$ where $\mathbf{w}_1$ contains the economic weights for Stage 1 traits.
At Stage 2, the coefficients for the entire set of traits are: $$ \mathbf{b}_2 = \mathbf{P}^{-1}\mathbf{C}\mathbf{w} $$
However, due to selection at stage 1, the covariances $\mathbf{P}$ and $\mathbf{C}$ are adjusted for stage 2 evaluation: $$ \mathbf{P}^ = \mathbf{P} - u \frac{Cov(\mathbf{y},\mathbf{x}_1)\mathbf{b}_1\mathbf{b}_1'Cov(\mathbf{x}_1,\mathbf{y})}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1} $$ $$ \mathbf{C}^ = \mathbf{C} - u \frac{\mathbf{G}_1'\mathbf{b}_1\mathbf{b}_1'\mathbf{G}_1}{\mathbf{b}_1'\mathbf{P}_1\mathbf{b}_1} $$ where $u = k_1(k_1 - \tau)$ calculates the effect of selection based on the standardized truncation point $\tau$ and selection intensity $k_1$.
The function mlpsi simultaneously performs adjustments and metric estimations for both stages.
# We apply a selection proportion of 10% (0.10) per stage. mlpsi_res <- mlpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, selection_proportion = 0.1 ) # Stage 1 metrics mlpsi_res$summary_stage1 # Stage 2 metrics mlpsi_res$summary_stage2
The MRLPSI method applies when the breeder aims to maintain one or more quantitative traits without change over the multistage evaluation (e.g., maintaining constant PlantHeight while optimizing other variables).
The restricted coefficient vectors for Stage 1 and Stage 2 are defined as: $$ \mathbf{b}{R_1} = \mathbf{K}_1 \mathbf{b}_1 $$ $$ \mathbf{b}{R_2} = \mathbf{K}_2 \mathbf{b}_2 $$
Where $\mathbf{K}_1 = \mathbf{I}_1 - \mathbf{Q}_1$ and $\mathbf{K}_2 = \mathbf{I}_2 - \mathbf{Q}_2$ are matrices that impose zero genetic gain vectors, derived from the constraint matrices $\mathbf{C}_1$ and $\mathbf{C}_2$.
# We constrain PlantHeight (Trait 2) at Stage 1 C1 <- matrix(0, nrow = 2, ncol = 1) C1[2, 1] <- 1 # We constrain PlantHeight (Trait 2) at Stage 2 C2 <- matrix(0, nrow = 3, ncol = 1) C2[2, 1] <- 1 mrlpsi_res <- mrlpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, C1 = C1, C2 = C2, selection_proportion = 0.1 ) # Observe that Expected Gain (E) for PlantHeight is approximately 0 mrlpsi_res$summary_stage1
Unlike MRLPSI which imposes zero genetic gain bounds, MPPG-LPSI forces proportional changes mapped by the $\mathbf{d}_1$ and $\mathbf{d}_2$ restricted-difference vectors. At Stage $i$, the difference vector creates an updated target trajectory.
# Target specific proportional gains d1 <- c(2, 1) # Yield gains twice as much as PlantHeight at stage 1 d2 <- c(3, 1, 0.5) # Desired proportions at stage 2 mppg_res <- mppg_lpsi( P1 = P1, P = P, G1 = G1, C = C, wmat = weights, d1 = d1, d2 = d2, selection_proportion = 0.1 ) # Observe the Expected Gain (E) in the resulting summary stats aligns with d1 proportions mppg_res$summary_stage1
In modern breeding, Genomic Estimated Breeding Values (GEBVs) computed from whole-genome markers accelerate cyclical selection. For the Multi-Stage Genomic framework, we replace phenotypic estimators with GEBV matrix derivations: $\mathbf{\Gamma}$ (GEBV variance-covariance) replaces $\mathbf{P}$. $\mathbf{A}$ (Covariance between GEBVs and true BVs) provides genomic mappings.
For illustrative purposes, we mock simulate the arrays via a pseudo-reliability scaling of the genetic matrices:
set.seed(42) reliability <- 0.7 # Simulated genomic prediction reliability Gamma1 <- reliability * G1 Gamma <- reliability * C A1 <- reliability * G1 A <- C[, 1:2] # n x n1 covariance mapping
mlgsi_res <- mlgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, G1 = G1, P1 = P1, wmat = weights, selection_proportion = 0.1 ) mlgsi_res$summary_stage1
Similarly, traits can be biologically constrained in multiple genome-assisted breeding cycles.
mrlgsi_res <- mrlgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, G1 = G1, P1 = P1, wmat = weights, C1 = C1, C2 = C2, selection_proportion = 0.1 ) mrlgsi_res$summary_stage2
The procedure calculates predetermined gains over multiple cycles exclusively utilizing whole-genome predictions.
mppg_lgsi_res <- mppg_lgsi( Gamma1 = Gamma1, Gamma = Gamma, A1 = A1, A = A, C = C, G1 = G1, P1 = P1, wmat = weights, d1 = d1, d2 = d2, selection_proportion = 0.1 ) mppg_lgsi_res$summary_stage1
For all the multistage indices above (Phenotypic and Genomic), we evaluate statistical properties to compare efficiency.
The accuracy (or correlation between the index and true breeding value) indicates the efficiency of the index: $$ \rho_{H} = \frac{\sigma_{H, I}}{\sigma_H \sigma_I} = \sqrt{ \frac{\mathbf{b}'\mathbf{P}\mathbf{b}}{\mathbf{w}'\mathbf{C}\mathbf{w}} } $$ where $\mathbf{P}$ and $\mathbf{C}$ are substituted by their adjusted equivalents at Stage 2 ($\mathbf{P}^$ and $\mathbf{C}^$ or $\mathbf{\Gamma}^*$).
The overall selection response generated by the index evaluates the genetic superiority: $$ R = k \sigma_{I} = k \sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}} $$ where $k$ is the selection intensity.
The expected genetic gain per individual trait is given by the vector: $$ \mathbf{E} = k \frac{\mathbf{G}'\mathbf{b}}{\sigma_I} = k \frac{\mathbf{G}'\mathbf{b}}{\sqrt{\mathbf{b}'\mathbf{P}\mathbf{b}}} $$ For the Multistage Genomic Indices, $\mathbf{G}'$ is replaced by the marker association mapping matrix $\mathbf{A}'$.
Multistage Selection Indices effectively manage breeding resources by filtering inferior variants sequentially while accurately recalibrating covariance parameters using genomic and phenotypic variables across progressive evaluations. These analytical tools prevent severe variance distortion over breeding cycles.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.