The Linear Phenotypic Selection Index Theory


Introduction

In plant and animal breeding, quantitative traits (QTs) are expressions of genes distributed across the genome interacting with the environment. The phenotypic value of a QT ($y$) can be partitioned into a genotypic component ($g$) and an environmental component ($e$):

$$ y = g + e $$

The primary goal in breeding is to maximize an individual's net genetic merit. The net genetic merit ($H$) is a linear combination of the unobservable true breeding values ($\mathbf{g}$) weighted by their respective economic values ($\mathbf{w}$):

$$ H = {\mathbf{w}}^{\prime}\mathbf{g} $$

Because the net genetic merit cannot be observed in field trials, breeders construct a Linear Phenotypic Selection Index (LPSI) to predict it. The LPSI ($I$) is a linear combination of the observable phenotypic trait values ($\mathbf{y}$) weighted by index coefficients ($\mathbf{b}$):

$$ I = {\mathbf{b}}^{\prime}\mathbf{y} $$

The objective of the LPSI is to predict the net genetic merit and maximize the multi-trait selection response.
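The three equations above can be illustrated with a small simulation. This is a minimal sketch in base R, assuming multivariate normal $\mathbf{g}$ and $\mathbf{e}$ that are independent of each other. The covariance matrices, the weights `w`, and the deliberately non-optimal coefficients `b` are hypothetical toy values, not estimates from any dataset.

```r
# Toy simulation of y = g + e, H = w'g and I = b'y (all values hypothetical)
set.seed(1)
n <- 20000                                # simulated individuals

G <- matrix(c(4, 1, 1, 2), nrow = 2)      # assumed genotypic covariance matrix
R <- matrix(c(6, 0, 0, 2), nrow = 2)      # assumed environmental covariance matrix
P <- G + R                                # phenotypic covariance if g and e are independent

w <- c(2, 1)                              # assumed economic weights
b <- c(1, 0)                              # arbitrary coefficients: select on trait 1 only

# Rows of Z %*% chol(S) have covariance S when Z holds standard normal draws
g <- matrix(rnorm(n * 2), ncol = 2) %*% chol(G)
e <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)
y <- g + e

H       <- as.numeric(g %*% w)            # net genetic merit (unobservable in practice)
I_index <- as.numeric(y %*% b)            # index value (observable)

# Compare the simulated correlation with w'Gb / sqrt(w'Gw * b'Pb)
cor_sim    <- cor(H, I_index)
cor_theory <- as.numeric(t(w) %*% G %*% b) /
  sqrt(as.numeric(t(w) %*% G %*% w) * as.numeric(t(b) %*% P %*% b))

round(c(simulated = cor_sim, theoretical = cor_theory), 3)
```

With real data $\mathbf{g}$ is never observed, so $H$ cannot be computed directly; the simulation only makes visible how well a given $\mathbf{b}$ tracks it.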

Optimizing the LPSI

To identify the best parents for the next selection cycle, the correlation between the net genetic merit ($H$) and the LPSI ($I$) must be maximized. The vector $\mathbf{b}$ that both minimizes the mean squared difference between $I$ and $H$ and maximizes this correlation is:

$$ \mathbf{b} = {\mathbf{P}}^{-1}\mathbf{Gw} $$

where:

* $\mathbf{P}$ is the phenotypic variance-covariance matrix.
* $\mathbf{G}$ is the genotypic variance-covariance matrix.
* $\mathbf{w}$ is the vector of economic weights defining relative trait importance.
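The expression for $\mathbf{b}$ can be recovered in a few lines. The sketch below assumes that $\mathbf{g}$ and $\mathbf{e}$ are uncorrelated and that all variables are centered, so that $\mathrm{Var}(\mathbf{y}) = \mathbf{P}$, $\mathrm{Var}(\mathbf{g}) = \mathbf{G}$ and $\mathrm{Cov}(\mathbf{g}, \mathbf{y}) = \mathbf{G}$. Under these assumptions the mean squared difference is:

$$ E\left[(H - I)^2\right] = {\mathbf{w}}^{\prime}\mathbf{Gw} + {\mathbf{b}}^{\prime}\mathbf{Pb} - 2{\mathbf{w}}^{\prime}\mathbf{Gb} $$

Setting its derivative with respect to $\mathbf{b}$ to zero gives:

$$ 2\mathbf{Pb} - 2\mathbf{Gw} = \mathbf{0} \quad \Rightarrow \quad \mathbf{b} = {\mathbf{P}}^{-1}\mathbf{Gw} $$

The correlation between $H$ and $I$ for any $\mathbf{b}$ is:

$$ \rho_{HI} = \frac{{\mathbf{w}}^{\prime}\mathbf{Gb}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Gw}}\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}} $$

Because $\mathbf{Pb} = \mathbf{Gw}$ at the optimum, ${\mathbf{w}}^{\prime}\mathbf{Gb} = {\mathbf{b}}^{\prime}\mathbf{Pb}$ and the maximized correlation simplifies to:

$$ \rho_{HI} = \sqrt{\frac{{\mathbf{b}}^{\prime}\mathbf{Pb}}{{\mathbf{w}}^{\prime}\mathbf{Gw}}} $$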

Once these optimal coefficients are derived, we can evaluate two fundamental parameters:

  1. The Maximized Selection Response ($R_I$): The expected mean improvement in the net genetic merit due to indirect selection on the index. $$ {R}_I = {k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} $$

  2. The Expected Genetic Gain Per Trait ($\mathbf{E}$): The multi-trait selection response broken down per individual trait. $$ \mathbf{E} = {k}_I\frac{\mathbf{Gb}}{\sigma_I} $$

where $k_I$ is the standardized selection intensity and $\sigma_I = \sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}$ is the standard deviation of the index.
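The formulas for $\mathbf{b}$, $R_I$ and $\mathbf{E}$ can be checked by hand on a small example. The following base R sketch uses hypothetical two-trait matrices and an assumed selection intensity of 2.063 (roughly the top 5% selected); none of these numbers come from the package datasets.

```r
# Toy 2-trait example (all numbers hypothetical, not taken from maize_pheno)
G   <- matrix(c(4, 1, 1, 2), nrow = 2)    # assumed genotypic covariance matrix
P   <- matrix(c(10, 1, 1, 4), nrow = 2)   # assumed phenotypic covariance matrix
w   <- c(2, 1)                            # assumed economic weights
k_I <- 2.063                              # assumed intensity, roughly top 5% selected

# b = P^{-1} G w, solved as a linear system rather than by explicit inversion
b <- as.numeric(solve(P, G %*% w))

sigma_I <- sqrt(as.numeric(t(b) %*% P %*% b))   # standard deviation of the index
R_I     <- k_I * sigma_I                        # selection response
E       <- k_I * as.numeric(G %*% b) / sigma_I  # expected genetic gain per trait

# Consistency check: the economically weighted per-trait gains sum to R_I
all.equal(sum(w * E), R_I)
```

The final check holds because ${\mathbf{w}}^{\prime}\mathbf{Gb} = {\mathbf{b}}^{\prime}\mathbf{Pb}$ when $\mathbf{b}$ is optimal, which ties the per-trait gains back to the overall response.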

Practical Implementation in R

The selection.index package implements this theory in R. The examples below use its built-in synthetic datasets: maize_pheno (multi-environment phenotypic records for 100 genotypes) and maize_geno (500 SNP markers).

1. Estimating Covariance Matrices

First, we estimate the genotypic ($\mathbf{G}$) and phenotypic ($\mathbf{P}$) variance-covariance matrices from our raw phenotypic dataset.

library(selection.index)

# Load the synthetic phenotypic multi-environment dataset
data("maize_pheno")

# In maize_pheno: Traits are columns 4:6.
# Genotypes are in column 1, and Block/Replication is in column 3.
gmat <- gen_varcov(data = maize_pheno[, 4:6], genotypes = maize_pheno[, 1], replication = maize_pheno[, 3])
pmat <- phen_varcov(data = maize_pheno[, 4:6], genotypes = maize_pheno[, 1], replication = maize_pheno[, 3])

2. Defining Economic Weights

Next, we set the relative economic priority of each trait. The economic weights ($\mathbf{w}$) define the breeding objective: a positive weight favors higher values of a trait, and a negative weight favors lower values.

# Define the economic weights for the 3 continuous traits
# (e.g., Yield, PlantHeight, DaysToMaturity)
weights <- c(10, -5, -5)

3. Calculating the LPSI

With the covariance matrices and economic weights specified, we pass them to the lpsi() function, which evaluates the selection indices for the requested trait combinations.

# Calculate the Optimal Combinatorial Linear Phenotypic Selection Index (LPSI)
index_results <- lpsi(
  ncomb = 3,
  pmat = pmat,
  gmat = gmat,
  wmat = as.matrix(weights),
  wcol = 1
)

4. Evaluating Outcomes and Selecting Genotypes

Finally, we evaluate the theoretical gains. The lpsi() function returns a structured data frame containing the theoretical selection response ($R_I$) and other parameter estimates for all requested trait combinations.

# View the top combinatorial indices, including their selection response (R_A)
head(index_results)

# Extract the phenotypic selection scores to strategically rank the parental candidates
# using the top evaluated combinatorial index
scores <- predict_selection_score(
  index_results,
  data = maize_pheno[, 4:6],
  genotypes = maize_pheno[, 1]
)

# View the top performing candidates designated for the next breeding cycle
head(scores)

5. Extension: Linear Marker Selection Index

Classical linear selection index theory extends to marker-assisted selection. If genome-wide marker profiles are available for your genotypes, you can incorporate them to build the Linear Marker Selection Index (LMSI).

# Load the associated synthetic genomic dataset (500 SNPs for the 100 genotypes)
data("maize_geno")

# Calculate the marker-assisted index combining our matrices and raw SNP profiles
marker_index_results <- lmsi(
  pmat = pmat,
  gmat = gmat,
  marker_scores = maize_geno,
  wmat = weights
)

summary(marker_index_results)

6. The Base Index and Index Efficiency

When the phenotypic ($\mathbf{P}$) and genotypic ($\mathbf{G}$) matrices are poorly estimated (e.g., because data are limited), the estimated index coefficients ($\mathbf{b}$) can be badly biased. The Base Index is a simple, non-optimized alternative that avoids estimating $\mathbf{b}$: its coefficients are set equal to the economic weights ($I_B = \mathbf{w}'\mathbf{y}$).

# Calculate the Base Index and automatically compare its efficiency to the LPSI
base_results <- base_index(
  pmat = pmat,
  gmat = gmat,
  wmat = weights,
  compare_to_lpsi = TRUE
)

# Observe the expected genetic gains and efficiency comparison
base_results$summary
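To see what such an efficiency comparison measures, the Base Index and LPSI responses can be computed by hand. This is a minimal base R sketch on hypothetical two-trait matrices, assuming $\mathbf{P}$ and $\mathbf{G}$ are known without error. The helper `quad()` is introduced here for illustration only, and the ratio it produces may be defined or scaled differently from what base_index() reports.

```r
# Toy comparison of the Base Index with the LPSI (hypothetical matrices)
G   <- matrix(c(4, 1, 1, 2), nrow = 2)    # assumed genotypic covariance matrix
P   <- matrix(c(10, 1, 1, 4), nrow = 2)   # assumed phenotypic covariance matrix
w   <- c(2, 1)                            # assumed economic weights
k_I <- 2.063                              # assumed intensity, roughly top 5% selected

# Illustrative helper: the bilinear form a' M b (defaults to a' M a)
quad <- function(a, M, b = a) as.numeric(t(a) %*% M %*% b)

# LPSI: optimal coefficients and response in H
b_lpsi <- as.numeric(solve(P, G %*% w))
R_lpsi <- k_I * quad(w, G, b_lpsi) / sqrt(quad(b_lpsi, P))

# Base Index: coefficients equal to the economic weights
b_base <- w
R_base <- k_I * quad(w, G, b_base) / sqrt(quad(b_base, P))

# Ratio of responses; it cannot exceed 1 when P and G are known exactly
efficiency <- R_base / R_lpsi
round(c(R_lpsi = R_lpsi, R_base = R_base, efficiency = efficiency), 3)
```

With exact matrices the LPSI is never worse than the Base Index. The practical case for the Base Index arises only when estimation error in $\mathbf{P}$ and $\mathbf{G}$ erodes that advantage.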

7. Heritability of the LPSI

The squared correlation between the net genetic merit ($H$) and the index ($I$), $\rho^2_{HI}$, is in general not equal to the heritability of the index, $h^2_I$ ($h^2_I \neq \rho^2_{HI}$). The correlation measures how accurately the index predicts $H$, whereas the heritability is the share of the index variance that is genetic. The lpsi() function reports both statistics:

# Extract the top combinatorial index results
top_index <- index_results[1, ]

# h^2_I: Heritability of the optimal index
top_index$hI2

# \rho_HI: Correlation between the LPSI and the true underlying Net Genetic Merit
top_index$rHI
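The difference between the two statistics is easy to verify numerically. The sketch below uses the standard definitions $h^2_I = {\mathbf{b}}^{\prime}\mathbf{Gb} / {\mathbf{b}}^{\prime}\mathbf{Pb}$ and $\rho_{HI} = {\mathbf{w}}^{\prime}\mathbf{Gb} / \sqrt{{\mathbf{w}}^{\prime}\mathbf{Gw}\,{\mathbf{b}}^{\prime}\mathbf{Pb}}$ on hypothetical two-trait matrices. Whether the hI2 and rHI columns are computed in exactly this way inside the package is an assumption.

```r
# Toy check that index heritability and squared accuracy differ (hypothetical matrices)
G <- matrix(c(4, 1, 1, 2), nrow = 2)    # assumed genotypic covariance matrix
P <- matrix(c(10, 1, 1, 4), nrow = 2)   # assumed phenotypic covariance matrix
w <- c(2, 1)                            # assumed economic weights

b     <- as.numeric(solve(P, G %*% w))        # optimal LPSI coefficients
var_I <- as.numeric(t(b) %*% P %*% b)         # index variance b'Pb
var_H <- as.numeric(t(w) %*% G %*% w)         # variance of the net genetic merit w'Gw

h2_I   <- as.numeric(t(b) %*% G %*% b) / var_I                  # genetic share of index variance
rho_HI <- as.numeric(t(w) %*% G %*% b) / sqrt(var_H * var_I)    # correlation between H and I

round(c(h2_I = h2_I, rho_HI = rho_HI, rho_HI_squared = rho_HI^2), 3)
```

In this toy case the two quantities are close but not identical, which is the point of reporting them separately.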


selection.index documentation built on March 9, 2026, 1:06 a.m.