knitr::opts_chunk$set(echo = FALSE)

Price's equation

Motivation

Price's main goal was to understand selection in an abstract way. This consisted of answering the following question, "how does some property change between two sets?", by distinguishing the part of the total change that is due to selection.

Yet Price's equation is fully general: for instance, differences between sets can occur through space rather than time; what matters is the mapping between the two sets. Most importantly, the "part of the total change due to selection" corresponds to the change in frequency that arises directly from differential contributions by ancestors.

Definitions and notations

In the ancestral set (i.e., before selection): each type $i$ has frequency $q_i$ and character value $z_i$

In the descendant set (i.e., after selection): the corresponding frequency and character value are denoted $q'_i$ and $z'_i$

Fitness: $w_i$ is the (relative) contribution to the descendant set from type $i$ in the ancestral set, so that $q'_i = q_i \, w_i$ and hence $\text{E}(w) = 1$

Illustration

## Illustrate the mapping between two sets that underlies Price's equation:
## step=1 draws only the members of the first set, step=2 adds the members of
## the second set, step=3 adds their labels and the arrows showing the mapping.
illustratePriceEq <- function(step=3){
  if(requireNamespace("plotrix", quietly=TRUE)){
    suppressPackageStartupMessages(library(plotrix))
    par(mar=c(0,0,0,0))
    plot(x=-10:10, y=-10:10, ylim=c(-3.5,4.5),
         type="n", xlab="", ylab="", axes=FALSE)
    abline(v=0, lty=2, lwd=2)

    ## first set:
    draw.ellipse(x=-5, y=0, a=4.7, b=3)
    text(x=-5, y=3.5, labels="first set", cex=2)

    draw.circle(x=-7, y=1.8, radius=0.8)
    text(x=-7, y=1.8, labels="1", cex=2, font=2)

    draw.circle(x=-3.5, y=1.8, radius=0.8, col="royalblue4")
    text(x=-3.5, y=1.8, labels="2", cex=2, font=2, col="white")

    draw.circle(x=-5, y=0, radius=0.8, col="royalblue2")
    text(x=-5, y=0, labels="3", cex=2, font=2, col="white")

    draw.circle(x=-7, y=-1.8, radius=0.8, col="skyblue1")
    text(x=-7, y=-1.8, labels="4", cex=2, font=2)

    draw.circle(x=-3.5, y=-1.8, radius=0.8, col="navy")
    text(x=-3.5, y=-1.8, labels="5", cex=2, font=2, col="white")

    ## second set:
    draw.ellipse(x=5, y=0, a=4.7, b=3)
    text(x=5, y=3.5, labels="second set", cex=2)
    if(step >= 2){

      draw.circle(x=3.5, y=2, radius=0.8, col="royalblue4")
      if(step >= 3){
        text(x=3.5, y=2, labels="2", cex=2, font=2, col="white")
        arrows(x0=-3.5+0.8, y0=1.8, x1=3.5-0.8, y1=2, lwd=3, col="red")
      }

      draw.circle(x=7, y=1.2, radius=0.8, col="royalblue3")
      if(step >= 3){
        text(x=7, y=1.2, labels="2", cex=2, font=2, col="white")
        arrows(x0=-3.5+0.8, y0=1.8, x1=7-0.8, y1=1.2, lwd=3, col="red")
      }

      draw.circle(x=5, y=0, radius=0.8, col="royalblue1")
      if(step >= 3){
        text(x=5, y=0, labels="3", cex=2, font=2, col="white")
        arrows(x0=-5+0.8, y0=0, x1=5-0.8, y1=0, lwd=3, col="red")
      }

      draw.circle(x=3.5, y=-1.2, radius=0.8, col="skyblue3")
      if(step >= 3){
        text(x=3.5, y=-1.2, labels="3", cex=2, font=2, col="white")
        arrows(x0=-5+0.8, y0=0, x1=3.5-0.8, y1=-1.2, lwd=3, col="red")
      }

      draw.circle(x=6, y=-2.1, radius=0.8, col="navy")
      if(step >= 3){
        text(x=6, y=-2.1, labels="5", cex=2, font=2, col="white")
        arrows(x0=-3.5+0.8, y0=-1.8, x1=6-0.8, y1=-2.1, lwd=3, col="red")
      }
    }
  }
}
message ("Before selection:")
illustratePriceEq(1)

\ \

## illustratePriceEq(2)

\ \

message ("After selection:")
illustratePriceEq(3)


Derivation

Goal: express the total change in the average property in terms of frequency change (selection) and property change.

Here is a first, straightforward way of deriving Price's equation:

\begin{align}
\Delta \bar{z} &= \bar{z}' - \bar{z} \\
&= \sum_i q'_i \, z'_i \; - \; \sum_i q_i \, z_i \\
&= \sum_i (q'_i - q_i) \, z_i \; + \; \sum_i q'_i \, z'_i \; - \; \sum_i q'_i \, z_i \\
&= \sum_i (\Delta q_i) \, z_i \; + \; \sum_i q'_i \, (\Delta z_i)
\end{align}

But Price's equation is more commonly expressed in probabilistic notation.

Let us reformulate the first term:

$\sum_i (\Delta q_i) \, z_i = \sum_i q_i \, (w_i - 1) \, z_i$ since $\Delta q_i = q'_i - q_i = q_i \, (w_i - 1)$

Considering the $w_i$'s (respectively the $z_i$'s) as outcomes of a random variable $w$ (resp. $z$) taking value $w_i$ (resp. $z_i$) with probability $q_i$, and recognizing that $\text{E}(w) = 1$, we get $\sum_i q_i \, (w_i - 1) \, z_i = \text{E}(w \, z) - \text{E}(z) = \text{E}(w \, z) - \text{E}(w) \, \text{E}(z) = Cov(w, z)$, hence $\sum_i (\Delta q_i) \, z_i = Cov(w, z)$.

Similarly: $\sum_i q'_i \, (\Delta z_i) = \sum_i q_i w_i \, (\Delta z_i) = \text{E}(w \, \Delta z)$

Here is the probabilistic way of writing down Price's equation, often found in articles and textbooks:

\begin{align} \Delta \bar{z} = Cov(w, z) \, + \, \text{E}(w \, \Delta z) \end{align}

Or equivalently, using the absolute fitness $W$ (with $w_i = W_i / \bar{W}$ and $\bar{W} = \sum_i q_i W_i$):

\begin{align} \Delta \bar{z} = \frac{1}{\bar{W}} \, \left( Cov(W, z) \, + \, \text{E}(W \, \Delta z) \right) \end{align}
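
As a sanity check, here is a minimal R sketch with made-up frequencies, trait values and fitnesses, verifying both forms of the equation (the covariance and expectation being taken with respect to the ancestral frequencies $q_i$):

## toy example with three types (all numbers are made up)
q  <- c(0.5, 0.3, 0.2)              # ancestral frequencies
z  <- c(1.0, 2.0, 3.0)              # trait values in the ancestral set
zp <- c(1.1, 2.2, 2.9)              # trait values in the descendant set (z')
W  <- c(0.8, 1.2, 1.5)              # absolute fitnesses
w  <- W / sum(q * W)                # relative fitnesses, so that E(w) = 1
qp <- q * w                         # descendant frequencies (q' = q w)
delta.zbar <- sum(qp * zp) - sum(q * z)   # total change in the average trait
covq <- function(x, y) sum(q * (x - sum(q*x)) * (y - sum(q*y)))  # q-weighted covariance
all.equal(delta.zbar, covq(w, z) + sum(q * w * (zp - z)))              # relative-fitness form
all.equal(delta.zbar, (covq(W, z) + sum(q * W * (zp - z))) / sum(q*W)) # absolute-fitness form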

As written by Gardner (2020), Price's equation highlights the logic of selection, whose four ingredients are the arena of selection ($I$), the unit of selection ($i$), the character under selection ($z$) and the target of selection ($w$).

Reminder: in the simple linear regression $y_i = \mu + \beta x_i + \epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, the maximum-likelihood estimator of $\beta$ is $B = \frac{Cov(y, x)}{Var(x)}$.

Gardner also recalls that if fitness $w$ is linearly dependent on $z$ with slope $\beta_{wz}$, then $Cov(w, z) = \beta_{wz} \, Var(z)$, which "captures Darwin's basic argument that selection occurs when there is variation in a character of interest that is correlated with fitness".
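
Both points can be checked quickly in R with simulated data (made-up parameter values): the slope returned by lm() equals the ratio of the sample covariance to the sample variance, hence the covariance equals the slope times the variance.

set.seed(123)
n <- 200
z <- rnorm(n)                          # simulated trait values
w <- 1 + 0.5 * z + rnorm(n, sd=0.2)    # simulated relative fitnesses, linear in z
beta.wz <- unname(coef(lm(w ~ z))["z"])
all.equal(beta.wz, cov(w, z) / var(z)) # slope = Cov(w,z) / Var(z)
all.equal(cov(w, z), beta.wz * var(z)) # Cov(w,z) = slope x Var(z)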


Robertson's secondary theorem

Let us assume that the ancestral set consists of potential parents and the descendant set consists of their offspring. In quantitative genetics since Fisher, the parental phenotype $z$ is classically decomposed into a transmissible part noted $g_a$ (called the breeding value, equal to the additive genotypic value when additive-by-additive epistasis is ignored) and a non-transmissible part noted $\epsilon$, with $z = g_a + \epsilon$ and $\text{E}(\epsilon) = 0$, so that $z' = g_a$. As a result, Price's equation applied to this mapping expresses the response to selection, $R$:

\begin{align}
R &= \bar{z}_{\text{offspring}} - \bar{z}_{\text{potential parents}} \\
&= Cov(w, g_a + \epsilon) - \text{E}(w \, \epsilon) \\
&= Cov(w, g_a) + Cov(w, \epsilon) - Cov(w, \epsilon) \\
&= Cov(w, g_a)
\end{align}

where the second line uses $\Delta z = z' - z = -\epsilon$, and the third line uses $\text{E}(w \, \epsilon) = Cov(w, \epsilon) + \text{E}(w) \, \text{E}(\epsilon) = Cov(w, \epsilon)$ since $\text{E}(\epsilon) = 0$.

It is also known as Robertson's secondary theorem.
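
Here is a minimal R simulation sketch of this mapping (all parameter values are made up): each potential parent contributes to the offspring set in proportion to its fitness and transmits exactly its breeding value, so the response equals the $q$-weighted covariance between relative fitness and breeding value.

set.seed(1918)
N <- 1000
q <- rep(1/N, N)                        # equal ancestral frequencies
g.a <- rnorm(N, sd=1)                   # breeding values
eps <- rnorm(N, sd=1); eps <- eps - mean(eps)  # non-transmissible part, E(eps) = 0 exactly
z <- g.a + eps                          # parental phenotypes
W <- exp(0.2 * z)                       # hypothetical fitness function
w <- W / sum(q * W)                     # relative fitnesses, E(w) = 1
R <- sum(q * w * g.a) - sum(q * z)      # offspring mean (z' = g_a) minus parental mean
covq <- function(x, y) sum(q * (x - sum(q*x)) * (y - sum(q*y)))
all.equal(R, covq(w, g.a))              # Robertson's secondary theorem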


Fisher's theorem of natural selection

Applying this to the special case where the trait is fitness itself:

\begin{align}
\Delta \bar{w} &= Cov(w, g_{a,w}) \\
&= Cov(g_{a,w} + \epsilon_w, g_{a,w}) \\
&= Var(g_{a,w})
\end{align}

where the last line uses $Cov(g_{a,w}, \epsilon_w) = 0$.

Or equivalently using the absolute fitness:

\begin{align} \Delta \bar{W} = \frac{1}{\bar{W}} \, Var(g_{a,W}) \end{align}
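
And a minimal R sketch of this special case where the trait is fitness itself (made-up values); the decomposition is set up so that $\text{E}(\epsilon_w) = 0$ and $Cov(g_{a,w}, \epsilon_w) = 0$ hold exactly in the sample:

set.seed(1930)
N <- 1000
q <- rep(1/N, N)
x <- rnorm(N, sd=0.1)
g.a.w <- 1 + x - mean(x)                        # breeding values for fitness, E = 1 exactly
e.w <- residuals(lm(rnorm(N, sd=0.1) ~ g.a.w))  # E(e)=0 and Cov(e, g.a.w)=0 exactly
w <- g.a.w + e.w                                # relative fitnesses, E(w) = 1
delta.wbar <- sum(q * w * g.a.w) - sum(q * w)   # change in mean fitness (w' = g_{a,w})
all.equal(delta.wbar, sum(q * (g.a.w - sum(q * g.a.w))^2))  # equals Var(g_{a,w})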


The breeder's equation

What defines a breeder is the ability to apply selection, typically by truncation, to a population of $N$ individuals. In the framework of Price's mapping, let us define two sets, before and after selection: the potential parents and the selected parents, the latter being a subset of the exact same $N$ individuals.

Importantly, because the individuals are the same in both sets, their phenotypes remain the same: $\Delta z = 0$.

As a result, Price's equation applied to this mapping expresses the selection differential:

\begin{align}
S &= \bar{z}_{\text{selected parents}} - \bar{z}_{\text{potential parents}} \\
&= Cov(w, z)
\end{align}
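
A minimal R sketch of this mapping under truncation selection (made-up phenotypes): a selected individual gets relative fitness $1/p$, where $p$ is the selected proportion, and an unselected one gets 0, so that $\text{E}(w) = 1$.

set.seed(1943)
N <- 1000
q <- rep(1/N, N)
z <- rnorm(N, mean=10, sd=2)            # phenotypes of the N potential parents
p <- 0.2                                # proportion selected by truncation
sel <- z > quantile(z, probs=1 - p)     # indicator of selection (upper 20%)
w <- ifelse(sel, 1/p, 0)                # relative fitnesses
S.direct <- mean(z[sel]) - mean(z)      # selection differential, computed directly
S.price  <- sum(q * (w - sum(q*w)) * (z - sum(q*z)))  # via Cov(w, z)
all.equal(S.direct, S.price)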

A breeder is typically interested in predicting the response to selection $R$ that would result from the selection differential $S$ he/she applies. To this end, let us assume that fitness $w$ depends linearly on $z$, so that $w = \beta_{wz} \, z + \delta$, where $\beta_{wz}$ denotes the linear regression coefficient and $\text{E}(\delta) = 0$. Replacing $z$ with its decomposition (above) gives the following:

\begin{align}
w &= \beta_{wz} \, (g_a + \epsilon) + \delta \\
&= \beta_{w g_a} \, g_a + \gamma
\end{align}

where $\beta_{w g_a} := \beta_{wz}$ and $\gamma := \beta_{wz} \epsilon + \delta$.

Importantly, let us also assume that $Cov(g_a, \gamma) = 0$, which is usually reasonable for artificial selection but much less so for natural selection.

As a result:

\begin{align}
R &= Cov(w, g_a) \\
&= \beta_{w g_a} \, Var(g_a) \\
&= \beta_{wz} \, Var(g_a) \, \frac{Var(z)}{Var(z)} \\
&= \frac{Var(g_a)}{Var(z)} \, Cov(w, z) \\
&= h^2 \, S
\end{align}

This is also known as the breeder's equation, where $h^2$ is called the (narrow-sense) heritability. It was originally derived by Lush, a geneticist who made important contributions to livestock breeding.
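
A minimal R simulation sketch of the breeder's equation under truncation selection (all parameter values are made up); in a finite simulated sample the agreement is approximate and improves with the sample size:

set.seed(1937)
N <- 10^5
h2 <- 0.4                                    # narrow-sense heritability (made-up value)
g.a <- rnorm(N, sd=sqrt(h2))                 # breeding values, Var = h2
z <- g.a + rnorm(N, sd=sqrt(1 - h2))         # phenotypes, Var = 1
p <- 0.1                                     # proportion selected by truncation on z
sel <- z > quantile(z, probs=1 - p)
S <- mean(z[sel]) - mean(z)                  # selection differential
R <- mean(g.a[sel]) - mean(g.a)              # response (offspring inherit parental g_a)
c(R=R, h2.S=h2 * S)                          # the two values should be close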

The breeder's equation is then often written differently, to ease the comparison of selection responses between traits and to highlight the factors determining the magnitude of the response. Because $S$ is in the same units as the phenotype $z$, it is often standardized by the phenotypic standard deviation, leading to the intensity of selection: $i = \frac{S}{\sigma_z}$. Moreover, let us write $h^2$ as $h \times h$ with $h := \frac{\sigma_{g_a}}{\sigma_z}$. Assuming $Cov(g_a, \epsilon) = 0$, so that $Cov(g_a, z) = Var(g_a)$, $h$ also corresponds to the correlation between breeding value and phenotype: $h = \frac{Var(g_a)}{\sigma_{g_a} \sigma_z} = \frac{Cov(g_a, z)}{\sigma_{g_a} \sigma_z} = Cor(g_a,z)$.

As a result:

\begin{align}
R &= i \, h^2 \, \sigma_z \\
&= i \, h \, \sigma_{g_a} \\
&= i \, Cor(g_a,z) \, \sigma_{g_a}
\end{align}

This formula corresponds to the case where a breeder performs phenotypic selection, i.e., selection made directly on phenotypic observations, hence what matters is the accuracy with which phenotypic observations ($z$) inform about breeding values ($g_a$). More generally, a breeder applies selection on estimated breeding values (EBVs):

\[ R = i \, \rho \, \sigma_{g_a} \]

where $\rho$ is the correlation between the estimated and the true breeding values.
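
In the phenotypic-selection case above, $\rho$ reduces to $h = Cor(g_a, z)$; here is a quick R check of this equality with simulated values (variances made up), assuming $Cov(g_a, \epsilon) = 0$:

set.seed(42)
N <- 10^6
g.a <- rnorm(N, sd=sqrt(0.3))            # breeding values, Var ~ 0.3
z <- g.a + rnorm(N, sd=sqrt(0.7))        # phenotypes, Var ~ 1
c(h=sd(g.a) / sd(z), cor=cor(g.a, z))    # both close to sqrt(0.3), i.e. ~0.55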


Lande's equation

It is also important to realize that $\frac{S}{\sigma_z^2} = \frac{Cov(w,z)}{\sigma_z^2} = \beta_{wz}$ so that $R = \beta_{wz} \, \sigma_{g_a}^2$ where $\beta_{wz}$, the slope of the linear regression of the (relative) fitness on the trait, is known as the selection gradient. Beyond artificial selection, i.e., when no human is consciously applying a selection differential, $S$ cannot be estimated, yet $\beta_{wz}$ can, simply by regressing the relative fitness on the trait. The larger the slope, the stronger the selection.
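
A minimal R sketch of estimating the selection gradient by regressing the relative fitness on the trait (simulated data, made-up parameter values), and of the corresponding predicted response given an assumed additive genetic variance:

set.seed(1979)
N <- 1000
z <- rnorm(N, mean=50, sd=5)                    # trait values
W <- rpois(N, lambda=exp(0.5 + 0.02 * z))       # hypothetical absolute fitnesses (e.g., offspring counts)
w <- W / mean(W)                                # relative fitnesses
beta.wz <- unname(coef(lm(w ~ z))["z"])         # estimated selection gradient
all.equal(beta.wz, cov(w, z) / var(z))          # slope = Cov(w, z) / Var(z)
sigma2.ga <- 9                                  # made-up additive genetic variance of the trait
beta.wz * sigma2.ga                             # predicted response: R = beta_wz * Var(g_a)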

Caution though, there may well be several traits contributing to fitness, possibly with genetic covariance between them, in which case the multivariate version of the breeder's equation should be used, as formalized by Lande in 1979:

\[ \overset{\rightarrow}{R} = G \, \overset{\rightarrow}{\beta} \; \Leftrightarrow \; \begin{pmatrix} R_1 \\ R_2 \end{pmatrix} = \begin{pmatrix} \sigma^2_{g_a,1} & \sigma_{g_a,1-2} \\ \sigma_{g_a,1-2} & \sigma^2_{g_a,2} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \]

in the case of two traits, with the (in)famous $G$ matrix.
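
A minimal R sketch of Lande's equation for two traits, with a made-up $G$ matrix and selection gradients; note how trait 2 responds through the genetic covariance even though its own gradient is zero:

G <- matrix(c( 2.0, -0.8,
              -0.8,  1.5), nrow=2, byrow=TRUE)   # made-up additive genetic (co)variances
beta <- c(0.3, 0)                                # made-up selection gradients
(R <- G %*% beta)                                # predicted responses: 0.6 for trait 1, -0.24 for trait 2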


References


