In MorphoFun/psa: Phenotypic Selection Analysis tools

Install psa from GitHub

require(devtools)
#install_github("MorphoFun/psa", build_vignettes = TRUE)
library(psa)
library(ggplot2)

To see all of the vignettes that are available in psa, type browseVignettes("psa").

Loading the Male Bumpus data as an example

data(BumpusMales)

Check the structure of the data

str(BumpusMales)

Check the documentation to understand what each of the 13 columns represent

?BumpusMales

Define the parameters

Fitness will be represented by the relative fitness (w), which is the absolute fitness (W) of an individual divided by the mean absolute fitness for the group.

w <- BumpusMales[,1]

The phenotypic traits are stored under z:

z <- BumpusMales[,3:11]

Since Weight is a volume, and the other traits are linear measurements, we need to adjust for the dimensionality differences by taking the cube root of Weight_g. (Alternatively, one could cube all of the other phenotypic traits except Weight_g)

z$Weight_g <- z$Weight_g^(1/3)

Estimate phenotypic selection differentials

Overview of estimating selection differentials

Parameter | Formula -------------------------------------------------|---------------------------------------- linear selection differential ($S_i$) | $cov(w, \bar{z}$) quadratic selection differential ($C_{i,i}$) | $cov(w, (z - \bar{z})^2$) correlational selection differential ($C_{i,j}$) | $cov(w, (z_i - \bar{z_i})(z_j - \bar{z_j}$))

See Brodie et al. 1995 for a more detailed description.

Calculate the selection differentials

The phenotypic traits (z) will automatically be centered to a mean of zero and scaled to unit variance. The differentials() function outputs the linear and nonlinear selection differentials. The terms ending in ".Sq" are the quadratic selection differentials, and the interactions are the correlational selection differentials.

# Look up the documentation for differentials
?differentials

# Estimate the differentials using only the covariance method
bm_dCov <- differentials(w, z, 1)

# Look up the documentation for differentials_bootstats for estimating the standard errors and confidence intervals, using the covariance method for estimating the selection differentials
?differentials_bootstats

set.seed(123)
bm_booted_dCov <- differentials_bootstats(w, z, method = 1)

# The standard errors can be easily extracted from $se
bm_booted_dCov$se

Are all methods to estimate selection differentials equivalent?

The short answer is: no. To illustrate this, we can use the psa::differentials function to compare how different methods to quantify selection differentials impact the results.

# Compare the differentials using different methods
bm_all <- differentials(w, z, "all")

# Compare the standard errors of the differentials that were estimated with different methods
set.seed(123)
bm_booted_dBeforeAfter <- differentials_bootstats(w,z,method = 2)
bm_booted_dMatrix <- differentials_bootstats(w,z,method = 3)
bm_booted_dReg <- differentials_bootstats(w,z,method = 4)

bm_se_all <- data.frame(cbind(dCov = bm_booted_dCov$se, dBeforeAfter = bm_booted_dBeforeAfter$se, dMatrix = bm_booted_dMatrix$se, dReg = bm_booted_dReg$se), stringsAsFactors = FALSE)


# melt the results based on the method used to compute the estimates
library(reshape2)
melt_bm <- melt(t(bm_all), id = names(bm_all))
melt_bm_se <- melt(t(bm_se_all), id = names(bm_se_all))

limits <- aes(ymax = melt_bm$value + melt_bm_se$value, ymin = melt_bm$value - melt_bm_se$value)

# plot the data
# see docs.ggplot2.org/current/geom_point.html
ggplot(melt_bm, aes(Var1, value, fill = Var1)) + 
  geom_point(aes(colour = factor(Var1))) + 
  geom_errorbar(limits, width = 0.2) +
  # once I have the se's try:   geom_point(aes(colour = factor(Method), size = se)) + 
  facet_wrap(~Var2) + 
  geom_abline(intercept = 0, slope = 0, colour = "gray") +
  labs(title = "Selection differentials for male Bumpus data") + 
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  theme(legend.position = "none",
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    axis.line = element_line(colour = "black")
    )

First, scaling the data to a mean of zero and unit variance makes a big difference. When reporting selection coefficients, it is recommended to report the standardized selection coefficients so that they are more comparable across studies.

Robertson (1966) and Price (1970, 1972) originally proposed that directional selection on individual phenotypic traits could be estimated by either the covariance or regression of relative fitness on the trait (Lande and Arnold 1983).

Next, let's focus on dCov vs dReg. The dCov row provides the same results as the psa::dCov function. dRegScale represents a linear regression with the minimum number of traits for the analysis. Linear selection differentals using dRegScale would be estimated by $w = \alpha + z_i$, quadratic selection differents by $w = \alpha + z_i + 0.5z_i^2$, and correlational selection differentials by $w = \alpha + z_i + z_j + z_iz_j$, whereby the z data would be normalized to a mean of zero and unit variance.
Notice that the differentials produced by dCovScale and dRegScale are the same for only linear selection. The regression-based method does not produce similar results to the covariance method once nonlinear selection differentials are estimated. The estimates from using these two methods on the exact same data set can be orders of magnitude different. For instance, "TotalLength_mm x FemurL_mm" has 0.033 vs -0.009 for dCovScale and dRegScale, respectively, which alters the interpretation from an increase in covariance to a decrease in covariance. Moreover, "Weight_g x TibioTarL_mm" has 0.058 vs. 0.132, which is the difference between weak selection and strong selection. Yet, on other ocassions, dRegScale and dCovScale still produce comparable results for nonlinear terms: "Weight_g x SkullW_mm" is estimated to be 0.007 for both methods.

Similar results can be found for other datasets. Here is the humanNeonatal dataset from the gsg package, which has a much larger sample size than the BumpusMales dataset.

require(gsg)
data(humanNeonatal)
str(humanNeonatal)

# Creating a relative fitness (w) variable
humanNeonatal$w <- humanNeonatal$nns/mean(humanNeonatal$nns)

# Comparing the selection differentials
differentials(humanNeonatal$w, humanNeonatal[,2:3], "all")

Given that the original formulae to estimate linear and nonlinear selection differentials (e.g., Lande and Arnold 1983, Phillips and Arnold 1989) were based on the covariance method, it would likely be best to stick to that method. With data normalized to a mean of zero and unit variance: simple linear regressions can produce similar results as the covariance method for linear selection differentials, but are inconsistent for nonlinear selection differentials. With raw data: taking the phenotypic differences before and after selection can produce similar results as the covariance method for both linear and nonlinear selection differentials. For the humanNeonatal dataset, notice that the estimates were relatively consistent amongst the first three methods for both linear and nonlienar selection differentials; the regression method produced the most incongruent results.

References
Brodie ED III, Moore AJ, Janzen FJ. 1995. Visualizing and quantifying natural selection. TREE 10(8): 313-318.
Lande SJ, Arnold R. 1983. The measurement o fselection on correlated characters. Evolution 37(6): 1210-1226.
Phillips PC, Arnold SJ. 1989. Visualizing multivariate selection. Evolution 43(6): 1209-1222. Price GR. 1970. Selection and covariance. Nature 227: 520-521.
Price GR. 1972. Extension of covariance selection mathematics. Annals of Human Genetics 35: 485-490.
Robertson A. 1966. A mathematical model of the culling process in dairy cattle. Animal Production 8: 93-108.