Overview

Perennial fruit species, such as apple trees, are known to have irregular fruit bearing, notably biennial. Several biennial bearing indices (BBI) hence have been proposed, notably to handle the fact that fruit production not only is irregular but also increases over the years, i.e. has a positive trend. See Durand et al (2013) for details.

This document requires external packages:

suppressPackageStartupMessages(library(parallel))
nb.cores <- ifelse(detectCores() > 20, 20, detectCores() - 1)
suppressPackageStartupMessages(library(MASS))
suppressPackageStartupMessages(library(lme4))
suppressPackageStartupMessages(library(nlme))
suppressPackageStartupMessages(library(rutilstimflutre))

This R chunk is used to assess how much time it takes to execute the R code in this document until the end:

t0 <- proc.time()

Modelling

Indices:

Fruit production (yield): prod = intercept + slope x t + error

so that $\beta + \alpha \; t$ represents the mean trend, and $\beta_g + \alpha_g \; t$ the trend deviation for genotype $g$. Deviations for replicates (the $\zeta$'s and $\xi$'s) are ignored in this document.

Likelihood:

[ y_{g,r,t} = (\beta + \beta_g) + (\alpha + \alpha_g) \; t + \epsilon_{g,r,t} ]

where $\epsilon_{g,r,t} = (\gamma + \gamma_g) \; \epsilon_{g,r,t-1} + u_{g,r,t}$ with $u_{g,r,t} \sim \mathcal{N}(0, \sigma^2)$.

As the temporal dependency between errors is modeled as a first-order autoregressive process, AR(1), we can write the following:

For any genotype $g$, let us introduce $\phi_{1,g} = \gamma + \gamma_g$ so that, $\forall g, \; \epsilon_{g,r,t} = \phi_{1,g} \, \epsilon_{g,r,t-1} + u_{g,r,t}$.

By construction, the mean of this AR(1) process is zero: $E(\epsilon_{g,r,t}) = 0$.

Moreover, its variance is: $v_g = V(\epsilon_{g,r,t-1}) = V(\epsilon_{g,r,t}) = \phi_{1,g}^2 \, V(\epsilon_{g,r,t-1}) + \sigma^2 \Rightarrow v_g = \frac{\sigma^2}{1 - \phi_{1,g}^2}$.

To interpret $\phi_{1,g}$, it helps to write down the covariance and correlation between successive errors:

$Cov(\epsilon_{g,r,t-1}, \, \epsilon_{g,r,t}) = E(\epsilon_{g,r,t-1} \epsilon_{g,r,t}) = \phi_{1,g} \, V(\epsilon_{g,r,t}) \Rightarrow cor(\epsilon_{g,r,t-1}, \, \epsilon_{g,r,t}) = \phi_{1,g} = \rho_g$

That is why $\gamma + \gamma_g$ is close to 0 for regular bearing genotypes and to -1 for biennial bearing genotypes.

With $\forall g, \; \epsilon_{g,r,1} \sim \mathcal{N}(0, \frac{\sigma^2}{1 - \rho_g^2})$, we can write the distribution of all errors for tree $r$ of genotype $g$ as a vector distributed according to a multivariate Normal:

$\forall g,r, \; \boldsymbol{\epsilon_{g,r}} \sim \mathcal{N}_T(\boldsymbol{0}, \Sigma_g)$

where the covariance matrix can be written as a scaled correlation matrix:

[ \begin{equation} \Sigma_g = \frac{\sigma^2}{1 - \rho_g^2} \begin{pmatrix} 1 & \rho_g & \rho_g^2 & \cdots \ \rho_g & 1 & \rho_g & \cdots \ \rho_g^2 & \rho_g & 1 & \cdots \ \vdots & \vdots & \vdots & \ddots \ \end{pmatrix} \end{equation} ]

Genotype effects:

Simulation

Make the skeleton of a data frame:

G <- 200 # number of genotypes
genos <- paste0("geno", sprintf(fmt="%03i", 1:G))
R <- 5   # number of trees per genotype, i.e. replicates
T <- 10  # number of years
years <- 2001:(2001 + T - 1)
dat.annual <- data.frame(geno=rep(genos, each=R),
                         tree=rep(1:R, G))
dat <- dat.annual
for(t in 2:T)
  dat <- rbind(dat, dat.annual)
dat$year <- rep(years, each=G * R)
dat$year <- as.factor(dat$year)
dat$tree <- as.factor(dat$tree)
dat <- dat[order(dat$geno, dat$tree, dat$year),]
rownames(dat) <- NULL
str(dat)
dat[1:(T+3),]

Set input parameter values:

beta <- 100
sd.beta.g <- 3
alpha <- 1
sd.alpha.g <- 1
gamma <- -0.5
sigma <- 2

Make design matrices:

Z.g <- model.matrix(~ -1 + geno, data=dat)
dat$year.num <- as.numeric(levels(dat$year))[dat$year]
dat$t <- dat$year.num - min(dat$year.num)
dat$geno.tree <- as.factor(paste0(dat$geno, "_", dat$tree))
Z.gr <- model.matrix(~ -1 + geno.tree, data=dat)

Simulate production data, ignoring trend deviations for replicate $r$ of genotype $g$, and noting that $\gamma + \gamma_g = 0$ means regular bearing and -1 means biennial bearing:

set.seed(1859)
beta.g <- rnorm(n=G, mean=0, sd=sd.beta.g)
alpha.g <- rnorm(n=G, mean=0, sd=sd.alpha.g)
## gamma.g <- rep(0, G) # all genotypes have the same bearing pattern
gamma.g <- c(rep(-0.4, floor(G/3)),           # -> gamma+gamma.g = -0.9
             rep(0, floor(G/3)),              # -> gamma+gamma.g = gamma
             rep(-gamma, G - 2 * floor(G/3))) # -> gamma+gamma.g = 0
## gamma.g <- runif(n=G, min=-1-gamma, max=-gamma)
summary(gamma + gamma.g)
table(gamma + gamma.g)
names(gamma.g) <- genos
Sigma.g <- lapply(1:G, function(g){
  covMatAR1(n=T, rho=gamma + gamma.g[g], sigma2=sigma^2)
})
epsilon.g <- lapply(1:G, function(g){
  mvrnorm(n=R, mu=rep(0, T), Sigma=Sigma.g[[g]])
})
y <- beta + Z.g %*% beta.g +
  dat$t * (alpha + Z.g %*% alpha.g)
dat$error <- NA
for(g in 1:G){
  geno <- levels(dat$geno)[g]
  for(r in 1:R){
    tree <- levels(dat$tree)[r]
    idx.gr <- grep(paste0("^", geno, "_", tree, "$"), dat$geno.tree)
    stopifnot(length(idx.gr) == T)
    dat$error[idx.gr] <- epsilon.g[[g]][r,]
    y[idx.gr] <- y[idx.gr] + epsilon.g[[g]][r,]
  }
}
dat$prod <- y

Visual exploration

Plot the distribution of the data:

boxplot(prod ~ year, dat, las=1,
        xlab="years", ylab="production", main="Phenotypic data")

Useful function to plot results for a subset of genotypes:

plotSubsetGenos <- function(dat, response, some.genos, tree="1",
                            add.lm=TRUE){
  tmp <- dat[dat$geno %in% some.genos &
             dat$tree == tree,
             c("geno", "tree", "year", response)]
  tmp <- droplevels(tmp)
  tmp$year <- as.numeric(levels(tmp$year))[tmp$year]

  ylim <- range(tmp[[response]])
  if(grepl("error|epsilon", response))
    ylim <- c(-max(abs(tmp[[response]])),
              max(abs(tmp[[response]])))
  plot(x=tmp$year[tmp$geno == some.genos[1]],
       y=tmp[[response]][tmp$geno == some.genos[1]],
       ylim=ylim,
       type="b", pch=1, col=1,
       main=paste0("'", response, "' for ",
                   length(some.genos), " genotypes"),
       xlab="years", las=1,
       ylab=response)
  form <- as.formula(paste0(response, " ~ year"))
  if(add.lm){
    fit <- lm(form, data=droplevels(tmp[tmp$geno == some.genos[1],]))
    abline(fit, col=1, lty=2)
  }

  for(i in 2:length(some.genos)){
    points(x=tmp$year[tmp$geno == some.genos[i]],
           y=tmp[[response]][tmp$geno == some.genos[i]],
           type="b", pch=1, col=i)
    if(add.lm){
      fit <- lm(form, data=droplevels(tmp[tmp$geno == some.genos[i],]))
      abline(fit, col=i, lty=2)
    }
  }

  if(add.lm){
    fit <- lm(form, data=tmp)
    abline(fit, col="darkgrey", lty=2, lwd=2)
  }

  legend("bottomright", legend=some.genos, col=1:length(some.genos),
         lty=1, pch=1, bty="n")
  invisible(tmp)
}

Plot production for a subset of genotypes:

## (some.genos <- sample(genos, 3))
(some.genos <- genos[c(floor(0.5*G/3),
                       floor(G/3+0.5*G/3),
                       floor(2*G/3+0.5*G/3))])
gamma + gamma.g[some.genos]
plotSubsetGenos(dat, "prod", some.genos)
abline(h=beta, col="darkgrey")

Plot errors for the same subset of genotypes as above:

plotSubsetGenos(dat, "error", some.genos)
abline(h=0, col="darkgrey")

Plot the absolute difference between consecutive yields as a function of yield (see figure S2 from Durand et al, 2013):

yield <- c()
abs.diff <- c()
for(g in 1:nlevels(dat)){
  geno <- levels(dat$geno)[g]
  trees.g <- unique(dat[dat$geno == geno, "tree"])
  R.g <- length(trees.g)
  for(r in 1:R.g){
    tree <- trees.g[r]
    y.g.r <- dat[dat$geno == geno &
                 dat$tree == tree,
                 "prod"]
    abs.diff.g.r <- diff(y.g.r)
    yield <- c(yield, y.g.r[-1])
    abs.diff <- c(abs.diff, abs.diff.g.r)
  }
}
plot(yield, abs.diff, las=1,
     xlab="Yt", ylab="|Yt - Yt-1|",
     main=paste0("Correlation: ", round(cor(yield, abs.diff), 2)))

Inference

Genotype as fixed effects

fit.lm <- lm(formula=prod ~ 1 + geno + t + geno:t,
             data=dat)
AIC(fit.lm)
summary(fit.lm)
cbind(truth=c(beta, alpha),
      estim=c(coef(fit.lm)["(Intercept)"], coef(fit.lm)["t"]))
dat$epsilonA <- residuals(fit.lm)
betterSummary(dat$epsilonA)

Plot residuals for the same subset of genotypes as above:

plotSubsetGenos(dat, "epsilonA", some.genos)
plot(dat$epsilonA, dat$error, las=1, asp=1)
abline(a=0, b=1, v=0, h=0, lty=2)

Fit an AR(1) to the residuals:

fitA.ar1.g <- genoAr1Coef(dat=dat, coln.epsilon="epsilonA")
cor(fitA.ar1.g, gamma.g)
plot(fitA.ar1.g, gamma.g, las=1, asp=1)
abline(a=0, b=1, h=0, v=0, lty=2)
plot(fitA.ar1.g, gamma + gamma.g, las=1, asp=1)
abline(a=0, b=1, h=0, v=0, lty=2)

Genotype as random effects

fit.lmer <- lmer(formula=prod ~ 1 + t + (1|geno) + (0 + t|geno),
                 data=dat,
                 REML=TRUE)
AIC(fit.lmer)
summary(fit.lmer)
vc <- as.data.frame(VarCorr(fit.lmer))
cbind(c(beta, alpha, sd.beta.g, sd.alpha.g, sigma),
      c(fixef(fit.lmer),
        vc[vc$grp == "geno" & vc$var1 == "(Intercept)", "sdcor"],
        vc[vc$grp == "geno.1" & vc$var1 == "t", "sdcor"],
        vc[vc$grp == "Residual", "sdcor"]))
dat$epsilonB <- residuals(fit.lmer)
betterSummary(dat$epsilonB)

Plot residuals for the same subset of genotypes as above:

plotSubsetGenos(dat, "epsilonB", some.genos)
plot(dat$epsilonB, dat$error, las=1, asp=1)
abline(a=0, b=1, v=0, h=0, lty=2)

Fit an AR(1) to the residuals:

fitB.ar1.g <- genoAr1Coef(dat=dat, coln.epsilon="epsilonB")
cor(fitB.ar1.g, gamma.g)
plot(fitB.ar1.g, gamma.g, las=1, asp=1, main="Truth vs estimates")
abline(a=0, b=1, h=0, v=0, lty=2)
plot(fitB.ar1.g, gamma + gamma.g, las=1, asp=1, main="Truth vs estimates")
abline(a=0, b=1, h=0, v=0, lty=2)
plot(fitA.ar1.g, fitB.ar1.g, las=1, main="Estimates A vs B")
abline(a=0, b=1, h=0, v=0, lty=2)

BBI

Compute indices:

BBIs <- list()
for(type in c("classic", "norm", "res_norm")){
  BBIs[[type]] <- BBI(dat, type=type, coln.epsilon="epsilonB")
  hist(BBIs[[type]], col="grey", border="white", las=1,
       main=paste0("BBI (", type, ")"),
       xlab=paste0("BBI (", type, ")"))
}
plot(BBIs$classic, BBIs$norm, las=1, main="BBI: classic vs norm")
abline(a=0, b=1, lty=2)
plot(BBIs$classic, BBIs$res_norm, las=1, main="BBI: classic vs res_norm")
abline(a=0, b=1, lty=2)
plot(BBIs$norm, BBIs$res_norm, las=1, main="BBI: norm vs res_norm")
abline(a=0, b=1, lty=2)

Comparison BBI vs AR1

cor(fitB.ar1.g, BBIs$classic[names(fitB.ar1.g)], method="pearson")
cor(fitB.ar1.g, BBIs$classic[names(fitB.ar1.g)], method="spearman")
plot(fitB.ar1.g, BBIs$classic[names(fitB.ar1.g)], las=1,
     main="BBI classic vs AR1 (random)")
plot(fitB.ar1.g, BBIs$norm[names(fitB.ar1.g)], las=1,
     main="BBI norm vs AR1 (random)")
plot(fitB.ar1.g, BBIs$res_norm[names(fitB.ar1.g)], las=1,
     main="BBI res_norm vs AR1 (random)")

Appendix

format(Sys.time(), '%d/%m/%Y %H:%M:%S')
t1 <- proc.time(); t1 - t0
print(sessionInfo(), locale=FALSE)


timflutre/rutilstimflutre documentation built on Feb. 7, 2024, 8:17 a.m.