knitr::opts_chunk$set(comment = "#", collapse = TRUE)

Getting started

In this section, we will use the data in data_ge. For more information, please see ?data_ge. Other data sets can be used provided that they contain the following columns: environment, genotype, block/replicate, and response variable(s). See the section Rendering engine to learn how the HTML tables were generated.

library(metan)
library(DT) # Used to make the tables
# Function to make HTML tables
print_table <- function(table, rownames = FALSE, digits = 3, ...){
  df <- datatable(table, rownames = rownames, extensions = 'Buttons',
                  options = list(scrollX = TRUE, 
                                 dom = '<<t>Bp>',
                                 buttons = c('copy', 'excel', 'pdf', 'print')), ...)
  num_cols <- c(as.numeric(which(sapply(table, class) == "numeric")))
  if(length(num_cols) > 0){
    formatSignif(df, columns = num_cols, digits = digits)
  } else{
    df
  }
}

The first step is to inspect the data with the function inspect().

library(metan)
inspect <- 
inspect(data_ge,
        verbose = FALSE,
        plot = TRUE)

print_table(inspect)

Analysis of single experiments using mixed-models

The function gamem() may be used to analyze single experiments (one-way experiments) using a mixed-effect model according to the following model:

$$ y_{ij}= \mu + \alpha_i + \tau_j + \varepsilon_{ij} $$ where $y_{ij}$ is the value observed for the ith genotype in the jth replicate (i = 1, 2, ..., g; j = 1, 2, ..., r), with g and r being the number of genotypes and replicates, respectively; $\alpha_i$ is the random effect of the ith genotype; $\tau_j$ is the fixed effect of the jth replicate; and $\varepsilon_{ij}$ is the random error associated with $y_{ij}$. In this example, we will use the example data data_g from the metan package.

gen_mod <- gamem(data_g, GEN, REP,
                 resp = c(ED, CL, CD, KW, TKW, NKR))

The easiest way of obtaining the results of the model above is by using the function get_model_data(). Let's do it.

get_model_data(gen_mod, "details") %>% print_table()
get_model_data(gen_mod, "lrt") %>% print_table()
get_model_data(gen_mod, "genpar") %>% print_table()
get_model_data(gen_mod, "blupg") %>% print_table()

In the above example, the experimental design was a randomized complete block design. It is also possible to analyze an experiment conducted in an alpha-lattice design with the function gamem(). In this case, the following model is fitted:

$$ y_{ijk}= \mu + \alpha_i + \gamma_j + (\gamma \tau)_{jk} + \varepsilon_{ijk} $$

where $y_{ijk}$ is the observed value of the ith genotype in the kth block of the jth replicate (i = 1, 2, ..., g; j = 1, 2, ..., r; k = 1, 2, ..., b), with g, r, and b being the number of genotypes, replicates, and incomplete blocks, respectively; $\alpha_i$ is the random effect of the ith genotype; $\gamma_j$ is the fixed effect of the jth complete replicate; $(\gamma \tau)_{jk}$ is the random effect of the kth incomplete block nested within the jth replicate; and $\varepsilon_{ijk}$ is the random error associated with $y_{ijk}$. In this example, we will use the example data data_alpha from the metan package.

gen_alpha <- gamem(data_alpha, GEN, REP, YIELD, block = BLOCK)
get_model_data(gen_alpha, "lrt") %>% print_table()
get_model_data(gen_alpha, "details") %>% print_table()
get_model_data(gen_alpha, "genpar") %>% print_table()

The BLUP model for MET trials

The simplest and well-known linear model with interaction effect used to analyze data from multi-environment trials is $$ {y_{ijk}} = {\rm{ }}\mu {\rm{ }} + \mathop \alpha \nolimits_i + \mathop \tau \nolimits_j + \mathop {(\alpha \tau )}\nolimits_{ij} + \mathop \gamma \nolimits_{jk} + {\rm{ }}\mathop \varepsilon \nolimits_{ijk} $$

where ${y_{ijk}}$ is the response variable (e.g., grain yield) observed in the kth block of the ith genotype in the jth environment (i = 1, 2, ..., g; j = 1, 2, ..., e; k = 1, 2, ..., b); $\mu$ is the grand mean; $\mathop \alpha \nolimits_i$ is the effect of the ith genotype; $\mathop \tau \nolimits_j$ is the effect of the jth environment; $\mathop {(\alpha \tau )}\nolimits_{ij}$ is the interaction effect of the ith genotype with the jth environment; $\mathop \gamma \nolimits_{jk}$ is the effect of the kth block within the jth environment; and $\mathop \varepsilon \nolimits_{ijk}$ is the random error. In a mixed-effect model assuming $\mathop \alpha \nolimits_i$ and $\mathop {(\alpha \tau )}\nolimits_{ij}$ to be random effects, the above model can be rewritten as follows

$$ {\boldsymbol{y = X\beta + Zu + \varepsilon }} $$

where y is an $n[ = \sum\nolimits_{j = 1}^e {(gb)]} \times 1$ vector of the response variable ${\bf{y}} = {\rm{ }}{\left[ {{y_{111}},{\rm{ }}{y_{112}},{\rm{ }} \ldots ,{\rm{ }}{y_{geb}}} \right]^\prime }$; ${\boldsymbol{\beta }}$ is an $(eb) \times 1$ vector of unknown fixed effects ${\boldsymbol{\beta }} = [\mathop \gamma \nolimits_{11} ,\mathop \gamma \nolimits_{12} ,...,\mathop \gamma \nolimits_{eb} ]'$; u is an $m[ = g + ge] \times 1$ vector of random effects ${\boldsymbol{u}} = {\rm{ }}{\left[ {{\alpha _1},{\alpha _2},...,{\alpha _g},\mathop {(\alpha \tau )}\nolimits_{11} ,\mathop {(\alpha \tau )}\nolimits_{12} ,...,\mathop {(\alpha \tau )}\nolimits_{ge} } \right]^\prime }$; X is an $n \times (eb)$ design matrix relating y to ${\boldsymbol{\beta }}$; Z is an $n \times m$ design matrix relating y to u; and ${\boldsymbol{\varepsilon }}$ is an $n \times 1$ vector of random errors ${\boldsymbol{\varepsilon }} = {\rm{ }}{\left[ {{\varepsilon _{111}},{\rm{ }}{\varepsilon _{112}},{\rm{ }} \ldots ,{\rm{ }}{\varepsilon _{geb}}} \right]^\prime }$.

The vectors ${\boldsymbol{\beta }}$ and u are estimated using the well-known mixed model equation [@Henderson1975]. $$ \left[ {\begin{array}{*{20}{c}}{{\boldsymbol{\hat \beta }}}\\{{\bf{\hat u}}}\end{array}} \right]{\bf{ = }}{\left[ {\begin{array}{*{20}{c}}{{\bf{X'}}{{\bf{R }}^{ - {\bf{1}}}}{\bf{X}}}&{{\bf{X'}}{{\bf{R }}^{ - {\bf{1}}}}{\bf{Z}}}\\{{\bf{Z'}}{{\bf{R }}^{ - {\bf{1}}}}{\bf{X}}}&{{\bf{Z'}}{{\bf{R }}^{ - {\bf{1}}}}{\bf{Z + }}{{\bf{G}}^{ - {\bf{1}}}}}\end{array}} \right]^ - }\left[ {\begin{array}{*{20}{c}}{{\bf{X'}}{{\bf{R }}^{ - {\bf{1}}}}{\bf{y}}}\\{{\bf{Z'}}{{\bf{R }}^{ - {\bf{1}}}}{\bf{y}}}\end{array}} \right] $$ where G and R are the variance-covariance matrices for the random-effect vector u and the residual vector ${\boldsymbol{\varepsilon }}$, respectively.
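As a rough illustration of the mixed model equation above, the following base R sketch builds and solves the system for a tiny balanced trial. The data values and the variance components (var_g, var_e) are hypothetical and treated as known; in practice they are estimated (e.g., by REML). G and R are assumed here to be scaled identity matrices, and this sketch is not meant to reproduce how gamem_met() works internally.

```r
# Toy balanced data: 3 genotypes (random) in 2 blocks (fixed).
# All values are hypothetical and used for illustration only.
dat <- data.frame(
  gen   = factor(rep(c("G1", "G2", "G3"), times = 2)),
  block = factor(rep(c("B1", "B2"), each = 3)),
  y     = c(2.1, 2.9, 2.6, 2.5, 3.3, 2.8)
)

X <- model.matrix(~ 0 + block, data = dat)  # design matrix for fixed effects
Z <- model.matrix(~ 0 + gen, data = dat)    # design matrix for random effects
y <- dat$y

# Assumed (known) variance components
var_g <- 0.10  # genotypic variance
var_e <- 0.05  # residual variance

R_inv <- diag(nrow(dat)) / var_e  # inverse of R = var_e * I
G_inv <- diag(ncol(Z)) / var_g    # inverse of G = var_g * I

# Left- and right-hand sides of the mixed model equation
lhs <- rbind(
  cbind(t(X) %*% R_inv %*% X, t(X) %*% R_inv %*% Z),
  cbind(t(Z) %*% R_inv %*% X, t(Z) %*% R_inv %*% Z + G_inv)
)
rhs <- rbind(t(X) %*% R_inv %*% y, t(Z) %*% R_inv %*% y)

sol <- solve(lhs, rhs)
beta_hat <- sol[seq_len(ncol(X)), 1]            # estimated block effects
u_hat    <- sol[ncol(X) + seq_len(ncol(Z)), 1]  # predicted genotype effects
u_hat
```

For balanced data such as these, the predicted genotype effects should agree with the shrinkage form $h(\bar{y}_{i.}-\bar{y}_{..})$, with $h = \sigma_g^2/(\sigma_g^2 + \sigma_e^2/b)$ and b the number of blocks.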

The function gamem_met() is used to fit the linear mixed-effect model. The first argument is the data, in our example data_ge. The arguments env, gen, and rep are the names of the columns that contain the levels for environments, genotypes, and replications, respectively. The argument resp is the response variable to be analyzed. The function allows a single variable (in this case GY) or a vector of response variables. Here, we will use everything() to analyze all numeric variables in the data. By default, genotype and genotype-vs-environment interaction are assumed to be random effects. Other effects may be considered using the random argument. The last argument (verbose) controls whether the code is run silently or not.

mixed_mod <- 
  gamem_met(data_ge,
            env = ENV,
            gen = GEN,
            rep = REP,
            resp = everything(),
            random = "gen", #Default
            verbose = TRUE) #Default

Diagnostic plot for residuals

The S3 generic function plot() is used to generate diagnostic plots of residuals of the model.

plot(mixed_mod)

Normality plots for the random effects of genotype and genotype-vs-environment interaction may also be obtained by using type = "re".

plot(mixed_mod, type = "re")

Printing the model outputs

Likelihood Ratio Tests

The output LRT contains the Likelihood Ratio Tests for genotype and genotype-vs-environment random effects. We can get these values with get_model_data().

data <- get_model_data(mixed_mod, "lrt")
print_table(data)

Variance components and genetic parameters

In the output ESTIMATES, beyond the variance components for the declared random effects, some important parameters are also shown. Heritability is the broad-sense heritability, $\mathop h\nolimits_g^2$, estimated by $$ \mathop h\nolimits_g^2 = \frac{\mathop {\hat\sigma} \nolimits_g^2} {\mathop {\hat\sigma} \nolimits_g^2 + \mathop {\hat\sigma} \nolimits_i^2 + \mathop {\hat\sigma} \nolimits_e^2 } $$

where $\mathop {\hat\sigma} \nolimits_g^2$ is the genotypic variance; $\mathop {\hat\sigma} \nolimits_i^2$ is the genotype-by-environment interaction variance; and $\mathop {\hat\sigma} \nolimits_e^2$ is the residual variance.

GEIr2 is the coefficient of determination of the interaction effects, $\mathop r\nolimits_i^2$, estimated by

$$ \mathop r\nolimits_i^2 = \frac{\mathop {\hat\sigma} \nolimits_i^2} {\mathop {\hat\sigma} \nolimits_g^2 + \mathop {\hat\sigma} \nolimits_i^2 + \mathop {\hat\sigma} \nolimits_e^2 } $$

Heritability of means is the heritability on the mean basis, $\mathop h\nolimits_{gm}^2$, estimated by

$$ \mathop h\nolimits_{gm}^2 = \frac{\mathop {\hat\sigma} \nolimits_g^2}{[\mathop {\hat\sigma} \nolimits_g^2 + \mathop {\hat\sigma} \nolimits_i^2 /e + \mathop {\hat\sigma} \nolimits_e^2 /\left( {eb} \right)]} $$

where e and b are the number of environments and blocks, respectively; Accuracy is the accuracy of selection, Ac, estimated by $$ Ac = \sqrt{\mathop h\nolimits_{gm}^2} $$

rge is the genotype-environment correlation, $\mathop r\nolimits_{ge}$, estimated by

$$ \mathop r\nolimits_{ge} = \frac{\mathop {\hat\sigma} \nolimits_g^2}{\mathop {\hat\sigma} \nolimits_g^2 + \mathop {\hat\sigma} \nolimits_i^2} $$

CVg and CVr are the genotypic coefficient of variation and the residual coefficient of variation estimated, respectively, by $$ CVg = \left( {\sqrt {\mathop {\hat \sigma }\nolimits_g^2 } /\mu } \right) \times 100 $$ and $$ CVr = \left( {\sqrt {\mathop {\hat \sigma }\nolimits_e^2 } /\mu } \right) \times 100 $$ where $\mu$ is the grand mean.

CV ratio is the ratio between genotypic and residual coefficient of variation.
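The formulas above can be sketched in a few lines of base R. The variance components, the numbers of environments and blocks, and the grand mean used below are hypothetical values chosen for illustration; they are not estimates from data_ge, and the helper genetic_parameters() is not part of metan.

```r
# Hypothetical helper (not a metan function) implementing the formulas above.
genetic_parameters <- function(var_g, var_i, var_e, n_env, n_block, grand_mean) {
  var_p   <- var_g + var_i + var_e  # denominator shared by h2 and GEIr2
  h2_mean <- var_g / (var_g + var_i / n_env + var_e / (n_env * n_block))
  cv_g    <- sqrt(var_g) / grand_mean * 100
  cv_r    <- sqrt(var_e) / grand_mean * 100
  c(
    h2       = var_g / var_p,            # broad-sense heritability
    gei_r2   = var_i / var_p,            # coefficient of determination of GEI
    h2_mean  = h2_mean,                  # heritability on the mean basis
    accuracy = sqrt(h2_mean),            # accuracy of selection
    r_ge     = var_g / (var_g + var_i),  # genotype-environment correlation
    cv_g     = cv_g,                     # genotypic coefficient of variation (%)
    cv_r     = cv_r,                     # residual coefficient of variation (%)
    cv_ratio = cv_g / cv_r               # CV ratio
  )
}

# Illustrative values only
params <- genetic_parameters(var_g = 0.03, var_i = 0.06, var_e = 0.10,
                             n_env = 14, n_block = 3, grand_mean = 2.7)
round(params, 3)
```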

data <- get_model_data(mixed_mod)
print_table(data)

BLUP for genotypes

print_table(mixed_mod$GY$BLUPgen)

The function get_model_data() may be used to easily get the data from a model fitted with the function gamem_met(), especially when more than one variable is analyzed. The following code returns the predicted mean of each genotype for every variable in the fitted model.

get_model_data(mixed_mod, what = "blupg")

Plotting the BLUP for genotypes

library(ggplot2)
a <- plot_blup(mixed_mod)
b <- plot_blup(mixed_mod, 
               col.shape  =  c("gray20", "gray80"),
               plot_theme = theme_metan(grid = "y")) +
  coord_flip()
arrange_ggplot(a, b, tag_levels = "a")

This output shows the predicted means for genotypes. BLUPg is the genotypic effect $(\hat{g}_{i})$ which, considering balanced data and genotype as a random effect, is estimated by

$$ \hat{g}_{i} = h_g^2(\bar{y}_{i.}-\bar{y}_{..}) $$

where $h_g^2$ is the shrinkage effect for genotype. Predicted is the predicted mean estimated by $$ \hat{g}_{i}+\mu $$

where $\mu$ is the grand mean. LL and UL are the lower and upper limits, respectively, estimated by $$ (\hat{g}_{i}+\mu)\pm{CI} $$ with $$ CI = t\times\sqrt{((1-Ac)\times{\mathop \sigma \nolimits_g^2)}} $$

where $t$ is the Student's t value for a two-tailed t test at a given probability error; $Ac$ is the accuracy of selection and $\mathop \sigma \nolimits_g^2$ is the genotypic variance.
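A minimal sketch of the shrinkage estimator and of the interval described above, assuming balanced data. The genotype means, the shrinkage value, the accuracy, the genotypic variance, and the degrees of freedom are all hypothetical and are not taken from the fitted model.

```r
# Hypothetical genotype means across environments
gen_means  <- c(G1 = 2.4, G2 = 2.9, G3 = 2.7)
grand_mean <- mean(gen_means)

shrinkage <- 0.80  # assumed shrinkage effect for genotype
accuracy  <- 0.90  # assumed accuracy of selection (Ac)
var_g     <- 0.03  # assumed genotypic variance
df_t      <- 100   # assumed degrees of freedom for the t value

blup_g    <- shrinkage * (gen_means - grand_mean)  # genotypic effect
predicted <- grand_mean + blup_g                   # predicted mean

# Two-tailed interval at a 0.05 probability of error
ci <- qt(0.975, df = df_t) * sqrt((1 - accuracy) * var_g)
data.frame(BLUPg = blup_g, Predicted = predicted,
           LL = predicted - ci, UL = predicted + ci)
```

Because the shrinkage value is below one, each predicted mean is pulled toward the grand mean relative to the observed genotype mean.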

BLUP for genotypes X environment combination

print_table(mixed_mod$GY$BLUPint)

This output shows the predicted means for each genotype and environment combination. BLUPg is the genotypic effect described above. BLUPge is the genotypic effect of the ith genotype in the jth environment $(\hat{g}_{ij})$, which considering balanced data and genotype as random effect is estimated by $$\hat{g}_{ij} = h_g^2(\bar{y}_{i.}-\bar{y}_{..})+h_{ge}^2(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})$$ where $h_{ge}^2$ is the shrinkage effect for the genotype-by-environment interaction; BLUPg+ge is $BLUP_g+BLUP_{ge}$; Predicted is the predicted mean ($\hat{y}_{ij}$) estimated by $$ \hat{y}_{ij} = \bar{y}_{.j}+BLUP_{g+ge} $$

Some useful information

The following pieces of information are provided in the Details output: Nenv, the number of environments in the analysis; Ngen, the number of genotypes in the analysis; mresp, the value attributed to the highest value of the response variable after rescaling it; wresp, the weight of the response variable for estimating the WAASBY index; Mean, the grand mean; SE, the standard error of the mean; SD, the standard deviation; CV, the coefficient of variation of the phenotypic means, estimating WAASB; Min, the minimum value observed (returning the genotype and environment); Max, the maximum value observed (returning the genotype and environment); MinENV, the environment with the lowest mean; MaxENV, the environment with the highest mean; MinGEN, the genotype with the lowest mean; and MaxGEN, the genotype with the highest mean.

data <- get_model_data(mixed_mod, "details")
print_table(data)

The WAASB object

The function waasb() computes the Weighted Average of the Absolute Scores considering all possible IPCAs from the Singular Value Decomposition of the BLUPs for genotype-vs-environment interaction effects obtained by a linear mixed-effect model [@Olivoto2019], as follows:

$$ WAASB_i = \sum_{k = 1}^{p} |IPCA_{ik} \times EP_k|/ \sum_{k = 1}^{p}EP_k $$

where $WAASB_i$ is the weighted average of absolute scores of the ith genotype; $IPCA_{ik}$ is the score of the ith genotype in the kth IPCA; and $EP_k$ is the variance explained by the kth IPCA for $k = 1,2,..,p$, $p = min(g-1; e-1)$.
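The computation above can be sketched in base R as follows. The matrix gei is random and merely stands in for the BLUPs of the genotype-vs-environment interaction effects; scaling the genotype scores by the square root of the singular values is an assumption made for this illustration, so the numbers are not expected to match the output of waasb().

```r
set.seed(1)
n_gen <- 5
n_env <- 4

# Hypothetical stand-in for the BLUP interaction effects matrix (genotypes x environments)
gei <- matrix(rnorm(n_gen * n_env), nrow = n_gen,
              dimnames = list(paste0("G", 1:n_gen), paste0("E", 1:n_env)))
# Double-center so that rows and columns sum to zero, as interaction effects do
gei <- sweep(gei, 1, rowMeans(gei))
gei <- sweep(gei, 2, colMeans(gei))

p   <- min(n_gen - 1, n_env - 1)  # number of IPCAs
dec <- svd(gei)
d   <- dec$d[seq_len(p)]

# Genotype scores in each IPCA (assumed scaling: U * sqrt(singular value))
ipca_gen <- sweep(dec$u[, seq_len(p), drop = FALSE], 2, sqrt(d), "*")
# Percentage of variance explained by each IPCA
ep <- d^2 / sum(d^2) * 100

# Weighted average of the absolute scores
waasb <- as.vector(abs(ipca_gen) %*% ep) / sum(ep)
names(waasb) <- rownames(gei)
round(waasb, 3)
```

Genotypes with lower values deviate less from zero across the interaction axes and are therefore read as more stable.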

waasb_model <- 
  waasb(data_ge,
        env = ENV,
        gen = GEN,
        rep = REP,
        resp = everything(),
        random = "gen", #Default
        verbose = TRUE) #Default

data <- waasb_model$GY$model
print_table(data)

The output generated by the waasb() function is very similar to that generated by the waas() function. The main difference here is that the singular value decomposition is based on the matrix of BLUPs for the GEI effects.

Eigenvalues of the BLUP_GEI matrix

data <- waasb_model$GY$PCA
print_table(data)
plot_eigen(waasb_model, size.lab = 14, size.tex.lab = 14)

The above output shows the eigenvalues and the proportion of variance explained by each principal component axis of the BLUP interaction effects matrix.

Phenotypic means

data <- waasb_model$GY$MeansGxE
print_table(data)

In this output, Y is the phenotypic mean for each genotype and environment combination ($y_{ij}$), estimated by $y_{ij} = \sum_k{y_{ijk}}/B$ with $k = 1,2,...,B$.

Biplots

Provided that an object of class waasb is available in the global environment, the graphics may be obtained using the function plot_scores(). To do that, we will revisit the previously fitted model waasb_model. Please refer to ?plot_scores for more details. Four types of graphics can be generated: 1 = $GY \times PC1$; 2 = $PC1 \times PC2$; 3 = $GY \times WAASB$; and 4 = a graphic with nominal yield as a function of the environment IPCA1 scores.

biplot type 1: GY x PC1

c <- plot_scores(waasb_model, type = 1)
d <- plot_scores(waasb_model,
                 type = 1,
                 col.gen = "black",
                 col.env = "red",
                 col.segm.env = "red",
                 axis.expand = 1.5)
arrange_ggplot(c, d, tag_levels = list(c("c", "d")))

biplot type 2: PC1 x PC2

e <- plot_scores(waasb_model, type = 2)
f <- plot_scores(waasb_model,
                 type = 2,
                 polygon = TRUE,
                 col.segm.env = "transparent",
                 plot_theme = theme_metan_minimal())
arrange_ggplot(e, f, tag_levels = list(c("e", "f")))

biplot type 3: GY x WAASB

The quadrants proposed by @Olivoto2019 in the following biplot represent four classifications regarding the joint interpretation of mean performance and stability. The genotypes or environments included in quadrant I can be considered unstable genotypes or environments with high discrimination ability, and with productivity below the grand mean. In quadrant II are included unstable genotypes, although with productivity above the grand mean. The environments included in this quadrant deserve special attention since, in addition to providing high magnitudes of the response variable, they present a good discrimination ability. Genotypes within quadrant III have low productivity, but can be considered stable due to the lower values of WAASB. The lower this value, the more stable the genotype can be considered. The environments included in this quadrant can be considered as poorly productive and with low discrimination ability. The genotypes within the quadrant IV are highly productive and broadly adapted due to the high magnitude of the response variable and high stability performance (lower values of WAASB).

g <- plot_scores(waasb_model, type = 3)
h <- plot_scores(waasb_model, type = 3,
                 x.lab = "My customized x label",
                 size.shape.gen = 3,
                 size.tex.gen = 2,
                 x.lim = c(1.2, 4.7),
                 x.breaks = seq(1.5, 4.5, by = 0.5),
                 plot_theme = theme_metan(color.background = "white"))
arrange_ggplot(g, h, tag_levels = list(c("g", "h")))

To obtain the WAASB index for a set of variables, the function get_model_data() is used, as shown below.

waasb(data_ge2, ENV, GEN, REP,
     resp = c(PH, ED, TKW, NKR)) %>%
 get_model_data(what = "WAASB") %>%
 print_table()

biplot type 4: nominal yield and environment IPCA1

i <- plot_scores(waasb_model, type = 4)
j <- plot_scores(waasb_model,
                 type = 4,
                 size.tex.gen = 1.5,
                 color = FALSE,
                 col.alpha.gen = 0,
                 col.alpha.env = 0,
                 plot_theme = theme_metan(color.background = "white"))
arrange_ggplot(i, j, tag_levels = list(c("i", "j")))

Simultaneous selection for mean performance and stability

The waasby index is used for genotype ranking considering both the stability (waasb) and mean performance (y) based on the following model [@Olivoto2019].

$$ waasby_i = \frac{{\left( {r {Y_i} \times {\theta _Y}} \right) + \left( {r {W_i} \times {\theta _W}} \right)}}{{{\theta _Y} + {\theta _W}}} $$

where $waasby_i$ is the superiority index for the i-th genotype; $rY_i$ and $rW_i$ are the rescaled values (0-100) for the response variable (y) and the stability (WAAS or WAASB), respectively; $\theta _Y$ and $\theta_W$ are the weights for mean performance and stability, respectively.
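A small worked example of the index above, written in base R. The means, the WAASB values, and the equal weights are hypothetical. The helper rescale() is not a metan function; it assumes that the response variable is rescaled so that the highest mean receives 100 and that stability is rescaled in the opposite direction, so that the lowest WAASB receives 100.

```r
# Hypothetical helper: linear rescaling of x so that max(x) maps to new_max
# and min(x) maps to new_min
rescale <- function(x, new_min, new_max) {
  (new_max - new_min) / (max(x) - min(x)) * (x - max(x)) + new_max
}

# Illustrative values for three genotypes
y     <- c(G1 = 2.5, G2 = 3.0, G3 = 2.0)  # mean performance (higher is better)
waasb <- c(G1 = 0.2, G2 = 0.4, G3 = 0.3)  # stability (lower is better)

r_y <- rescale(y, new_min = 0, new_max = 100)      # highest mean -> 100
r_w <- rescale(waasb, new_min = 100, new_max = 0)  # lowest WAASB -> 100

theta_y <- 50  # assumed weight for mean performance
theta_w <- 50  # assumed weight for stability

waasby <- (r_y * theta_y + r_w * theta_w) / (theta_y + theta_w)
sort(waasby, decreasing = TRUE)
```

With equal weights, G1 ranks first here: it is the most stable genotype and has an intermediate mean, whereas G2 has the highest mean but the poorest stability.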

This index was already computed and stored in waasb_model$GY$model. An intuitive plot may be obtained by running

i <- plot_waasby(waasb_model)
j <- plot_waasby(waasb_model, col.shape = c("gray20", "gray80"))
arrange_ggplot(i, j, tag_levels = list(c("e", "f")))

In the following example, we will apply the function wsmp() to the previously fitted model waasb_model in order to plan different scenarios of waasby estimation by changing the weights assigned to stability and mean performance. The number of scenarios is defined by the argument increment. By default, twenty-one different scenarios are computed. In the first scenario, the superiority index waasby is computed considering the following weights: stability (waasb or waas) = 100; mean performance = 0. In other words, only stability is considered for genotype ranking. In the next iteration, the weights become 95/5 (since increment = 5). In the third scenario, the weights become 90/10, and so on until the weights become 0/100. In the last iteration, the genotype ranking for WAASY or WAASBY matches perfectly with the ranks of the response variable.
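The weighting scheme described above can be mimicked in base R. This is only a sketch of the idea, not the implementation of wsmp(): the rescaled values r_y and r_w are hypothetical, and the loop simply recomputes the index and the genotype ranks for each pair of weights.

```r
increment <- 5
w_stab <- seq(100, 0, by = -increment)  # weights for stability: 100, 95, ..., 0
w_perf <- 100 - w_stab                  # weights for mean performance: 0, 5, ..., 100

# Hypothetical rescaled values (0-100) for three genotypes
r_y <- c(G1 = 50, G2 = 100, G3 = 0)   # mean performance
r_w <- c(G1 = 100, G2 = 0, G3 = 50)   # stability

# Rank the genotypes in each scenario (rank 1 = highest index)
ranks <- t(sapply(seq_along(w_stab), function(s) {
  waasby <- (r_y * w_perf[s] + r_w * w_stab[s]) / (w_perf[s] + w_stab[s])
  rank(-waasby, ties.method = "min")
}))
rownames(ranks) <- paste0(w_stab, "/", w_perf)

nrow(ranks)              # number of scenarios
ranks[c(1, 11, 21), ]    # scenarios 100/0, 50/50, and 0/100
```

In the 100/0 scenario the ranks follow r_w only, and in the 0/100 scenario they follow r_y only, which is the behavior described in the paragraph above.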

scenarios <- wsmp(waasb_model)

Printing the model outputs

print_table(scenarios$GY$hetcomb)

In addition, the genotype ranking depending on the number of multiplicative terms used to estimate the WAAS index is also computed.

print_table(scenarios$GY$hetdata)

Plotting the heat map graphics

The first type of heatmap shows the genotype ranking depending on the number of principal component axes used for estimating the WAASB index. A Euclidean distance-based dendrogram is used for grouping the genotypes based on their ranks. The second type of heatmap shows the genotype ranking depending on the WAASB/GY ratio. The ranks obtained with a ratio of 100/0 consider exclusively the stability for genotype ranking. On the other hand, a ratio of 0/100 considers exclusively the productivity for genotype ranking. Four clusters are estimated: (1) unproductive and unstable genotypes; (2) productive, but unstable genotypes; (3) stable, but unproductive genotypes; and (4) productive and stable genotypes [@Olivoto2019].

Ranks of genotypes depending on the number of PCA used to estimate the WAASB

plot(scenarios, type = 1)

Ranks of genotypes depending on the WAASB/GY ratio

plot(scenarios, type = 2)

Other BLUP-based stability indexes

@ColombariFilho2013 have shown the use of three BLUP-based indexes for selecting genotypes with performance and stability. The first is the harmonic mean of genotypic values, or BLUPs (HMGV), a stability index that considers the genotype with the highest harmonic mean across environments as the most stable, as follows:

$$ HMGV_i = \frac{e}{{\sum\limits_{j = 1}^e {\frac{1}{{BLUP_{ij}}}}}} $$

The second is the relative performance of genotypic values (RPGV), an adaptability index estimated as follows:

$$ RPGV_i = \frac{1}{e}{\sum\limits_{j = 1}^e {BLUP_{ij}} /\mathop \mu \nolimits_j } $$

The third and last is the harmonic mean of relative performance of genotypic values (HMRPGV), a simultaneous selection index for stability, adaptability and mean performance, estimated as follows:

$$ HMRPGV_i = \frac{e}{{\sum\limits_{j = 1}^e {\frac{1}{{BLUP_{ij}/{\mu _j}}}} }} $$
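The three indexes can be computed by hand for a small example. The matrix blup below holds hypothetical predicted genotypic values for two genotypes in two environments, and $\mu_j$ is assumed here to be the mean of the jth environment; the values returned by blup_indexes() may be scaled differently.

```r
# Hypothetical predicted genotypic values (genotypes in rows, environments in columns)
blup <- matrix(c(2.0, 4.0,
                 3.0, 3.0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("G1", "G2"), c("E1", "E2")))

n_env    <- ncol(blup)
env_mean <- colMeans(blup)  # assumed definition of mu_j

# Harmonic mean of genotypic values
hmgv <- n_env / rowSums(1 / blup)

# Genotypic values relative to the environment mean
rel <- sweep(blup, 2, env_mean, "/")

# Relative performance of genotypic values
rpgv <- rowMeans(rel)

# Harmonic mean of the relative performance of genotypic values
hmrpgv <- n_env / rowSums(1 / rel)

round(cbind(HMGV = hmgv, RPGV = rpgv, HMRPGV = hmrpgv), 3)
```

Both genotypes have the same arithmetic mean (3.0), but G1 varies more across environments and is penalized by the harmonic mean, which illustrates why HMGV is read as a stability index.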

Res_ind <- 
data_ge %>%
gamem_met(ENV, GEN, REP, GY, verbose = FALSE) %>% 
blup_indexes()

print_table(Res_ind$GY)

FAI-BLUP selection index

The FAI-BLUP is a multi-trait index based on factor analysis and ideotype design recently proposed by @Rocha2018. The factorial scores of each ideotype are designed according to the desirable and undesirable factors. Then, a spatial probability is estimated based on the genotype-ideotype distance, enabling genotype ranking. Here we will use a mixed model fitted with gamem() as input data. By default, the selection is made to increase the value of all traits. Change this default with the arguments DI and UI.

data_g %>%
  gamem(GEN, REP, everything()) %>%
  fai_blup() %>%
  plot()

• IMPORTANT: fai_blup() recognizes models fitted with both gamem_met() and waasb(). For balanced data (all genotypes in all environments), waasb() and gamem_met() will return the same model. In the case of unbalanced trials, the function waasb() will return an error, since a complete two-way table is required for the singular value decomposition procedure.

Rendering engine {#rendering}

This vignette was built with pkgdown. All tables were produced with the package DT using the following function.

library(DT) # Used to make the tables
# Function to make HTML tables
print_table <- function(table, rownames = FALSE, digits = 3, ...){
  df <- datatable(table, rownames = rownames, extensions = 'Buttons',
                  options = list(scrollX = TRUE, 
                                 dom = '<<t>Bp>',
                                 buttons = c('copy', 'excel', 'pdf', 'print')), ...)
  num_cols <- c(as.numeric(which(sapply(table, class) == "numeric")))
  if(length(num_cols) > 0){
    formatSignif(df, columns = num_cols, digits = digits)
  } else{
    df
  }
}

References



TiagoOlivoto/WAASB documentation built on March 20, 2024, 4:18 p.m.