In priviere/PPBstats: An R package to perform analysis found within PPB programmes

AMMI (M6a) {#ammi}

Theory of the model

The experimental design used is fully replicated (D1). The Additive Main effects and Multiplicative Interaction (AMMI) model is based on frequentist statistics (section \@ref(section-freq)). The analysis can be broken down in two steps [@gauch_statistical_2006][@gauch_statistical_2008][@yan_gge_2007] :

an ANOVA with the following model :

$Y_{ijk} = \mu + \alpha_{i} + \theta_{j} + rep_{k}(\theta_{j}) + (\alpha\theta){ij} + \varepsilon{ijk}; \quad \varepsilon_{ijk} \sim \mathcal{N} (0,\sigma^2)$

With,

$Y_{ijk}$ the phenotypic value for replication $k$, germplasm $i$ and location $j$,
$\mu$ the general mean,
$\alpha_{i}$ the effect of germplasm $i$,
$\theta_{j}$ the effect of location $j$,
$rep_{k}(\theta_{j})$ the effect of the replication $k$ nested in location,
$(\alpha\theta)_{ij}$ the interaction effect of germplasm $\times$ location,
$\varepsilon_{ijk}$ the residuals.

Or, if there are several years in the data set:

$Y_{ijkl} = \mu + \alpha_{i} + \theta_{j} + \beta_{l} + (\alpha\theta){ij} + (\alpha\beta){il} + (\theta\beta){jl} + rep{k}((\theta\beta){jl}) + \varepsilon{ijkl}; \quad \varepsilon_{ijkl} \sim \mathcal{N} (0,\sigma^2)$

With,

$Y_{ijkl}$ the phenotypic value for replication $k$, germplasm $i$, location $j$ and year $l$,
$\beta_{l}$ the year $l$ effect,
$(\alpha\beta)_{il}$ the germplasm $\times$ year interaction effect,
$(\theta\beta)_{jl}$ the location $\times$ year interaction effect,
$rep_{k}((\theta\beta)_{jl})$ the effect of the replication $k$ nested in location $\times$ year,
$\varepsilon_{ijkl}$ the residuals,
and all other effects are the same as in the previous model.
a PCA that focus on the germplasm $\times$ location interaction :

$(\alpha\theta){ij} = \sum{n}^{N} \lambda_{n} \gamma_{in} \omega_{jn}$ ^[Note than in PPBstats, the PCA is done on $\alpha\theta_{ij} + \varepsilon_{ijkl}$. The residuals is in the last dimension of the PCA]

which can also be written :

$(\alpha\theta){ij} = \sum{n}^{N} (\sqrt{\lambda_{n}} \gamma_{in}) (\sqrt{\lambda_{n}} \omega_{jn})$

With,

$(\alpha\theta)_{ij}$ the interaction between germplasm $i$ and location $j$,
$N$ the number of dimensions (PCA componants) which has as maximum value the number of location,
$\lambda_{n}$ the eigen value for componant $n$,
$\gamma_{in}$ the eigen vector for germplasm $i$ and componant $n$,
$\omega_{jn}$ the eigen vector for location $j$ and componant $n$.

The data are double centered on location and germplasm. The PCA studies the structure of the interaction matrix. The locations are the variables and the germplasms are the individuals.

This PCA allows to detect

germplasm that are stable (i.e. contribute less to the interaction)
the germplasm that interact the most and with which location
the location that have the same profile regarding interaction

Steps with `PPBstats`

For AMMI analysis, you can follow these steps (Figure \@ref(fig:main-workflow)):

Format the data with format_data_PPBstats()
Run the model with model_GxE() and gxe_analysis = "AMMI"
Check model outputs to know if you can continue the analysis with check_model() and vizualise it with plot()
Get mean comparisons for each factor with mean_comparisons() and vizualise it with plot()
Get and visualize biplot with biplot_data() and plot()
Get groups of each parameters with parameters_groups() and visualise it with plot()

Format the data

data(data_model_GxE)
data_model_GxE = format_data_PPBstats(data_model_GxE, type = "data_agro")
head(data_model_GxE)

Run the model

To run model GxE on the dataset, used the function model_GxE. You can run it on one variable.

out_ammi = model_GxE(data_model_GxE, variable = "y1", gxe_analysis = "AMMI")

out_ammi is a list containing three elements :

info : a list with variable and gxe_analysis

out_ammi$info

ANOVA a list with five elements :
- model r out_ammi$ANOVA$model
- anova_model r out_ammi$ANOVA$anova_model
- germplasm_effects a list of two elements :
  - effects r out_ammi$ANOVA$germplasm_effects$effects
  - intra_variance r out_ammi$ANOVA$germplasm_effects$intra_variance
- location_effects r out_ammi$ANOVA$location_effects$effects
- interaction_matrix r out_ammi$ANOVA$interaction_matrix
- PCA : PCA object from FactoMineR

out_ammi$PCA

Check and visualize model outputs

The tests to check the model are explained in section \@ref(check-model-freq).

Check the model

Once the model is run, it is necessary to check if the outputs can be taken with confidence. This step is needed before going ahead in the analysis (in fact, object used in the next functions must come from check_model()).

out_check_ammi = check_model(out_ammi)

out_check_ammi is a list containing four elements :

model_GxE the output from the model
data_ggplot a list containing information for ggplot:
- data_ggplot_residuals a list containing :
  - data_ggplot_normality
  - data_ggplot_skewness_test
  - data_ggplot_kurtosis_test
  - data_ggplot_shapiro_test
  - data_ggplot_qqplot
- data_ggplot_variability_repartition_pie
- data_ggplot_var_intra

Visualize outputs

Once the computation is done, you can visualize the results with plot()

p_out_check_ammi = plot(out_check_ammi)

p_out_check_ammi is a list with:

residuals
- histogram : histogram with the distribution of the residuals r p_out_check_ammi$residuals$histogram
- qqplot r p_out_check_ammi$residuals$qqplot
- points r p_out_check_ammi$residuals$points
variability_repartition : pie with repartition of SumSq for each factor

p_out_check_ammi$variability_repartition

variance_intra_germplasm : repartition of the residuals for each germplasm (see Details for more information) With the hypothesis than the micro-environmental variation is equaly distributed on all the individuals (i.e. all the plants), the distribution of each germplasm represent the intra-germplasm variance. This has to been seen with caution:
- If germplasm have no intra-germplasm variance (i.e. pure line or hybrides) then the distribution of each germplasm represent only the micro-environmental variation.
- If germplasm have intra-germplasm variance (i.e. population such as landraces for example) then the distribution of each germplasm represent the micro-environmental variation plus the intra-germplasm variance.

p_out_check_ammi$variance_intra_germplasm

pca_composante_variance : variance caught by each dimension of the PCA run on the interaction matrix

p_out_check_ammi$pca_composante_variance

Get and visualize mean comparisons

The method to compute mean comparison are explained in section \@ref(mean-comp-check-freq).

Get mean comparisons

Get mean comparisons with mean_comparisons().

out_mean_comparisons_ammi = mean_comparisons(out_check_ammi, p.adj = "bonferroni")

out_mean_comparisons_ammi is a list of three elements:

info : a list with variable and gxe_analysis
data_ggplot_LSDbarplot_germplasm
data_ggplot_LSDbarplot_location
data_ggplot_LSDbarplot_year

Visualize mean comparisons

p_out_mean_comparisons_ammi = plot(out_mean_comparisons_ammi)

p_out_mean_comparisons_ammi is a list of three elements with barplots :

For each element of the list, there are as many graph as needed with nb_parameters_per_plot parameters per graph. Letters are displayed on each bar. Parameters that do not share the same letters are different regarding type I error (alpha) and alpha correction. The error I (alpha) and the alpha correction are displayed in the title.

germplasm : mean comparison for germplasm

pg = p_out_mean_comparisons_ammi$germplasm
names(pg)
pg$`1`

location : mean comparison for location

pl = p_out_mean_comparisons_ammi$location
names(pl)
pl$`1`

year : mean comparison for year Here there are no more than 2 years in the data set so it is NULL

p_out_mean_comparisons_ammi$year

Get and visualize biplot

The biplot represents information about the percentages of total variation explained by the two axes. It has to be linked to the total variation caught by the interaction. If the total variation is small, then the biplot is useless. If the total variation is high enought, then the biplot is useful if the two first dimension represented catch enought variation (the more the better).

Get biplot

out_biplot_ammi = biplot_data(out_check_ammi)

Visualize biplot

p_out_biplot_ammi = plot(out_biplot_ammi)

p_out_biplot_ammi is a list of three elements :

ecovalence

Ecovalence from @wricke_uber_1962 give part of interaction variance taken by germplasm and location. This parameter provides indications on the contribution of each variety to the interaction term and therefore on its stability over the different farms with respect to the productivity potential of each of them. It is generally described as a dynamic stability indicator [@becker_stability_1988]. A low ecovalence means low interaction, i.e. more stability.

Ecovalance of germplasm $i$ is $W_{i}=\sum_{i}^{n} (\eta_{i}\nu_{j})^{2}$

Ecovalance of location $j$ is $W_{j}=\sum_{j}^{n} (\eta_{i}\nu_{j})^{2}$.

Ecovalances are represented in fonction of mean effects by germplasm and location.

p_out_biplot_ammi$ecovalence

interaction which displays the interaction matrix

p_out_biplot_ammi$interaction

biplot being a list of four elements :

p_out_biplot_ammi$biplot$simple_biplot

Regarding the other elements of the list, it returns NULL as these visualisation is only done for gxe_analysis = "GGE".

p_out_biplot_ammi$biplot$which_won_where
p_out_biplot_ammi$biplot$mean_vs_stability
p_out_biplot_ammi$biplot$discrimitiveness_vs_representativeness

Get and vizualise groups of parameters

Get groups of parameters

In order to cluster locations or germplasms, you may use mulivariate analysis on a matrix with several variables in columns and parameter in rows.

This is done with parameter_groups() which do a PCA on this matrix.

Clusters are done based on HCPC method as explained here

Lets' have an example with three variables.

First run the models

out_ammi_2 = model_GxE(data_model_GxE, variable = "y2", gxe_analysis = "AMMI")
out_ammi_3 = model_GxE(data_model_GxE, variable = "y3", gxe_analysis = "AMMI")

Then check the models

out_check_ammi_2 = check_model(out_ammi_2)
out_check_ammi_3 = check_model(out_ammi_3)

Then run the function for germplasm. It can also be done on location or intra germplasm variance

out_parameter_groups = parameter_groups(
  list("y1" = out_check_ammi, "y2" = out_check_ammi_2, "y3" = out_check_ammi_3), 
  "germplasm"
  )

out_parameter_groups is list of two elements:

obj.pca : the PCA object from FactoMineR::PCA()
clust, a list of two elements:
- res.hcpc : the HCPC object from FactoMineR::HCPC()
- clust : the dataframe with cluster assigned to each individual

Visualize groups of parameters

Visualize outputs with plot

p_germplasm_group = plot(out_parameter_groups)

p_germplasm_group is list of two elements :

pca : a list with three elements on the PCA on the group of parameters :
- composante_variance : variance caught by each dimension of the PCA r p_germplasm_group$pca$composante_variance
- ind : graph of individuals r p_germplasm_group$pca$ind
- var : graph of variables r p_germplasm_group$pca$var
clust : output from factextra::fviz_cluster(), a list of number of cluster + 1 element

cl = p_germplasm_group$clust
names(cl)

cl$cluster_all

cl$cluster_1

post hoc analysis to visualize variation repartition for several variables

list_out_check_model = list(out_check_ammi, out_check_ammi_2, out_check_ammi_3)
names(list_out_check_model) = c("ammi_1", "ammi_2", "ammi_3")
post_hoc_variation(list_out_check_model)

Apply the workflow to several variables {#workflow_gxe}

If you wish to apply the AMMI workflow to several variables, you can use lapply() with the following code :

workflow_gxe = function(x, gxe){
  out_gxe = model_GxE(data_model_GxE, variable = x, gxe_analysis = gxe)

  out_check_gxe = check_model(out_gxe)
  p_out_check_gxe = plot(out_check_gxe)

  out_mean_comparisons_gxe = mean_comparisons(out_check_gxe, p.adj = "bonferroni")
  p_out_mean_comparisons_gxe = plot(out_mean_comparisons_gxe)

  out_biplot_gxe = biplot_data(out_check_gxe)
  p_out_biplot_gxe = plot(out_biplot_gxe)

  out = list(
    "out_gxe" = out_gxe,
    "out_check_gxe" = out_check_gxe,
    "p_out_check_gxe" = p_out_check_gxe,
    "out_mean_comparisons_gxe" = out_mean_comparisons_gxe,
    "p_out_mean_comparisons_gxe" = p_out_mean_comparisons_gxe,
    "out_biplot_gxe" = out_biplot_gxe,
    "p_out_biplot_gxe" = p_out_biplot_gxe
  )

  return(out)
}

vec_variables = c("y1", "y2", "y3")

out = lapply(vec_variables, workflow_gxe, "AMMI")
names(out) = vec_variables

out_parameter_groups = parameter_groups(
  list("y1" = out$y1$out_check_gxe, "y2" = out$y2$out_check_gxe, "y3" = out$y3$out_check_gxe), 
  "germplasm" )

p_germplasm_group = plot(out_parameter_groups)

priviere/PPBstats documentation built on May 6, 2021, 1:20 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

priviere/PPBstats
An R package to perform analysis found within PPB programmes

In priviere/PPBstats: An R package to perform analysis found within PPB programmes

AMMI (M6a) {#ammi}

Theory of the model

Steps with `PPBstats`

Format the data

Run the model

Check and visualize model outputs

Check the model

Visualize outputs

Get and visualize mean comparisons

Get mean comparisons

Visualize mean comparisons

Get and visualize biplot

Get biplot

Visualize biplot

Get and vizualise groups of parameters

Get groups of parameters

Visualize groups of parameters

post hoc analysis to visualize variation repartition for several variables

Apply the workflow to several variables {#workflow_gxe}

R Package Documentation

Browse R Packages

We want your feedback!

priviere/PPBstats An R package to perform analysis found within PPB programmes

In priviere/PPBstats: An R package to perform analysis found within PPB programmes

AMMI (M6a) {#ammi}

Theory of the model

Steps with PPBstats

Format the data

Run the model

Check and visualize model outputs

Check the model

Visualize outputs

Get and visualize mean comparisons

Get mean comparisons

Visualize mean comparisons

Get and visualize biplot

Get biplot

Visualize biplot

Get and vizualise groups of parameters

Get groups of parameters

Visualize groups of parameters

post hoc analysis to visualize variation repartition for several variables

Apply the workflow to several variables {#workflow_gxe}

R Package Documentation

Browse R Packages

We want your feedback!

priviere/PPBstats
An R package to perform analysis found within PPB programmes

Steps with `PPBstats`