| imputation | R Documentation |
The AMMI, GGE and SREG methods do not allow missing values. This function provides several methods to impute missing observations in data from multi-environment trials so that those models can subsequently be fitted.
imputation(
Data,
genotype = "gen",
environment = "env",
response = "yield",
rep = NULL,
type = "EM-AMMI",
nPC = 2,
initial.values = NA,
precision = 0.01,
maxiter = 1000,
change.factor = 1,
simplified.model = FALSE,
scale = TRUE,
method = "EM",
row.w = NULL,
coeff.ridge = 1,
seed = NULL,
nb.init = 1,
Winf = 0.8,
Wsup = 1
)
Data |
dataframe containing genotypes, environments, repetitions (if any) and the phenotypic trait of interest. Other variables that will not be used in the analysis can be present. |
genotype |
column name containing genotypes. |
environment |
column name containing environments. |
response |
column name containing the phenotypic trait. |
rep |
column name containing replications. If this argument is NULL, there are no replications available in the data. Defaults to NULL. |
type |
imputation method. Either "EM-AMMI", "EM-GGE", "EM-SREG", "EM-bSREG", "Gabriel", "Eigenvector", "WGabriel", "EM-PCA". Defaults to "EM-AMMI". |
nPC |
number of components used to predict the missing values. Defaults to 2. |
initial.values |
initial values of the missing cells. It can be a single value or a vector of length equal to the number of missing cells. |
precision |
threshold for assessing convergence. |
maxiter |
maximum number of iterations for the algorithm. |
change.factor |
When 'change.factor' is equal to 1, the previous approximation is replaced by the new values (standard EM). Smaller values can help convergence if the changes are cyclic. |
simplified.model |
logical. If TRUE, calculates effects only in the first iteration to speed up convergence or help in cases where the regular procedure fails. |
scale |
boolean. By default TRUE for "EM-PCA". |
method |
"Regularized" or "EM" for "EM-PCA". |
row.w |
row weights for "EM-PCA". |
coeff.ridge |
ridge coefficient for "EM-PCA". |
seed |
integer for random initialization in "EM-PCA". |
nb.init |
number of random initializations for "EM-PCA". |
Winf |
lower weight for WGabriel. |
Wsup |
upper weight for WGabriel. |
Often, multi-environment experiments are unbalanced because several genotypes are not tested in some environments. Several methodologies have been proposed in order to solve this lack of balance caused by missing values, some of which are included in this function:
EM-AMMI: an iterative scheme that uses the EM algorithm to obtain AMMI imputations. The additive parameters are initially set by computing the grand mean, genotype means and environment means from the observed data. The residuals for the observed cells are initialized as the cell mean minus the genotype mean minus the environment mean plus the grand mean, and the interactions for the missing positions are initially set to zero. The initial multiplicative parameters are obtained from the SVD of this matrix of residuals, and the missing values are filled with the corresponding AMMI estimates. In subsequent iterations, the usual AMMI procedure is applied to the completed matrix and the missing values are updated with the new AMMI estimates. The arguments used for this method are: initial.values, precision, maxiter, change.factor and simplified.model.
EM-GGE: Iterative SVD-based imputation focusing on G+GE.
EM-SREG: Iterative algorithm using the Sites Regression model. Supports variants like standard SVD and Bayesian PCA (EM-bSREG).
Gabriel: combines regression and lower-rank approximation using SVD. This method initially replaces the missing cells with arbitrary values, and the imputations are then refined through an iterative scheme that defines a partition of the matrix for each missing value in turn and uses a linear regression of columns (or rows) to obtain the new imputation. The only argument used for this method is the data frame.
WGabriel: a modification of the Gabriel method that uses weights chosen by cross-validation. The arguments used for this method are Winf and Wsup.
EM-PCA: imputes the missing entries of the data using the iterative PCA algorithm. The algorithm first fills the missing values with initial values. The second step is to perform PCA on the completed dataset to estimate the parameters. It then imputes the missing values with the reconstruction formula of order nPC (the fitted matrix computed with nPC components for the scores and loadings). These two steps, estimating the parameters via PCA and imputing the missing values from the fitted matrix, are iterated until convergence. The arguments used for this method are: nPC, scale, method, row.w, coeff.ridge, precision, seed, nb.init and maxiter.
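The EM-AMMI scheme described above can be sketched in base R. This is a minimal illustration, not the code used by imputation(): the function name em_ammi_sketch is hypothetical, and the convergence rule (largest absolute change in the imputed cells below precision) and the way change.factor damps the update are assumptions. For brevity the sketch recomputes the additive effects from the completed matrix in every iteration, and it assumes that every genotype and every environment has at least one observed cell.

```r
# Illustrative EM-AMMI-style imputation (hypothetical helper, not package code).
# Y: genotype x environment matrix of cell means, with NA in the missing cells.
em_ammi_sketch <- function(Y, nPC = 2, precision = 0.01, maxiter = 1000,
                           change.factor = 1) {
  miss <- is.na(Y)
  if (!any(miss)) return(Y)

  # Starting values: additive fit from the observed data, interaction set to 0
  mu <- mean(Y, na.rm = TRUE)
  g  <- rowMeans(Y, na.rm = TRUE)
  e  <- colMeans(Y, na.rm = TRUE)
  X  <- Y
  X[miss] <- (outer(g, e, "+") - mu)[miss]

  for (iter in seq_len(maxiter)) {
    # Additive part of the AMMI model on the completed matrix
    additive <- outer(rowMeans(X), colMeans(X), "+") - mean(X)

    # Multiplicative part: rank-nPC SVD approximation of the residuals
    s <- svd(X - additive)
    k <- seq_len(min(nPC, length(s$d)))
    interaction <- s$u[, k, drop = FALSE] %*%
      diag(s$d[k], nrow = length(k)) %*%
      t(s$v[, k, drop = FALSE])
    fitted <- additive + interaction

    # Move the imputed cells toward the new AMMI estimates;
    # change.factor = 1 replaces them outright (assumed damping rule)
    new_vals <- X[miss] + change.factor * (fitted[miss] - X[miss])
    delta <- max(abs(new_vals - X[miss]))
    X[miss] <- new_vals
    if (delta < precision) break
  }
  X
}

# Toy example: a 6 x 5 table with two cells removed
Y <- outer(1:6, 1:5)
Y[1, 2] <- NA
Y[4, 5] <- NA
Y_imp <- em_ammi_sketch(Y, nPC = 1)
```

Only the cells that were NA are modified; the observed cells are returned unchanged. Values of change.factor below 1 take smaller steps, which is one way the cyclic changes mentioned for that argument could be damped.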
A matrix of the imputed data.
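To see which cells such a matrix would contain, the long-format input can be arranged as a genotype-by-environment table of cell means. The sketch below uses base R and hypothetical toy data; that rows correspond to genotypes, columns to environments, and that replications are averaged within each cell are assumptions about the layout, not something this page states.

```r
# Hypothetical toy trial in long format: 3 genotypes x 2 environments x 2 reps
trial <- data.frame(
  gen   = rep(c("G1", "G2", "G3"), each = 4),
  env   = rep(rep(c("E1", "E2"), each = 2), times = 3),
  rep   = rep(1:2, times = 6),
  yield = c(5.1, 4.9, 6.0, 6.2, 4.0, 4.2, NA, NA, 5.5, 5.7, 6.8, 7.0)
)

# Cell means: one row per genotype, one column per environment
cell_means <- tapply(trial$yield, list(trial$gen, trial$env),
                     mean, na.rm = TRUE)

# A cell whose replications are all missing yields NaN; mark it as NA
cell_means[is.nan(cell_means)] <- NA

# Cells that an imputation method would have to fill
which(is.na(cell_means), arr.ind = TRUE)
```

Here genotype G2 has no observed yield in environment E2, so that cell is the only one left to impute.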
Paderewski, J. (2013). An R function for imputation of missing cells in two-way data sets by EM-AMMI algorithm. Communications in Biometry and Crop Science 8, 60–69.
Yan, W. (2013). Biplot analysis of incomplete two-way data. Crop Science, 53(1), 48-57. doi:10.2135/cropsci2012.05.0301
Arciniegas-Alarcón, S., García-Peña, M., Krzanowski, W., & Dias, C. T. S. (2014b). An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects. Biometrical Letters, 51(2), 75-88. doi:10.2478/bile-2014-0006
Angelini, J., Cervigni, G. D. L., & Quaglino, M. B. (2024). New imputation methodologies for genotype-by-environment data: an extensive study of properties of estimators. Euphytica, 220(6), 92. doi:10.1007/s10681-024-03344-z
Julie Josse, Francois Husson (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software 70, 1-31.
Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010). An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47, 1–14.
library(geneticae)
# Data without replications
library(agridat)
data(yan.winterwheat)
# Generate missing values in the response (third column)
yan.winterwheat[1, 3] <- NA
yan.winterwheat[3, 3] <- NA
yan.winterwheat[2, 3] <- NA
imputation(yan.winterwheat, genotype = "gen", environment = "env",
           response = "yield", type = "EM-AMMI")
# Data with replications
data(plrv)
head(plrv)
# Set all three replications of one genotype in one locality to missing
plrv$Yield[plrv$Locality == "Ayac" & plrv$Rep %in% c(1, 2, 3) &
             plrv$Genotype == "102.18"] <- NA
imputation(plrv, nPC = 2, genotype = "Genotype", environment = "Locality",
           response = "Yield", rep = "Rep", type = "EM-AMMI")
imputation(plrv, genotype = "Genotype", environment = "Locality",
           response = "Yield", rep = "Rep", type = "EM-SREG")