impute_missing_val: Missing value imputation In metan: Multi Environment Trials Analysis

Description

Imputes the missing entries of a matrix using one of several algorithms. See the Details section for a description of each algorithm.

Usage

impute_missing_val(
  .data,
  naxis = 1,
  algorithm = "EM-SVD",
  tol = 1e-10,
  max_iter = 1000,
  simplified = FALSE,
  verbose = TRUE
)

Arguments

• .data A matrix whose missing entries are to be imputed. Frequently a two-way table of genotype means in each environment.

• naxis The rank of the Singular Value Approximation. Defaults to 1.

• algorithm The algorithm used to impute missing values. Defaults to "EM-SVD". Other possible values are "EM-AMMI" and "colmeans". See the Details section.

• tol The convergence tolerance for the algorithm.

• max_iter The maximum number of steps to take. If max_iter is reached without convergence, the algorithm stops with a warning.

• simplified Only valid when algorithm = "EM-AMMI". If FALSE (default), the current effects of rows and columns change from iteration to iteration. If TRUE, the general mean and the effects of rows and columns are computed in the first iteration only, and these values are reused in the following iterations.

• verbose Logical argument. If verbose = FALSE the code will run silently.

Details

EM-AMMI algorithm

The EM-AMMI algorithm completes a data set with missing values according to both main and interaction effects. The algorithm works as follows (Gauch and Zobel, 1990):

1. The initial values are calculated as the grand mean plus the main effects of rows and the main effects of columns. That way, the missing cells of the matrix of observations are pre-filled.

2. The parameters of the AMMI model are estimated.

3. The adjusted means are calculated based on the AMMI model with naxis principal components.

4. The missing cells are filled with the adjusted means.

5. The root mean square error of the predicted values (RMSE_p) is calculated from the last two iteration steps. If RMSE_p > tol, steps 2 through 5 are repeated; convergence is declared when RMSE_p < tol. If max_iter is reached without convergence, the algorithm stops with a warning.
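The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by metan: em_ammi_sketch() and additive_fit() are hypothetical helpers, the main effects are taken as simple row and column means, every row and column is assumed to hold at least one observed value, and the simplified option is not covered.

```r
# Hypothetical helper (not part of metan): a base-R sketch of the EM-AMMI steps.
em_ammi_sketch <- function(x, naxis = 1, tol = 1e-10, max_iter = 1000) {
  miss <- is.na(x)
  if (!any(miss)) {
    return(list(.data = x, iter = 0L, convergence = TRUE))
  }
  # Additive part of the model: grand mean + row effect + column effect.
  additive_fit <- function(m) {
    mu <- mean(m, na.rm = TRUE)
    outer(rowMeans(m, na.rm = TRUE) - mu,
          colMeans(m, na.rm = TRUE) - mu, "+") + mu
  }
  # Step 1: pre-fill the missing cells with the additive fit.
  x[miss] <- additive_fit(x)[miss]
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    # Step 2: estimate main effects and the SVD of the interaction residuals.
    main <- additive_fit(x)
    s <- svd(x - main, nu = naxis, nv = naxis)
    # Step 3: adjusted means from the model with naxis principal components.
    fitted <- main +
      s$u %*% diag(s$d[seq_len(naxis)], nrow = naxis) %*% t(s$v)
    # Step 5 (computed before overwriting): change between the last two steps.
    rmse <- sqrt(mean((fitted[miss] - x[miss])^2))
    # Step 4: fill the missing cells with the adjusted means.
    x[miss] <- fitted[miss]
    if (rmse < tol) {
      converged <- TRUE
      break
    }
  }
  if (!converged) warning("max_iter reached without convergence")
  list(.data = x, iter = iter, convergence = converged)
}

# Toy data: additive effects plus a rank-1 interaction (illustrative values).
u <- seq(-4.5, 4.5, by = 1)   # row interaction scores, sum to zero
v <- seq(-3.5, 3.5, by = 1)   # column interaction scores, sum to zero
full <- outer(1:10, 10 * (1:8), "+") + outer(u, v)
holed <- full
holed[2, 3] <- NA
holed[7, 6] <- NA
res_ammi <- em_ammi_sketch(holed, naxis = 1)
res_ammi$.data[2, 3]
full[2, 3]
```

Because the toy matrix follows an additive model with a single interaction axis, the imputed cells can be compared directly with the values that were removed.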

EM-SVD algorithm

The EM-SVD algorithm imputes the missing entries using a low-rank Singular Value Decomposition approximation estimated by the Expectation-Maximization algorithm. The algorithm works as follows (Troyanskaya et al., 2001):

1. Initialize all NA values to the column means.

2. Compute the first naxis terms of the SVD of the completed matrix.

3. Replace the previously missing values with their approximations from the SVD.

4. The root mean square error of the predicted values (RMSE_p) is calculated from the last two iteration steps. If RMSE_p > tol, steps 2 through 3 are repeated; convergence is declared when RMSE_p < tol. If max_iter is reached without convergence, the algorithm stops with a warning.
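A minimal base-R sketch of these steps is shown below. It is an illustration only and not the implementation used by metan: em_svd_sketch() is a hypothetical helper, and the names of the returned list elements are chosen for this example.

```r
# Hypothetical helper (not part of metan): a base-R sketch of the EM-SVD steps.
em_svd_sketch <- function(x, naxis = 1, tol = 1e-10, max_iter = 1000) {
  miss <- is.na(x)
  if (!any(miss)) {
    return(list(.data = x, iter = 0L, convergence = TRUE))
  }
  # Step 1: initialise every NA with the mean of its column.
  x[miss] <- colMeans(x, na.rm = TRUE)[col(x)[miss]]
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    # Step 2: rank-naxis SVD approximation of the completed matrix.
    s <- svd(x, nu = naxis, nv = naxis)
    low_rank <- s$u %*% diag(s$d[seq_len(naxis)], nrow = naxis) %*% t(s$v)
    # Step 4 (computed before overwriting): change between the last two steps.
    rmse <- sqrt(mean((low_rank[miss] - x[miss])^2))
    # Step 3: replace the previously missing values with the approximation.
    x[miss] <- low_rank[miss]
    if (rmse < tol) {
      converged <- TRUE
      break
    }
  }
  if (!converged) warning("max_iter reached without convergence")
  list(.data = x, iter = iter, convergence = converged)
}

# A rank-1 matrix, as in the Examples section, with three cells removed.
mat <- (1:20) %*% t(1:10)
holed <- mat
holed[3, 4] <- NA
holed[10, 2] <- NA
holed[15, 9] <- NA
res_svd <- em_svd_sketch(holed, naxis = 1)
res_svd$.data[3, 4]
mat[3, 4]
```

Since the example matrix has rank 1, a rank-1 approximation can reproduce it, so the imputed cells can be checked against the values that were removed.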

colmeans algorithm

The colmeans algorithm simply imputes each missing entry with the mean of its column. Thus, there is no iterative process.
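For comparison, column-mean imputation can be written in a few lines of base R. The helper colmeans_sketch() is hypothetical and only illustrates the idea; it is not the metan code.

```r
# Hypothetical helper (not part of metan): fill each NA with its column mean.
colmeans_sketch <- function(x) {
  miss <- is.na(x)
  # col(x)[miss] gives the column index of every missing cell.
  x[miss] <- colMeans(x, na.rm = TRUE)[col(x)[miss]]
  x
}

m <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3)
filled <- colmeans_sketch(m)
filled
```

Here the missing cell in the first column is replaced by the mean of 1 and 3, and the missing cell in the second column by the mean of 4 and 5.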

Value

An object of class imv with the following values:

• .data The imputed matrix

• pc_ss The sum of squares representing variation explained by the principal components

• iter The final number of iterations.

• Final_RMSE The maximum change of the estimated values for missing cells in the last step of iteration.

• final_axis The final number of principal component axes.

• convergence Logical value indicating whether the model converged.

References

Gauch, H. G., & Zobel, R. W. (1990). Imputing missing yield trial data. Theoretical and Applied Genetics, 79(6), 753-761. doi: 10.1007/BF00224240

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., ... Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.

Examples

library(metan)
mat <- (1:20) %*% t(1:10)
mat
# 10% of missing values at random
miss_mat <- random_na(mat, prop = 10)
miss_mat
mod <- impute_missing_val(miss_mat)
mod$.data

metan documentation built on Nov. 10, 2021, 9:11 a.m.