em: Expectation maximization algorithm to impute missing GxE...


View source: R/em.R

Description

Imputes missing cells in the genotype-by-environment (GxE) table using an expectation maximization (EM) algorithm.

Usage

em(Y, model, tol = 1e-04, maxiter = 100, k = NULL, fast = TRUE,
  Ytrue = NULL, plotMSE = FALSE, verbose = FALSE, ...)

Arguments

Y

numeric matrix of cell means, with genotypes in rows and environments in columns. Missing cells are coded as NA.

model

character vector of length 1 naming the bilinear model to fit. Options are "AMMI", "GGE", "SREG", "EGE", and "GREG". "GGE" and "SREG" are equivalent, as are "EGE" and "GREG".

tol

scalar convergence tolerance. Convergence is measured as the sum of the absolute differences between the cell means at iterations i and i-1, divided by the standard deviation of the values in Y.

maxiter

integer. Maximum number of iterations.

k

number of principal components (PCs) to use for imputation. If NULL (the default), k is determined from the imputed data using the parametric bootstrap test.

fast

logical or integer. If FALSE or 0, k is determined at every iteration (slow). Otherwise, k is estimated at each iteration up to and including iteration max(2, fast), and the last estimate of k is used for all remaining iterations.

Ytrue

matrix of the same dimensions as Y containing the known true values, with no missing cells. This allows the user to evaluate the accuracy of imputation.

plotMSE

logical. Should the mean square error (MSE) be plotted at each iteration? Only used when Ytrue is provided.

verbose

logical. Should details be printed?

...

Additional arguments.

Details

Missing cells in the table of genotypes by environments are imputed using an expectation maximization algorithm. The algorithm stops and returns the imputed matrix once the convergence criterion falls below 'tol' or 'maxiter' iterations have been run. This function is mainly called by bilinear when missing cells are found, but it can also be called directly to assess imputation accuracy by supplying the true values to 'Ytrue'.
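To make the iteration concrete, here is a minimal base-R sketch of EM-style imputation under an AMMI model. It is not the code in R/em.R: the function name emAmmiSketch is hypothetical, the starting values (environment means of the observed cells) are an assumption, and k is fixed rather than chosen by the parametric bootstrap test. The stopping rule follows the 'tol' definition: the sum of absolute changes between iterations, divided by the standard deviation of Y.

```r
# Hypothetical sketch of EM imputation under AMMI with a fixed k (k >= 1).
# Not the Bilinear em() implementation; names and start values are assumptions.
emAmmiSketch <- function(Y, k = 1, tol = 1e-4, maxiter = 100) {
  miss <- is.na(Y)
  # Assumed start values: fill each missing cell with its environment mean
  Yhat <- Y
  envMeans <- colMeans(Y, na.rm = TRUE)
  Yhat[miss] <- envMeans[col(Y)[miss]]
  s <- sd(Y, na.rm = TRUE)  # scale used in the convergence criterion
  for (i in seq_len(maxiter)) {
    # Fit step: additive main effects plus k multiplicative (SVD) terms
    mu <- mean(Yhat)
    g <- rowMeans(Yhat) - mu        # genotype main effects
    e <- colMeans(Yhat) - mu        # environment main effects
    GE <- Yhat - outer(g, e, "+") - mu  # double-centred interaction
    sv <- svd(GE, nu = k, nv = k)
    fit <- mu + outer(g, e, "+") +
      sv$u %*% diag(sv$d[seq_len(k)], k, k) %*% t(sv$v)
    # Imputation step: overwrite only the missing cells with fitted values
    Ynew <- Yhat
    Ynew[miss] <- fit[miss]
    crit <- sum(abs(Ynew - Yhat)) / s
    Yhat <- Ynew
    if (crit < tol) break
  }
  Yhat
}
```

Observed cells are never changed; only the missing cells move toward the AMMI fit, which is why the change between iterations (and hence the criterion) shrinks as the imputed values settle.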

If 'k' is set to an integer, then this number of PCs will be used for imputation. Otherwise, 'k' will be determined from the model fit using the 'test' argument provided to bilinear.

If 'fast' is TRUE, the test is only run during the first 2 iterations. If an integer is provided to 'fast', 'k' is determined during the first max(2, fast) iterations. In both cases, the last estimate of 'k' is reused for the remaining iterations.
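The 'fast' rule can be written out as a schedule of the iterations in which 'k' is re-estimated. The helper below is hypothetical (it is not part of Bilinear) and simply encodes the max(2, fast) rule from the 'fast' argument description; because TRUE is treated as 1, fast = TRUE gives 2.

```r
# Hypothetical helper: TRUE for each iteration in which k would be
# re-estimated under the 'fast' rule, FALSE where the last k is reused.
retestSchedule <- function(fast, maxiter) {
  iter <- seq_len(maxiter)
  if (as.numeric(fast) == 0) {
    return(rep(TRUE, maxiter))  # FALSE or 0: test at every iteration
  }
  iter <= max(2, fast)          # otherwise: test up to iteration max(2, fast)
}

retestSchedule(TRUE, 5)
```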

If a complete matrix of true values is provided to 'Ytrue', the algorithm calculates the mean square error (MSE). If 'plotMSE' is also TRUE, the MSE at each iteration is plotted as the algorithm proceeds.
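As an illustration of what the MSE measures, here is a minimal sketch that compares imputed and true values. It assumes the MSE is averaged over the cells that were missing in Y only; em() may define it differently (for example, over all cells), and imputationMSE is a hypothetical helper, not a Bilinear function.

```r
# Hypothetical helper: MSE of imputed values against the true values,
# averaged over the cells that were NA in the original matrix Y (assumption).
imputationMSE <- function(Yimp, Ytrue, Y) {
  miss <- is.na(Y)
  mean((Yimp[miss] - Ytrue[miss])^2)
}
```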

If 'verbose' is TRUE, details are printed to stdout.

Value

Matrix of the same dimensions as Y, with missing cells replaced by imputed values.

Examples

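data(soyMeanMat)
set.seed(1) # make the random choice of missing cells reproducible
nMiss <- 10
Ytrue <- soyMeanMat
Y <- soyMeanMat
Y[sample(1:prod(dim(Y)), nMiss)] <- NA # set nMiss randomly chosen cells to NA

# fixed number of PCs: k = 1, then k = 2
em(Y, model = "AMMI", tol = 1e-5, k = 1, maxiter = 20, Ytrue = Ytrue, plotMSE = TRUE)
em(Y, model = "AMMI", tol = 1e-5, k = 2, maxiter = 20, Ytrue = Ytrue, plotMSE = TRUE)
# k re-estimated at every iteration (slow)
em(Y, model = "AMMI", tol = 1e-5, fast = FALSE, maxiter = 20, Ytrue = Ytrue, plotMSE = TRUE)
# k estimated during the first 2 iterations, then reused
em(Y, model = "AMMI", tol = 1e-5, fast = 2, maxiter = 20, Ytrue = Ytrue, plotMSE = TRUE)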

nsantantonio/Bilinear documentation built on Aug. 18, 2020, 2:31 p.m.