R/impute.R
In geneticae: Statistical Tools for the Analysis of Multi Environment Agronomic Trials

Documented in imputation

#'Imputation of missing cells in two-way data sets
#'
#'Missing values are not allowed by the AMMI or GGE methods. This function
#'provides several methods to impute missing observations in data from
#'multi-environment trials and to subsequently adjust the mentioned methods.
#'
#'@param Data dataframe containing genotypes, environments, repetitions (if any)
#'  and the phenotypic trait of interest. Other variables that will not be used
#'  in the analysis can be present.
#'@param genotype column name containing genotypes.
#'@param environment column name containing environments.
#'@param response column name containing the phenotypic trait.
#'@param rep column name containing replications. If this argument is NULL,
#'  there are no replications available in the data. Defaults to NULL.
#'@param type imputation method. Either "EM-AMMI",
#'  "Gabriel","WGabriel","EM-PCA". Defaults to "EM-AMMI".
#'@param nPC number of components used to predict the missing values.
#'  Default to 2.
#'@param initial.values initial values of the missing cells. It can be a single
#'  value or a vector of length equal to the number of missing cells (starting
#'  from the missing values in the first column). If omitted, the initial values
#'  will be obtained by the main effects from the corresponding model, that is,
#'  by the grand mean of the observed data increased (or decreased) by row and
#'  column main effects.
#'@param precision threshold for assessing convergence.
#'@param maxiter maximum number of iteration for the algorithm.
#'@param change.factor  When `change.factor` is equal to 1, the previous
#'  approximation is changed with the new values of missing cells (standard
#'  EM-AMMI algorithm). However, when `change.factor` less than 1, then the new
#'  approximations are computed and the values of missing cells are changed in
#'  the direction of this new approximation but the change is smaller. It could
#'  be useful if the changes are cyclic and thus convergence could not be
#'  reached. Usually, this argument should not affect the final outcome (that
#'  is, the imputed values) as compared to the default value of `change.factor`
#'  = 1.
#'@param simplified.model the AMMI model contains the general mean, effects of
#'  rows, columns and interaction terms. So the EM-AMMI algorithm in step 2
#'  calculates the current effects of rows and columns; these effects change
#'  from iteration to iteration because the empty (at the outset) cells in each
#'  iteration are filled with different values. In step 3 EM-AMMI uses those
#'  effects to re-estimate cells marked as missed (as default,
#'  simplified.model=FALSE). It is, however, possible that this procedure will
#'  not converge. Thus the user is offered a simplified EM-AMMI procedure that
#'  calculates the general mean and effects of rows and columns only in the
#'  first iteration and in next iterations uses these values
#'  (simplified.model=TRUE). In this simplified procedure the initial values
#'  affect the outcome (whilst EM-AMMI results usually do not depend on initial
#'  values). For the simplified procedure the number of iterations to
#'  convergence is usually smaller and, furthermore, convergence will be reached
#'  even in some cases where the regular procedure fails. If the regular
#'  procedure does not converge for the standard initial values, the simplified
#'  model can be used to determine a better set of initial values.
#'@param scale boolean. By default TRUE leading to a same weight for each
#'  variable
#'@param method "Regularized" by default or "EM"
#'@param row.w row weights (by default, a vector of 1 for uniform row weights)
#'@param coeff.ridge 1 by default to perform the regularized imputePCA
#'  algorithm; useful only if method="Regularized". Other regularization terms
#'  can be implemented by setting the value to less than 1 in order to
#'  regularized less (to get closer to the results of the EM method
#'@param seed integer, by default seed = NULL implies that missing values are
#'  initially imputed by the mean of each variable. Other values leads to a
#'  random initialization
#'@param nb.init integer corresponding to the number of random initializations;
#'  the first initialization is the initialization with the mean imputation
#'@param Winf peso inferior
#'@param Wsup peso superior
#'
#'@return imputed data matrix
#'
#'@details
#'
#'Often, multi-environment experiments are unbalanced because several genotypes
#'are not tested in some environments. Several methodologies have been proposed
#'in order to solve this lack of balance caused by missing values, some of which
#'are included in this function:
#'
#'\itemize{
#'\item EM-AMMI: an iterative scheme built round the above procedure is used to
#'obtain AMMI imputations from the EM algorithm. The additive parameters are
#'initially set by computing the grand mean, genotype means and environment
#'means obtained from the observed data. The residuals for the observed cells
#'are initialized as the cell mean minus the genotype mean minus the environment
#'mean plus the grand mean, and interactions for the missing positions are
#'initially set to zero. The initial multiplicative parameters are obtained from
#'the SVD of this matrix of residuals, and the missing values are filled by the
#'appropriate AMMI estimates. In subsequent iterations, the usual AMMI procedure
#'is applied to the completed matrix and the missing values are updated by the
#'corresponding AMMI estimates. The arguments used for this method
#'are:initial.values, precision, maxiter, change.factor and simplified.model
#'
#'\item Gabriel: combines regression and lower-rank approximation using SVD.
#'This method initially replaces the missing cells by arbitrary values, and
#'subsequently the imputations are refined through an iterative scheme that
#'defines a partition of the matrix for each missing value in turn and uses a
#'linear regression of columns (or rows) to obtain the new imputation. The
#'arguments used for this method is only the dataframe.
#'
#'\item WGabriel: is a a modification of Gabriel method that uses weights chosen
#'by cross-validation. The arguments used for this method are Winf and Wsup.
#'
#'\item EM-PCA: impute the missing entries of a mixed data using the iterative
#'PCA algorithm. The algorithm first consists imputing missing values with
#'initial values. The second step of the iterative PCA algorithm is to perform
#'PCA on the completed dataset to estimate the parameters. Then, it imputes the
#'missing values with the reconstruction formulae of order nPC (the fitted
#'matrix computed with nPC components for the scores and loadings). These steps
#'of estimation of the parameters via PCA and imputation of the missing values
#'using the fitted matrix are iterate until convergence. The arguments used for
#'this methods are: nPC, scale, method, row.w, coeff.ridge, precision, seed,
#'nb.init and maxiter
#'
#'}
#'
#'
#'@references Paderewski, J. (2013). \emph{An R function for imputation of missing
#'  cells in two-way data sets by EM-AMMI algorithm}. Communications in Biometry
#'  and Crop Science 8, 60–69.
#'@references Julie Josse, Francois Husson (2016). missMDA: A Package for
#'  Handling Missing Values in Multivariate Data Analysis. Journal of
#'  Statistical Software 70, 1-31.
#'@references Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski
#'  W.J. (2010). \emph{An alternative methodology for imputing missing data in
#'  trials with genotype-by-environment interaction}. Biometrical Letters 47,
#'  1–14.
#'@references Arciniegas-Alarcón S., García-Peña M., Krzanowski W.J., Dias
#'  C.T.S. (2014). \emph{An alternative methodology for imputing missing data in
#'  trials with genotype-byenvironment interaction: some new aspects.}
#'  Biometrical Letters 51, 75-88.
#'
#'@export
#'
#' @examples
#' library(geneticae)
#' # Data without replications
#' library(agridat)
#' data(yan.winterwheat)
#'
#' # generating missing values
#' yan.winterwheat[1,3]<-NA
#' yan.winterwheat[3,3]<-NA
#' yan.winterwheat[2,3]<-NA
#'
#' imputation(yan.winterwheat, genotype = "gen", environment = "env",
#'            response = "yield", type = "EM-AMMI")
#'
#' # Data with replications
#' data(plrv)
#' plrv[1,3] <- NA
#' plrv[3,3] <- NA
#' plrv[2,3] <- NA
#' imputation(plrv, genotype = "Genotype", environment = "Locality",
#'            response = "Yield", rep = "Rep", type = "EM-AMMI")
#'
#'@importFrom stats var
#'@importFrom missMDA imputePCA
#'@importFrom dplyr group_by summarise rename %>%
#'@importFrom rlang sym
#'@importFrom tidyr pivot_wider
#'
imputation <- function(Data, genotype="gen",environment="env", response="yield", rep=NULL,type="EM-AMMI",
                       nPC=2, initial.values=NA, precision=0.01, maxiter=1000, change.factor=1, simplified.model=FALSE,
                        scale = TRUE, method = "EM",
                       row.w = NULL, coeff.ridge = 1, seed = NULL, nb.init = 1, Winf=0.8,Wsup=1) {

  if (missing(Data)) stop("Need to provide Data data frame")
  if (!any(is.na(Data))) stop("There are not missing data in input data frame")
    stopifnot(
    class(Data) %in% c("data.frame"),
    class(rep)%in% c("character", "NULL"),
    class(genotype) == "character",
    class(environment) == "character",
    class(response) == "character",
    type %in% c("EM-AMMI", "Gabriel","WGabriel","EM-PCA"),
    class(nPC) == "numeric",
    # class(initial.values)  %in% c(NA, "vector", "numeric"),
    class(precision) == "numeric",
    class(maxiter) == "numeric",
    class(change.factor) == "numeric",
    class(simplified.model) == "logical",
    # class(k) == "numeric",
    class(scale) == "logical",
    method %in% c("Regularized", "EM"),
    class(row.w)  %in% c("NULL", "vector", "numeric"),
    class(coeff.ridge) == "numeric",
    class(seed) %in% c("NULL", "numeric"),
    class(nb.init) == "numeric",
    class(Winf) == "numeric",
    class(Wsup) == "numeric"
  )

    if(!is.null(rep)){
      Data <-
        Data %>%
        group_by(!!sym(genotype), !!sym(environment)) %>%
        summarise(mean_resp=mean(!!sym(response)))%>%
        pivot_wider( names_from = environment, values_from = mean_resp) %>%
        as.data.frame()

    } else{
      Data <-
        Data %>%
        pivot_wider( names_from = environment, values_from = response) %>%
        as.data.frame()

    }

  rownames(Data) <- pull(Data, genotype)
  Data <- dplyr::select(Data, -!!sym(genotype))

  if(type=="EM-AMMI"){
     matrix<-EM.AMMI(Data, PC.nb=nPC, initial.values=initial.values, precision=precision,
                     max.iter=maxiter, change.factor=change.factor, simplified.model=simplified.model)$X
   }
   if(type=="Gabriel"){
     matrix<-Gabriel.Calinski(Data)$GabrielImput
   }
   if(type=="WGabriel"){
     matrix<-WGabriel(Data,Winf,Wsup)$GabrielWImput
   }
   if(type=="EM-PCA"){
     matrix<-imputePCA(Data, ncp = nPC, scale = scale, method = method,
                       row.w = row.w, coeff.ridge = coeff.ridge, threshold = precision, seed = seed, nb.init = nb.init,
                       maxiter = maxiter)$completeObs
   }

return(matrix)
}

Any scripts or data that you put into this service are public.

geneticae documentation built on July 20, 2022, 5:06 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

geneticae
Statistical Tools for the Analysis of Multi Environment Agronomic Trials

R/impute.R
In geneticae: Statistical Tools for the Analysis of Multi Environment Agronomic Trials

Defines functions imputation

Documented in imputation

Try the geneticae package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

geneticae Statistical Tools for the Analysis of Multi Environment Agronomic Trials

R/impute.R In geneticae: Statistical Tools for the Analysis of Multi Environment Agronomic Trials

Defines functions imputation

Documented in imputation

Try the geneticae package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

geneticae
Statistical Tools for the Analysis of Multi Environment Agronomic Trials

R/impute.R
In geneticae: Statistical Tools for the Analysis of Multi Environment Agronomic Trials