lrEMplus: Log-ratio EM algorithm (plus)

lrEMplus    R Documentation

Description

This function implements an extended version of the log-ratio EM algorithm (lrEM function) to deal simultaneously with both zeros (i.e. values below the detection limit, or rounded zeros) and missing data in compositional data sets.

Note: zeros and missing data must be labelled as 0 and NA, respectively, to use this function.


Usage

lrEMplus(X, dl = NULL, rob = FALSE,
            ini.cov = c("complete.obs", "multRepl"), frac = 0.65,
            tolerance = 0.0001, max.iter = 50, rlm.maxit = 150,
            suppress.print = FALSE, closure = NULL,
            z.warning = 0.8, z.delete = TRUE, delta = NULL)


Arguments

X: Compositional data set (matrix or data.frame class).

dl: Numeric vector or matrix of detection limits/thresholds. These must be given on the same scale as X. If NULL, the column minima are used as thresholds.

rob: Logical value. FALSE provides maximum-likelihood estimates of model parameters (default); TRUE provides robust parameter estimates.

ini.cov: Initial estimation of either the log-ratio covariance matrix (ML estimation) or unobserved data (robust estimation). It can be based on either complete observations ("complete.obs", default) or multiplicative simple replacement ("multRepl").

frac: If ini.cov="multRepl", parameter for the initial multiplicative simple replacement of left-censored data (see multRepl) (default = 0.65).

tolerance: Convergence criterion (default = 0.0001).

max.iter: Maximum number of iterations (default = 50).

rlm.maxit: If rob=TRUE, maximum number of iterations for the embedded robust regression estimation (default = 150; see rlm for details).

suppress.print: Logical value. If TRUE, printed feedback is suppressed (default suppress.print = FALSE).

closure: Closure value used to add a residual part, if needed, when ini.cov="multRepl" is used (see ?multRepl).

z.warning: Threshold used to identify individual rows or columns that include an excess of zeros/unobserved values (specified as a proportion; default z.warning = 0.8).

z.delete: Logical value. If TRUE, rows/columns identified by z.warning are omitted from the imputed data set. Otherwise, the function stops with an error when rows/columns are identified by z.warning (default z.delete = TRUE).

delta: This argument has been deprecated and replaced by frac (see the package's NEWS for details).
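The screening controlled by z.warning and z.delete can be illustrated with a small base R sketch. The helper flag_excess_unobserved() is hypothetical and not part of the package; it assumes that zeros and NAs are counted together and that a row or column is flagged when its proportion of unobserved cells strictly exceeds z.warning, which may differ from the package's exact rule.

```r
# Hypothetical helper (not part of the package): flags rows/columns whose
# proportion of zeros or NAs exceeds z.warning (assumes a strict ">" rule).
flag_excess_unobserved <- function(X, z.warning = 0.8) {
  X <- as.matrix(X)
  # A cell is unobserved if it is NA or zero (NA | TRUE evaluates to TRUE in R)
  unobserved <- is.na(X) | X == 0
  list(rows = which(rowMeans(unobserved) > z.warning),
       cols = which(colMeans(unobserved) > z.warning))
}

# Toy 4 x 3 data: column 3 is unobserved in every row
X <- matrix(c(10, 20,  0,
              15, NA,  0,
              12, 18, NA,
              11, 22,  0), nrow = 4, byrow = TRUE)
flags <- flag_excess_unobserved(X, z.warning = 0.8)

# Mimic z.delete = TRUE by dropping flagged rows/columns. Logical indexing
# avoids the X[-integer(0), ] pitfall, which would drop every row.
X_kept <- X[!seq_len(nrow(X)) %in% flags$rows,
            !seq_len(ncol(X)) %in% flags$cols, drop = FALSE]
dim(X_kept)  # 4 rows, 2 columns
```

Here only column 3 is flagged (4 of 4 cells unobserved); row 2 has 2 of 3 cells unobserved, which stays below the 0.8 threshold.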


Details

The procedure starts with an initial imputation of either zeros (using simple replacement with frac*dl) or missing values (using geometric mean imputation from observed data), depending on which of the two problems is less frequent in the data set. Subsequently, iterative calls to lrEM replace zeros and missing data alternately until convergence to a stable solution is reached or max.iter iterations have been performed.

See ?lrEM for more details.
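The initial imputation step can be sketched in base R as follows. The function initial_impute() is hypothetical and covers only the starting point, not the subsequent lrEM iterations. It assumes that a NULL dl is replaced by the column minima of the positive observed values, that a tie between the number of zeros and missing values is resolved by imputing zeros first, and that missing values are filled with the column geometric mean of the observed positive values.

```r
# Hypothetical sketch of the initial imputation only (no lrEM iterations).
initial_impute <- function(X, dl = NULL, frac = 0.65) {
  X <- as.matrix(X)
  is_zero <- !is.na(X) & X == 0
  is_miss <- is.na(X)
  if (is.null(dl)) {
    # Assumption: default thresholds are column minima of positive observed values
    dl <- apply(X, 2, function(x) min(x[!is.na(x) & x > 0]))
  }
  # A vector of limits is expanded so every row shares its column's limit
  if (is.null(dim(dl))) dl <- matrix(dl, nrow(X), ncol(X), byrow = TRUE)
  if (sum(is_zero) <= sum(is_miss)) {
    # Zeros are the less frequent problem: simple replacement with frac * dl
    X[is_zero] <- frac * dl[is_zero]
  } else {
    # Missing values are less frequent: column geometric means of observed data
    gm <- apply(X, 2, function(x) exp(mean(log(x[!is.na(x) & x > 0]))))
    X[is_miss] <- gm[col(X)[is_miss]]
  }
  X
}

# One zero, two NAs: the zero is replaced and the NAs are left for later steps
X <- matrix(c(20, 30, 50,
               0, 40, 60,
              25, NA, 45,
              NA, 35, 35), nrow = 4, byrow = TRUE)
X1 <- initial_impute(X, dl = rep(1, 3))  # zero -> 0.65 * 1
X2 <- initial_impute(X)                  # dl from column minima: zero -> 0.65 * 20

# Two zeros, one NA: the NA is replaced by the geometric mean of column 2
Y <- matrix(c( 0, 30, 50,
               0, 40, 60,
              25, NA, 45,
              30, 35, 35), nrow = 4, byrow = TRUE)
Y1 <- initial_impute(Y, dl = rep(1, 3))
```

Whichever problem remains after this step is what the alternating lrEM calls would then address.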


Value

A data.frame object containing the imputed compositional data set on the same scale as the original. The number of iterations required for convergence is also printed (this can be suppressed by setting suppress.print=TRUE).


Examples

library(zCompositions)

# Data set closed to 100 (percentages, common dl = 1%)
# (Note that zeros and missing values in the same row or column are allowed)
X <- matrix(c(26.91,8.08,12.59,31.58,6.45,14.39,
X_lrEMplus <- lrEMplus(X,dl=rep(1,6),ini.cov="multRepl")
X_roblrEMplus <- lrEMplus(X,dl=rep(1,6),ini.cov="multRepl",rob=TRUE,max.iter=4)

# Multiple limits of detection by component
mdl <- matrix(0,ncol=6,nrow=10)
mdl[2,] <- rep(1,6)
mdl[4,] <- rep(0.75,6)
mdl[6,] <- rep(0.5,6)
mdl[8,] <- rep(0.5,6)
mdl[10,] <- c(0,0,1,0,0.8,0.7)

X_lrEMplus2 <- lrEMplus(X,dl=mdl,ini.cov="multRepl")

# Non-closed compositional data set
data(LPdataZM) # (in ppm; 0 is nondetect and NA is missing data)

dl <- c(2,1,0,0,2,0,6,1,0.6,1,1,0,0,632,10) # limits of detection (0 for no limit)
LPdataZM2 <- subset(LPdataZM,select=-c(Cu,Ni,La))  # select a subset for illustration purposes
dl2 <- dl[-c(5,7,10)]

LPdataZM2_lrEMplus <- lrEMplus(LPdataZM2,dl=dl2)

