lrSVDplus | R Documentation
Description

This function implements an extended version of the log-ratio SVD algorithm (lrSVD function) that deals simultaneously with zeros (i.e. values below the detection limit, or rounded zeros) and missing data in compositional data sets.

Note: zeros and missing data must be coded as 0 and NA, respectively, to use this function.
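If a raw data set flags nondetects and missing values with sentinel codes instead, they have to be recoded before calling lrSVDplus. Below is a minimal base-R sketch. It assumes a hypothetical coding scheme in which -1 marks a value below the detection limit and -99 a missing value; neither code comes from zCompositions, and recode_zeros_na is a hypothetical helper name.

```r
# Hypothetical helper (not part of zCompositions): recode sentinel values so
# that nondetects become 0 and missing values become NA, as lrSVDplus expects.
# The codes -1 (nondetect) and -99 (missing) are assumptions for illustration.
recode_zeros_na <- function(X, nd_code = -1, na_code = -99) {
  X <- as.matrix(X)
  observed <- !is.na(X)
  X[observed & X == nd_code] <- 0   # below detection limit -> 0
  X[observed & X == na_code] <- NA  # missing value -> NA
  X
}

raw <- matrix(c(5, -1, -99, 2), nrow = 2)
recode_zeros_na(raw)
```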
Usage

lrSVDplus(X, dl = NULL, frac = 0.65,
ncp = 2, beta = 0.5, method = c("ridge", "EM"), row.w = NULL,
coeff.ridge = 1, threshold = 1e-04, seed = NULL, nb.init = 1,
max.iter = 1000, z.warning = 0.8, z.delete = TRUE,
...)
Arguments

| Argument | Description |
|---|---|
| X | Compositional data set (…). |
| dl | Numeric vector or matrix of detection limits/thresholds. These must be given on the same scale as … |
| frac | Parameter for the initial multiplicative simple replacement of left-censored data (see …; default = 0.65). |
| ncp | Number of components in the low-rank matrix approximation (default = 2). |
| beta | Weighting parameter balancing the two conditions in the objective function (default = 0.5). |
| method | Parameter estimation method for the iterative algorithm ("ridge" or "EM"; …). |
| row.w | Row weights (default = NULL, i.e. a vector of 1s giving uniform row weights). |
| coeff.ridge | Used when … (default = 1). |
| threshold | Threshold for assessing convergence (default = 1e-04). |
| seed | Seed for the random initialisation of the algorithm (default = NULL). |
| nb.init | Number of random initialisations (default = 1). |
| max.iter | Maximum number of iterations of the algorithm (default = 1000). |
| z.warning | Threshold used to identify individual rows or columns containing an excess of zeros/unobserved values, specified as a proportion (default = 0.8). |
| z.delete | Logical value. If set to … (default = TRUE). |
| ... | Further arguments. |
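To see which rows or columns the z.warning threshold would concern, the proportion of zeros and missing values can be computed directly. The sketch below is an approximation under stated assumptions: it counts 0 and NA together as unobserved and uses a strict "greater than" comparison, which may differ from the exact rule applied inside lrSVDplus. flag_excess_unobserved is a hypothetical name.

```r
# Hypothetical helper (not part of zCompositions): flag rows/columns whose
# proportion of zeros plus missing values exceeds z.warning.
# Assumptions: 0 and NA both count as unobserved; the comparison is strict (>).
flag_excess_unobserved <- function(X, z.warning = 0.8) {
  X <- as.matrix(X)
  # For NA cells, TRUE | NA evaluates to TRUE, so the result has no NAs
  unobserved <- is.na(X) | X == 0
  list(rows = which(rowMeans(unobserved) > z.warning),
       cols = which(colMeans(unobserved) > z.warning))
}

Z <- matrix(c(0, NA, 0, 1, 2, 3, 4, 5, 6), nrow = 3)
flag_excess_unobserved(Z)
```

Here the first column is entirely unobserved and is flagged, while each row is only one-third unobserved and is not.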
Details

The procedure starts with an initial imputation of zeros (simple replacement with frac*dl) and of missing values (geometric mean imputation from the observed data). The iterative algorithm is then run until convergence (see ?lrSVD for more details).
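The initialisation step can be sketched in base R as follows. This illustrates the idea only and is not the package's internal code: init_impute_sketch is a hypothetical name, a vector dl is assumed to hold one limit per column, and the geometric mean for each column is assumed to be taken over that column's observed values after zero replacement.

```r
# Hypothetical sketch (not zCompositions internals) of the initial imputation:
# zeros -> frac * dl, missing values -> column geometric mean of observed data.
init_impute_sketch <- function(X, dl, frac = 0.65) {
  X <- as.matrix(X)
  # Assumption: a vector dl holds one detection limit per column
  if (is.null(dim(dl))) {
    dl <- matrix(dl, nrow = nrow(X), ncol = ncol(X), byrow = TRUE)
  }
  zeros <- !is.na(X) & X == 0
  X[zeros] <- frac * dl[zeros]  # simple replacement of zeros
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    # Assumption: geometric mean over the column's observed (non-NA) values
    if (any(miss)) X[miss, j] <- exp(mean(log(X[!miss, j])))
  }
  X
}

X0 <- matrix(c(50, 0, NA, 50, 40, 60), nrow = 3)
init_impute_sketch(X0, dl = c(1, 1))
```

In lrSVDplus this starting point is then refined by the iterative algorithm; the sketch stops at the initialisation.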
Value

A data.frame object containing the imputed compositional data set, expressed on the original scale.
References

Palarea-Albaladejo J, Martín-Fernández JA, Ruiz-Gazen A, Thomas-Agnan C. lrSVD: An efficient imputation algorithm for incomplete high-throughput compositional data. Journal of Chemometrics 2022; 36: e3459.
See Also

zPatterns, lrSVD, lrDA, multRepl, multLN, multKM, cmultRepl
Examples

library(zCompositions)

# Data set closed to 100 (percentages, common dl = 1%)
# (Note that zeros and missing values in the same row or column are allowed)
X <- matrix(c(26.91,8.08,12.59,31.58,6.45,14.39,
39.73,41.42,0.00,NA,6.80,12.05,
NA,35.13,7.96,14.28,35.12,7.51,
10.85,46.40,31.89,10.86,0.00,0.00,
10.85,16.27,NA,9.16,19.57,44.15,
38.09,7.62,23.68,9.70,20.91,0.00,
NA,9.89,18.04,44.30,9.04,18.73,
44.41,15.04,7.95,0.00,10.82,21.78,
11.50,30.33,6.85,13.92,30.82,6.58,
19.04,42.59,0.00,38.37,0.00,0.00),byrow=TRUE,ncol=6)
X_lrSVDplus <- lrSVDplus(X,dl=rep(1,6))
# Multiple detection limits, given cell-wise in a matrix with the same dimensions as X
# (0 where no detection limit applies)
mdl <- matrix(0,ncol=6,nrow=10)
mdl[2,] <- rep(1,6)
mdl[4,] <- rep(0.75,6)
mdl[6,] <- rep(0.5,6)
mdl[8,] <- rep(0.5,6)
mdl[10,] <- c(0,0,1,0,0.8,0.7)
X_lrSVDplus2 <- lrSVDplus(X,dl=mdl)
# Non-closed compositional data set
data(LPdataZM) # (in ppm; 0 is nondetect and NA is missing data)
dl <- c(2,1,0,0,2,0,6,1,0.6,1,1,0,0,632,10) # limits of detection (0 for no limit)
LPdataZM2 <- subset(LPdataZM,select=-c(Cu,Ni,La)) # select a subset for illustration purposes
dl2 <- dl[-c(5,7,10)]
LPdataZM2_lrSVDplus <- lrSVDplus(LPdataZM2,dl=dl2)
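When detection limits are supplied cell-wise as a matrix (as with mdl above), it can be worth checking before the call that every zero has a positive limit in the matching cell. A small base-R sketch follows; check_dl_matrix is a hypothetical helper, not a zCompositions function.

```r
# Hypothetical helper (not part of zCompositions): report zero cells of X
# whose matching entry in a cell-wise detection-limit matrix is not positive.
check_dl_matrix <- function(X, dl) {
  X <- as.matrix(X)
  dl <- as.matrix(dl)
  stopifnot(identical(dim(X), dim(dl)))  # one limit per cell
  zeros <- !is.na(X) & X == 0
  # (row, col) index pairs of zeros lacking a positive detection limit
  which(zeros & dl <= 0, arr.ind = TRUE)
}

Xs  <- matrix(c(0, 5, 0, 7), nrow = 2)  # zeros at [1,1] and [1,2]
dls <- matrix(c(1, 0, 0, 0), nrow = 2)  # positive limit only at [1,1]
check_dl_matrix(Xs, dls)                # flags the zero at row 1, column 2
```

For the X and mdl objects in the examples above, check_dl_matrix(X, mdl) returns a matrix with no rows, since every zero in X has a positive limit in mdl.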