lrSVDplus: Log-ratio SVD algorithm (plus)

lrSVDplus    R Documentation

Log-ratio SVD algorithm (plus)

Description

This function implements an extended version of the log-ratio SVD algorithm (lrSVD function) to simultaneously deal with both zeros (i.e. data below detection limit, rounded zeros) and missing data in compositional data sets.

Note: zeros and missing data must be labelled using 0 and NA respectively to use this function.
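Raw data often encode nondetects and missing values in other ways, so they may need recoding before the function is called. The following base R sketch shows one way to reach the 0/NA labelling; the `raw` data frame, the "<" prefix for nondetects, the -999 missing code, and the helper `recode_column` are all hypothetical and serve only as an illustration.

```r
# Hypothetical raw data: "<dl" strings mark nondetects, -999 marks missing values
raw <- data.frame(A = c("12.5", "<1", "30.1"),
                  B = c("40.2", "-999", "<1"),
                  C = c("47.3", "20.4", "-999"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: convert one character column to the 0/NA convention
recode_column <- function(x) {
  nondetect <- grepl("^<", x)            # values reported as below detection limit
  out <- numeric(length(x))
  out[!nondetect] <- as.numeric(x[!nondetect])
  out[nondetect] <- 0                    # nondetects are labelled 0
  out[!nondetect & out == -999] <- NA    # missing values are labelled NA
  out
}

X0 <- as.data.frame(lapply(raw, recode_column))
```

The resulting `X0` is numeric throughout, with 0 for nondetects and NA for missing values, which is the labelling the note above asks for.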

Usage

lrSVDplus(X, dl = NULL, frac = 0.65,
             ncp = 2, beta = 0.5, method = c("ridge", "EM"), row.w = NULL,
             coeff.ridge = 1, threshold = 1e-04, seed = NULL, nb.init = 1,
             max.iter = 1000, z.warning = 0.8, ...)

Arguments

X

Compositional data set (matrix or data.frame class).

dl

Numeric vector or matrix of detection limits/thresholds. These must be given on the same scale as X.

frac

Parameter for initial multiplicative simple replacement of left-censored data (see multRepl) (default = 0.65).

ncp

Number of components in low-rank matrix approximation (default = 2).

beta

Weighting parameter, balance between the two conditions in objective function (default = 0.5).

method

Parameter estimation method for the iterative algorithm: "ridge" (default) or "EM".

row.w

Row weights (default = NULL, equivalent to a vector of 1s for uniform row weights).

coeff.ridge

Ridge regularisation coefficient, used only when method = "ridge" (default = 1).

threshold

Threshold for assessing convergence (default = 1e-04).

seed

Seed for random initialisation of the algorithm (default seed = NULL, unobserved values initially imputed by the column mean).

nb.init

Number of random initialisations (default = 1).

max.iter

Maximum number of iterations for the algorithm (default = 1000).

z.warning

Threshold used to delete individual rows or columns with an excess of zeros/unobserved values (to be specified as a proportion, default z.warning = 0.8).

...

Further arguments.
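To see which rows or columns might be affected by z.warning before running the function, the proportion of zeros/unobserved values can be checked by hand. This is a base R sketch on a made-up matrix (`X_toy` is hypothetical); the strict `>` comparison is an assumption, since the exact rule applied internally is not stated here.

```r
# Hypothetical toy data: 0 = below detection limit, NA = missing
X_toy <- matrix(c( 1, 0, NA, 0,  0, 0,
                  10, 5,  3, 2,  1, 7,
                   4, 0,  6, 8, NA, 2), byrow = TRUE, ncol = 6)
z.warning <- 0.8

# TRUE wherever a value is a zero or missing
unobserved <- is.na(X_toy) | X_toy == 0

row_prop <- rowMeans(unobserved)   # proportion of unobserved values per row
col_prop <- colMeans(unobserved)   # proportion of unobserved values per column

# Assumption: a row/column is flagged when its proportion exceeds z.warning
flagged_rows <- which(row_prop > z.warning)
flagged_cols <- which(col_prop > z.warning)
```

Here only the first row (5 of 6 values unobserved) exceeds the 0.8 threshold, and no column does.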

Details

The procedure starts with an initial imputation of zeros (using simple replacement with frac*dl) and missing values (using geometric mean imputation from observed data). Subsequently, the iterative algorithm is run until convergence (see ?lrSVD for more details).
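The initial imputation step described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's internal code: `X_toy` and `X_init` are hypothetical names, a single detection limit per column is assumed, no adjustment of the observed parts is made, and the subsequent iterative algorithm is not shown.

```r
# Hypothetical toy data: 0 = below detection limit, NA = missing
X_toy <- matrix(c(26.91,  8.08, 12.59,
                  39.73,  0.00,    NA,
                     NA, 35.13,  7.96,
                  10.85, 46.40,  0.00), byrow = TRUE, ncol = 3)
dl   <- rep(1, 3)   # one detection limit per column (assumption)
frac <- 0.65        # same default as the frac argument

X_init <- X_toy
for (j in seq_len(ncol(X_toy))) {
  observed <- !is.na(X_toy[, j]) & X_toy[, j] > 0
  gm <- exp(mean(log(X_toy[observed, j])))           # geometric mean of observed values
  X_init[which(X_toy[, j] == 0), j] <- frac * dl[j]  # zeros -> frac * dl
  X_init[is.na(X_toy[, j]), j] <- gm                 # missing -> column geometric mean
}
```

After this step every entry of `X_init` is strictly positive, so log-ratios can be computed as a starting point for the iterations.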

Value

A data.frame object containing the imputed compositional data set expressed in the original scale.

References

Palarea-Albaladejo, J, Martín-Fernández, JA, Ruiz-Gazen, A, Thomas-Agnan, C. lrSVD: An efficient imputation algorithm for incomplete high-throughput compositional data. Journal of Chemometrics 2022; 36: e3459.

See Also

zPatterns, lrSVD, lrDA, multRepl, multLN, multKM, cmultRepl

Examples

# Data set closed to 100 (percentages, common dl = 1%)
# (Note that zeros and missing in the same row or column are allowed)
X <- matrix(c(26.91,8.08,12.59,31.58,6.45,14.39,
              39.73,41.42,0.00,NA,6.80,12.05,
              NA,35.13,7.96,14.28,35.12,7.51,
              10.85,46.40,31.89,10.86,0.00,0.00,
              10.85,16.27,NA,9.16,19.57,44.15,
              38.09,7.62,23.68,9.70,20.91,0.00,
              NA,9.89,18.04,44.30,9.04,18.73,
              44.41,15.04,7.95,0.00,10.82,21.78,
              11.50,30.33,6.85,13.92,30.82,6.58,
              19.04,42.59,0.00,38.37,0.00,0.00),byrow=TRUE,ncol=6)
              
X_lrSVDplus <- lrSVDplus(X,dl=rep(1,6))

# Multiple limits of detection by component
mdl <- matrix(0,ncol=6,nrow=10)
mdl[2,] <- rep(1,6)
mdl[4,] <- rep(0.75,6)
mdl[6,] <- rep(0.5,6)
mdl[8,] <- rep(0.5,6)
mdl[10,] <- c(0,0,1,0,0.8,0.7)

X_lrSVDplus2 <- lrSVDplus(X,dl=mdl)

# Non-closed compositional data set
data(LPdataZM) # (in ppm; 0 is nondetect and NA is missing data)

dl <- c(2,1,0,0,2,0,6,1,0.6,1,1,0,0,632,10) # limits of detection (0 for no limit)
LPdataZM2 <- subset(LPdataZM,select=-c(Cu,Ni,La))  # select a subset for illustration purposes
dl2 <- dl[-c(5,7,10)]

LPdataZM2_lrSVDplus <- lrSVDplus(LPdataZM2,dl=dl2)

zCompositions documentation built on Aug. 24, 2023, 1:08 a.m.