multKM: Multiplicative Kaplan-Meier smoothing spline (KMSS)...

multKM    R Documentation

Multiplicative Kaplan-Meier smoothing spline (KMSS) replacement

Description

This function implements non-parametric multiplicative KMSS imputation of left-censored values (e.g. values below the detection limit, rounded zeros) in compositional data sets. It is based on simulation from a smoothing spline fitted to the Kaplan-Meier (KM) estimate of the empirical cumulative distribution function (ECDF) of the data.

Usage

multKM(X, label = NULL, dl = NULL, n.draws = 1000, n.knots = NULL,
          z.warning = 0.8, z.delete = TRUE)

Arguments

X

Compositional data set (matrix or data.frame class).

label

Unique label (numeric or character) used to denote zeros/unobserved left-censored values in X.

dl

Numeric vector or matrix of detection limits/thresholds. These must be given on the same scale as X. If NULL, the column minima are used as thresholds.

n.draws

Number of random draws from the inverse KM ECDF that are averaged to produce each imputed value (default n.draws=1000).

n.knots

Integer or function giving the number of knots used for fitting a cubic smoothing spline to the KM ECDF (see smooth.spline for default value). It allows for a vector or list of settings per column of X.

z.warning

Threshold used to identify individual rows or columns containing an excess of zeros/unobserved values (specified as a proportion, default z.warning=0.8).

z.delete

Logical value. If set to TRUE, rows/columns identified by z.warning are omitted from the imputed data set. Otherwise, the function stops with an error when rows/columns are identified by z.warning (default z.delete=TRUE).
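
As a rough illustration of how the dl and z.warning arguments relate to the data, the following base R sketch computes the proportion of censored cells per row and column and expands a per-column threshold vector into a matrix of the same size as X. The toy data, the threshold values and the use of a strict inequality are assumptions made for illustration; this is not the package's internal code.

```r
# Toy data set (hypothetical values): 0 marks a left-censored entry
X <- data.frame(a = c(0, 2.1, 3.4, 0),
                b = c(5.2, 0, 4.8, 6.1),
                c = c(10.3, 9.7, 0, 11.2))

# Proportion of censored cells per column and per row
prop.col <- colMeans(X == 0)
prop.row <- rowMeans(X == 0)

# Columns/rows exceeding a z.warning-style threshold
# (assumption: strict inequality; the package may differ)
z.warning <- 0.8
flagged.cols <- names(prop.col)[prop.col > z.warning]
flagged.rows <- which(prop.row > z.warning)

# One threshold per column (vector form), expanded to matrix form
dl.vec <- c(a = 1, b = 2, c = 3)
dl.mat <- matrix(dl.vec, nrow = nrow(X), ncol = ncol(X), byrow = TRUE,
                 dimnames = list(NULL, names(X)))
```

In this toy case no row or column exceeds the 0.8 threshold, so nothing would be flagged.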

Details

This function imputes left-censored compositional values by averaging (geometric mean) n random draws (n.draws argument) from a cubic smoothing spline curve fitted to the inverse KM ECDF below the corresponding limit of detection or censoring threshold. It then applies a multiplicative adjustment to preserve the multivariate compositional properties of the samples. Either a single limit of detection per component (vector form) or multiple limits per component (matrix form, same size as X) can be given. Note, however, that for singly censored components the method is equivalent to simple substitution by the limit of detection. Any threshold value can be set for non-censored elements (e.g. use 0 if there is no threshold for a particular column or element of the data matrix).
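
The two steps described above (geometric averaging of draws, then multiplicative adjustment) can be sketched in base R for a single composition. This is a minimal illustration under stated assumptions: the draws come from a uniform distribution below each threshold as a stand-in for the spline-based draws, and the composition, thresholds and helper geo.mean are hypothetical. It is not the code used by multKM.

```r
# One composition closed to 100, with parts 1 and 3 censored (coded as 0)
x  <- c(0, 35, 0, 65)
dl <- c(2, 0, 1, 0)   # thresholds; 0 where no threshold applies

# Hypothetical helper: geometric mean of a positive vector
geo.mean <- function(v) exp(mean(log(v)))

# Stand-in for the draws below each threshold (uniform here;
# multKM draws from the spline fitted to the KM ECDF instead)
set.seed(1)
cens <- x == 0
r <- numeric(length(x))
r[cens] <- sapply(dl[cens], function(d) geo.mean(runif(1000, min = 0, max = d)))

# Multiplicative adjustment: imputed parts take the averaged value,
# observed parts are shrunk by a common factor so the total is kept
const <- sum(x)
x.imp <- x
x.imp[cens]  <- r[cens]
x.imp[!cens] <- x[!cens] * (1 - sum(r[cens]) / const)
```

Because all observed parts are multiplied by the same factor, the ratios between them (here 35/65) are unchanged and the composition still sums to 100.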

It produces an imputed data set on the same scale as the input data set. If X is not closed to a constant sum, then the results are adjusted to provide a compositionally equivalent data set, expressed in the original scale, which leaves the absolute values of the observed components unaltered.
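
The rescaling described above can be illustrated with a short base R sketch for one row that is not closed to a constant sum. The data values and the imputed proportion (0.01) are hypothetical, and the steps are a simplified reading of the paragraph rather than the package's implementation.

```r
# One row on its original scale (total is 60, not a fixed constant)
x <- c(0, 12.5, 40, 7.5)
cens <- x == 0

# Work on the closed version (proportions summing to 1)
x.closed <- x / sum(x)

# Hypothetical imputed proportion for the censored part,
# followed by the multiplicative adjustment of the observed parts
r <- 0.01
y <- x.closed
y[cens]  <- r
y[!cens] <- x.closed[!cens] * (1 - r)

# Rescale so that the observed parts recover their original values;
# any observed part can serve as the reference
k <- which(!cens)[1]
x.imp <- y * x[k] / y[k]
```

The result x.imp carries the same ratios between parts as y, so it is compositionally equivalent, while its observed entries equal those of x.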

The level of smoothing of the estimated spline can be controlled by the n.knots argument. The function splineKM can assist in choosing a more suitable value, although the default setting generally works well.
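
To give a feel for the effect of the number of knots, the sketch below fits stats::smooth.spline to an ordinary ECDF with two different nknots values. It assumes simulated log-normal data without censoring, so the plain ECDF stands in for the KM estimate; it does not reproduce what multKM or splineKM do internally.

```r
# Simulated positive data (assumption: no censoring, for simplicity)
set.seed(42)
v <- sort(rlnorm(200))

# Plain ECDF evaluated at the data points, as a stand-in for the KM ECDF
Fhat <- ecdf(v)(v)

# Cubic smoothing splines with a coarse and a finer set of knots
fit.coarse <- smooth.spline(v, Fhat, nknots = 10)
fit.fine   <- smooth.spline(v, Fhat, nknots = 50)

# Compare the two fitted curves on a small grid
grid <- seq(min(v), max(v), length.out = 5)
comparison <- cbind(grid   = grid,
                    coarse = predict(fit.coarse, grid)$y,
                    fine   = predict(fit.fine, grid)$y)
comparison
```

Plotting both fits over the ECDF (for example with lines(predict(fit.fine, v))) is a quick way to judge whether more knots are needed.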

Value

A data.frame object containing the imputed compositional data set expressed in the original scale.

References

Palarea-Albaladejo J. and Martin-Fernandez JA. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 2015; 143: 85-96.

See Also

zPatterns, splineKM, lrEM, lrSVD, lrDA, multRepl, multLN, cmultRepl

Examples

data(Water)
data(mdl) # matrix of limits of detection for Water

Water_multKM <- multKM(Water,label=0,dl=mdl)

# Different smoothing degree by component
Water_multKM2 <- multKM(Water,label=0,dl=mdl,n.knots=c(25,50,30,75))

# Easy to use for KM multiple imputation (m = 10)
Water.mi <- vector("list",length=10)
for (m in 1:10){
  Water.mi[[m]] <- multKM(Water,label=0,dl=mdl,n.draws=1)
}
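
# A list of imputed data sets such as Water.mi is usually summarised by
# computing a statistic on each element and pooling the results. The base R
# sketch below is one possible way to do so; it uses a simulated stand-in
# list (imputed) instead of Water.mi, and the choice of statistic (closed
# geometric mean) is an assumption made for illustration.

```r
# Stand-in for a list of m = 10 imputed data sets (hypothetical values):
# only the formerly censored cells vary between imputations
set.seed(123)
imputed <- lapply(1:10, function(m) {
  data.frame(a = c(runif(1, 0.1, 1), 2.1, 3.4),
             b = c(5.2, runif(1, 0.5, 2), 4.8),
             c = c(10.3, 9.7, 11.2))
})

# Column means of the log-transformed data for each imputed data set
log.means <- sapply(imputed, function(d) colMeans(log(d)))  # 3 x 10 matrix

# Pooled compositional centre: average on the log scale, then close to 1
pooled.center <- exp(rowMeans(log.means))
pooled.center <- pooled.center / sum(pooled.center)

# Between-imputation variability of the log-means, by component
between.sd <- apply(log.means, 1, sd)
```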


zCompositions documentation built on June 22, 2024, 9:46 a.m.