multKM (R Documentation)
This function implements non-parametric multiplicative Kaplan-Meier smoothing spline (KMSS) imputation of left-censored values (e.g. values below the detection limit, rounded zeros) in compositional data sets. It is based on simulation from a smoothing spline fitted to the Kaplan-Meier (KM) estimate of the empirical cumulative distribution function (ECDF) of the data.
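The KM estimate of the ECDF for left-censored data is commonly computed by "flipping": reflect the data so that left-censoring becomes right-censoring, apply the usual product-limit estimator, and reflect back. The sketch below does this in base R. `km_ecdf_left` is a hypothetical helper, and this is not necessarily how multKM computes its estimate internally; it assumes each censored entry stores its detection limit in `value`.

```r
# Sketch (base R only): Kaplan-Meier estimate of the ECDF for left-censored
# data via the "flipping" trick. Assumes each censored entry stores its
# detection limit in `value`, i.e. the true value is known to be < value.
km_ecdf_left <- function(value, censored) {
  stopifnot(length(value) == length(censored), any(!censored))
  M <- max(value) + 1            # any constant larger than max(value) works
  y <- M - value                 # left-censored x becomes right-censored y
  event <- !censored
  times <- sort(unique(y[event]))
  surv <- numeric(length(times))
  s <- 1
  for (i in seq_along(times)) {
    n_risk <- sum(y >= times[i])              # still "at risk" at this time
    d <- sum(y == times[i] & event)           # detected values at this time
    s <- s * (1 - d / n_risk)                 # product-limit update
    surv[i] <- s
  }
  # F(x) = P(X <= x) = S_Y((M - x)-): survival just before the flipped time
  cdf <- c(1, head(surv, -1))
  ord <- order(M - times)                     # back to increasing x
  data.frame(x = (M - times)[ord], ecdf = cdf[ord])
}

km <- km_ecdf_left(value    = c(0.5, 1, 1, 2, 3),
                   censored = c(FALSE, TRUE, FALSE, FALSE, FALSE))
km  # ecdf 0.4, 0.6, 0.8, 1.0 at x = 0.5, 1, 2, 3
```

The censored value "<1" has its probability mass redistributed to the detected value below it, so the estimate at x = 0.5 is 2/5 rather than 1/5. With no censoring, the function reduces to the ordinary ECDF.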
```r
multKM(X, label = NULL, dl = NULL, n.draws = 1000, n.knots = NULL,
       z.warning = 0.8, z.delete = TRUE)
```
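| Argument | Description |
|---|---|
| `X` | Compositional data set (`matrix` or `data.frame` class). |
| `label` | Unique label (numeric or character) used to denote left-censored values in `X` (default `label = NULL`). |
| `dl` | Numeric vector or matrix of detection limits/thresholds. These must be given on the same scale as `X` (default `dl = NULL`). |
| `n.draws` | Number of random draws from the inverse KM ECDF generated to produce an averaged imputed value (default `n.draws = 1000`). |
| `n.knots` | Integer or function giving the number of knots used for fitting a cubic smoothing spline to the KM ECDF (see `smooth.spline`). Either a single value or a vector with one value per component can be given (default `n.knots = NULL`). |
| `z.warning` | Threshold used to identify individual rows or columns including an excess of zeros/unobserved values, specified as a proportion (default `z.warning = 0.8`). |
| `z.delete` | Logical value. If set to `TRUE`, rows/columns identified by `z.warning` are omitted from the imputed data set; otherwise, the function stops with an error when such rows/columns are found (default `z.delete = TRUE`). |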
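The `dl` argument accepts either one threshold per column (vector form) or one threshold per cell (matrix form, same size as `X`). The sketch below builds both forms for a small hypothetical data set in which 0 marks censored values, as with `label = 0` in the examples; the data and object names are illustrative assumptions, and the objects are not passed to multKM here.

```r
# Hypothetical toy composition (4 samples x 3 parts); 0 marks censored values
X <- data.frame(a = c(10, 0, 12, 9),
                b = c(30, 28, 0, 31),
                c = c(60, 72, 88, 60))

# Single limit per component (vector form): one threshold per column;
# 0 for column c, which has no censored values
dl_vec <- c(a = 1, b = 2, c = 0)

# Multiple limits (matrix form, same size as X): thresholds may vary by
# sample, e.g. when samples were analysed in different batches
dl_mat <- matrix(c(1, 1, 1.5, 1.5,   # column a
                   2, 2, 2.5, 2.5,   # column b
                   0, 0, 0,   0),    # column c: no threshold
                 nrow = nrow(X), ncol = ncol(X))

# Every censored cell should have a positive threshold
censored <- as.matrix(X) == 0
stopifnot(all(dl_mat[censored] > 0))
```

Setting the threshold to 0 for non-censored cells follows the convention described in the details below.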
This function imputes left-censored compositional values by averaging (geometric mean) `n.draws` random draws from a cubic smoothing spline curve fitted to the inverse KM ECDF below the corresponding limit of detection or censoring threshold. It then applies a multiplicative adjustment to preserve the multivariate compositional properties of the samples. It allows for either single (vector form) or multiple (matrix form, same size as `X`) limits of detection by component. Note, however, that for singly censored components the method is equivalent to simple substitution by the limit of detection. Any threshold value can be set for non-censored elements (e.g. use 0 if there is no threshold for a particular column or element of the data matrix).
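The draw-and-average step can be sketched in base R as follows: fit a smoothing spline to the inverse ECDF (quantile as a function of probability level), draw uniform levels below the ECDF value at the threshold, map them back through the spline, and take the geometric mean. `impute_below_dl` is a hypothetical helper; a plain ECDF of simulated data stands in for the KM ECDF, and the clamping of draws to (0, dl] is a safeguard added for this sketch, not something the documentation states multKM does.

```r
# Sketch (base R): impute one left-censored value by averaging draws from a
# smoothing spline fitted to the inverse ECDF, restricted to levels below F(dl).
# Assumption: (x_ecdf, F_ecdf) are the points of a KM ECDF; below, a plain ECDF
# of simulated data stands in for it. Hypothetical helper, not multKM's code.
impute_below_dl <- function(x_ecdf, F_ecdf, dl, n.draws = 1000, nknots = NULL) {
  F_dl <- max(c(0, F_ecdf[x_ecdf <= dl]))    # ECDF level at the threshold
  stopifnot(F_dl > 0)                         # need ECDF support below dl
  # Inverse ECDF: quantile x modelled as a smooth function of the level F
  args <- list(x = F_ecdf, y = x_ecdf)
  if (!is.null(nknots)) args$nknots <- nknots
  fit <- do.call(smooth.spline, args)
  u <- runif(n.draws, min = 0, max = F_dl)    # random levels below F(dl)
  draws <- predict(fit, u)$y                  # corresponding quantiles
  # Safeguard added for this sketch: keep every draw in (0, dl]
  draws <- pmin(pmax(draws, .Machine$double.eps), dl)
  exp(mean(log(draws)))                       # geometric mean of the draws
}

set.seed(123)
x  <- sort(rlnorm(200))                       # simulated positive data
Fx <- seq_along(x) / length(x)                # plain ECDF levels
impute_below_dl(x, Fx, dl = 0.5)              # one averaged imputed value
impute_below_dl(x, Fx, dl = 0.5, n.draws = 1) # single random draw
```

With `n.draws = 1` the result is a single random draw rather than an average, which is why repeated calls can serve for multiple imputation.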
It produces an imputed data set on the same scale as the input data set. If `X` is not closed to a constant sum, the results are adjusted to provide a compositionally equivalent data set, expressed in the original scale, which leaves the absolute values of the observed components unaltered.
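The multiplicative adjustment and the return to the original scale can be illustrated on a single row. The sketch applies the usual multiplicative replacement rule to a closed version of the row (observed parts are shrunk by the factor 1 - sum(imputed)/kappa so the row still sums to kappa), then multiplies the whole row by one constant so the observed parts recover their original values. `mult_replace_row`, its arguments and the toy numbers are hypothetical, and the package's exact arithmetic may differ.

```r
# Sketch (base R): multiplicative replacement on one row, then re-expression
# on the original scale. Hypothetical helper, not multKM's code. Assumes `x`
# is one row on its original scale, `censored` flags the censored parts and
# `r` holds their imputed values on that same scale.
mult_replace_row <- function(x, censored, r, kappa = 1) {
  T_obs <- sum(x[!censored])
  stopifnot(sum(r) < T_obs)
  closed <- numeric(length(x))
  closed[!censored] <- kappa * x[!censored] / T_obs  # observed parts closed to kappa
  closed[censored]  <- kappa * r / T_obs             # imputed values, same closure
  # Multiplicative rule: shrink observed parts so the row still sums to kappa
  closed[!censored] <- closed[!censored] * (1 - sum(closed[censored]) / kappa)
  # Rescale the whole row so the observed parts get back their original
  # values; scaling by a constant yields a compositionally equivalent row
  f <- x[!censored][1] / closed[!censored][1]
  list(closed = closed, original_scale = closed * f)
}

res <- mult_replace_row(c(50, 0, 30, 10),
                        censored = c(FALSE, TRUE, FALSE, FALSE),
                        r = 0.5, kappa = 100)
res$closed          # sums to 100
res$original_scale  # observed parts stay 50, 30, 10
```

Because both steps only multiply parts by constants, ratios between observed parts are the same in the closed and original-scale versions.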
The level of smoothing of the estimated spline can be controlled by the `n.knots` argument. The function `splineKM` can help in choosing a better-tuned value, although the default setting generally works well.
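Since `n.knots` is described as an integer or function giving the number of knots for a cubic smoothing spline, its effect can be explored with base R's `smooth.spline` and its `nknots` argument. The sketch fits an inverse ECDF of simulated data (a plain ECDF standing in for the KM ECDF) with different knot settings; it does not call multKM or `splineKM`.

```r
# Sketch (base R): effect of the number of knots on a smoothing spline fitted
# to an inverse ECDF. Simulated data; a plain ECDF stands in for the KM ECDF.
set.seed(42)
x  <- sort(rlnorm(300))                 # simulated positive data
Fx <- seq_along(x) / length(x)          # ECDF levels

fit_few  <- smooth.spline(Fx, x, nknots = 10)   # caps flexibility (<= 12 df)
fit_many <- smooth.spline(Fx, x, nknots = 100)  # allows a more flexible curve
fit_fun  <- smooth.spline(Fx, x,                # knots as a function of the
                          nknots = function(n) as.integer(ceiling(n / 10)))  # number of unique x

# Effective degrees of freedom (smoothing parameter still chosen by GCV)
c(few = fit_few$df, many = fit_many$df, fun = fit_fun$df)

# Inverse-ECDF values in the lower tail under two of the fits
u <- c(0.01, 0.05, 0.10)
rbind(few = predict(fit_few, u)$y, many = predict(fit_many, u)$y)
```

Fewer knots restrict how closely the curve can follow the ECDF, which matters most in the lower tail, where imputed values are drawn.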
The function returns a `data.frame` object containing the imputed compositional data set expressed in the original scale.
Palarea-Albaladejo J. and Martin-Fernandez JA. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 2015; 143: 85-96.
See also: `zPatterns`, `splineKM`, `lrEM`, `lrSVD`, `lrDA`, `multRepl`, `multLN`, `cmultRepl`.
```r
library(zCompositions)

data(Water)
data(mdl) # matrix of limits of detection for Water
Water_multKM <- multKM(Water, label = 0, dl = mdl)

# Different smoothing degree by component
Water_multKM2 <- multKM(Water, label = 0, dl = mdl, n.knots = c(25, 50, 30, 75))

# KM multiple imputation (m = 10): a single draw per imputed value
# yields a different imputed data set on each run
Water.mi <- vector("list", length = 10)
for (m in 1:10) {
  Water.mi[[m]] <- multKM(Water, label = 0, dl = mdl, n.draws = 1)
}
```