cmultRepl: Bayesian-Multiplicative replacement of count zeros

View source: R/cmultRepl.R

cmultReplR Documentation

Bayesian-Multiplicative replacement of count zeros

Description

This function implements methods for imputing zeros in compositional count data sets based on a Bayesian-multiplicative replacement.

Usage

cmultRepl(X, label = 0,
             method = c("GBM","SQ","BL","CZM","user"), output = c("prop","p-counts"),
             frac = 0.65, threshold = 0.5, adjust = TRUE, t = NULL, s = NULL,
             z.warning = 0.8, z.delete = TRUE, suppress.print = FALSE,
             delta = NULL)

Arguments

X

Count data set (matrix or data.frame class).

label

Unique label (numeric or character) used to denote count zeros in X (default label=0).

method

Geometric Bayesian multiplicative (GBM, default); square root BM (SQ); Bayes-Laplace BM (BL); count zero multiplicative (CZM); user-specified hyper-parameters (user).

output

Output format: imputed proportions (prop, default) or pseudo-counts (p-counts).

frac

If method="CZM", fraction of the upper threshold used to impute zeros (default frac=0.65). Also, fraction of the lowest estimated probability used to adjust imputed proportions falling above it (when adjust=TRUE).

threshold

For a vector of counts, factor applied to the quotient 1 over the number of trials (sum of the counts) used to produce an upper limit for replacing zero counts by the CZM method (default threshold=0.5).

adjust

Logical vector setting whether imputed proportions falling above the lowest estimated probability for a multinomial part must be adjusted or not (default adjust=TRUE).

t

If method="user", user-specified t hyper-parameter of the Dirichlet prior distribution for each count vector (row) in X. It must be a matrix of the same dimensions as X.

s

If method="user", user-specified s hyper-parameter of the Dirichlet prior distribution for each count vector (row) in X. It must be a vector of length equal to the number of rows of X.

z.warning

Threshold used to identify individual rows or columns including an excess of zeros/unobserved values (to be specify in proportions, default z.warning=0.8).

z.delete

Logical value. If set to TRUE, rows/columns identified by z.warning are omitted in the imputed data set. Otherwise, the function stops in error when rows/columns are identified by z.warning (default z.delete=TRUE).

suppress.print

Suppress printed feedback (suppress.print=FALSE, default).

delta

This argument has been deprecated and replaced by frac (see package's NEWS for details).

Details

Zero counts, assumed to be due to under-reporting or limited sampling, are imputed under a Bayesian paradigm (GBM, SQ or BL method) by posterior estimates of the multinomial probabilities generating the counts, assuming a Dirichlet prior distribution. The argument method sets the Dirichlet hyper-parameters t (priori estimates of multinomial probabilities) and s (strength). The user can specify their own by setting method="user" and entering them as t and s arguments. Note that, under certain circumstances (see references for details), these methods can generate imputed proportions falling above the lowest estimated probability of a multinomial part (c/n, where c is the count and n is the number of trials). In such cases, the imputation is adjusted by using a fraction (frac) of the minimum c/n for that part. Lastly, the non-zero parts are multiplicatively adjusted according to their compositional nature.

On the other hand, method="CZM" uses multiplicative simple replacement (multRepl) on the matrix of estimated probabilities. The upper limit and the fraction used are specified by, respectively, the arguments threshold and frac. Suggested values are threshold=0.5 (so the upper limit for a multinomial probability turns out to be 0.5/n), and frac=0.65 (so the imputed proportion is 65% of the upper limit).

Value

By default (output="prop") the function returns an imputed data set (data.frame class) in proportions (estimated probabilities). Alternatively, these proportions are re-scaled to produce a compositionally-equivalent matrix of pseudo-counts (output="p-counts") which preserves the ratios between parts.

When adjust=TRUE and verbose=TRUE, the number of times, if any, an imputed proportion was adjusted to fall below the minimum estimated multinomial probability is printed.

References

Martin-Fernandez, J.A., Hron, K., Templ, M., Filzmoser, P., Palarea-Albaladejo, J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Statistical Modelling 2015; 15: 134-158.

Palarea-Albaladejo J. and Martin-Fernandez JA. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligence Laboratory Systems 2015; 143: 85-96.

See Also

zPatterns

Examples

data(Pigs)

# GBM method and matrix of estimated probabilities
Pigs.GBM <- cmultRepl(Pigs)


zCompositions documentation built on June 22, 2024, 9:46 a.m.