cmultRepl | R Documentation |
This function implements methods for imputing zeros in compositional count data sets based on a Bayesian-multiplicative replacement.
cmultRepl(X, label = 0,
method = c("GBM","SQ","BL","CZM","user"), output = c("prop","p-counts"),
frac = 0.65, threshold = 0.5, adjust = TRUE, t = NULL, s = NULL,
z.warning = 0.8, z.delete = TRUE, suppress.print = FALSE,
delta = NULL)
X |
Count data set ( |
label |
Unique label ( |
method |
Geometric Bayesian multiplicative ( |
output |
Output format: imputed proportions ( |
frac |
If |
threshold |
For a vector of counts, factor applied to the quotient 1 over the number of trials (sum of the counts) used to produce an upper limit for replacing zero counts by the |
adjust |
Logical vector setting whether imputed proportions falling above the lowest estimated probability for a multinomial part must be adjusted or not (default |
t |
If |
s |
If |
z.warning |
Threshold used to identify individual rows or columns including an excess of zeros/unobserved values (to be specified in proportions, default |
z.delete |
Logical value. If set to |
suppress.print |
Suppress printed feedback ( |
delta |
This argument has been deprecated and replaced by |
Zero counts, assumed to be due to under-reporting or limited sampling, are imputed under a Bayesian paradigm (GBM, SQ or BL method) by posterior estimates of the multinomial probabilities generating the counts, assuming a Dirichlet prior distribution. The argument method sets the Dirichlet hyper-parameters t (prior estimates of the multinomial probabilities) and s (strength). Users can specify their own by setting method="user" and passing them as the t and s arguments. Note that, under certain circumstances (see references for details), these methods can generate imputed proportions falling above the lowest estimated probability of a multinomial part (c/n, where c is the count and n is the number of trials). In such cases, the imputation is adjusted by using a fraction (frac) of the minimum c/n for that part. Lastly, the non-zero parts are multiplicatively adjusted according to their compositional nature.
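The mechanics of the Bayesian-multiplicative step can be sketched in base R for a single vector of counts. This is only an illustration under stated assumptions, not the package's implementation: bm_replace is a hypothetical helper, the uniform prior (t = 1/D for every part, s = D) is chosen purely for the example, and the frac adjustment described above is left out.

```r
# Hypothetical helper (not part of zCompositions): Bayesian-multiplicative
# replacement of zeros in one vector of counts, given hyper-parameters t and s.
bm_replace <- function(counts, t, s) {
  n <- sum(counts)            # number of trials
  x <- counts / n             # observed proportions
  zero <- counts == 0
  r <- x
  # Posterior estimate for a zero part under a Dirichlet prior:
  # (0 + s * t) / (n + s)
  r[zero] <- s * t[zero] / (n + s)
  # Multiplicative adjustment of the non-zero parts so the result sums to 1
  # while the ratios between non-zero parts are preserved
  r[!zero] <- x[!zero] * (1 - sum(r[zero]))
  r
}

counts <- c(12, 0, 7, 1)
D <- length(counts)
# Illustrative prior only: equal prior probabilities and strength D
p <- bm_replace(counts, t = rep(1 / D, D), s = D)
```

With these illustrative values the single zero is replaced by 4 * 0.25 / (20 + 4) = 1/24, and the ratio between the first and third parts stays at 12/7.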
On the other hand, method="CZM" uses multiplicative simple replacement (multRepl) on the matrix of estimated probabilities. The upper limit and the fraction used are specified by the arguments threshold and frac, respectively. Suggested values are threshold=0.5 (so the upper limit for a multinomial probability turns out to be 0.5/n) and frac=0.65 (so the imputed proportion is 65% of the upper limit).
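The arithmetic behind the suggested values can be checked with a short base R sketch for one vector of counts. The helper czm_replace is hypothetical and only mirrors the description above; the actual multRepl routine may differ in its details.

```r
# Hypothetical helper (not part of zCompositions): count zero multiplicative
# replacement for one vector of counts, following the description in the text.
czm_replace <- function(counts, threshold = 0.5, frac = 0.65) {
  n <- sum(counts)                 # number of trials
  x <- counts / n                  # estimated probabilities
  zero <- counts == 0
  upper <- threshold / n           # upper limit for a zero part
  r <- x
  r[zero] <- frac * upper          # imputed value: a fraction of the limit
  # Multiplicative adjustment keeps the ratios between non-zero parts
  r[!zero] <- x[!zero] * (1 - sum(r[zero]))
  r
}

counts <- c(12, 0, 7, 1)
p_czm <- czm_replace(counts)
```

Here n = 20, so the upper limit is 0.5/20 = 0.025 and the zero is replaced by 0.65 * 0.025 = 0.01625.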
By default (output="prop") the function returns an imputed data set (data.frame class) in proportions (estimated probabilities). Alternatively, these proportions are re-scaled to produce a compositionally-equivalent matrix of pseudo-counts (output="p-counts") which preserves the ratios between parts.
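One way to picture the pseudo-count output is to re-scale the imputed proportions of a row so that an observed non-zero count is reproduced. The sketch below is an assumption about how such a re-scaling could work, not a description of the package internals; the proportions and counts are made-up illustrative values.

```r
# Illustrative values only: imputed proportions for one row and the
# original counts they were derived from (one zero in position 2).
p      <- c(0.46, 0.04, 0.40, 0.10)
counts <- c(23, 0, 20, 5)

# Use the first non-zero part as reference and scale the whole row so that
# its pseudo-count equals its observed count.
ref    <- which(counts > 0)[1]
pseudo <- p * counts[ref] / p[ref]
```

Because every part is multiplied by the same constant, ratios such as pseudo[1] / pseudo[3] are identical to p[1] / p[3], which is what compositionally equivalent means here.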
When adjust=TRUE and suppress.print=FALSE, the function prints the number of times, if any, that an imputed proportion was adjusted to fall below the minimum estimated multinomial probability.
Martin-Fernandez, J.A., Hron, K., Templ, M., Filzmoser, P., Palarea-Albaladejo, J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Statistical Modelling 2015; 15: 134-158.
Palarea-Albaladejo, J., Martin-Fernandez, J.A. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 2015; 143: 85-96.
zPatterns
data(Pigs)
# GBM method and matrix of estimated probabilities
Pigs.GBM <- cmultRepl(Pigs)