Permutation filtering of the provided OTU table X at test level alpha. The significance of each set of j taxa is evaluated by fitting the Skew-Normal, Normal, t or Cauchy distribution to the sampling distribution obtained from permuted taxa labels.
PERFect_perm(X, infocol = NULL, Order = "NP", Order.user = NULL, normalize = "counts",
  algorithm = "fast", center = FALSE, quant = c(0.1, 0.25, 0.5),
  distr = "sn", alpha = 0.1, rollmean = TRUE, direction = "left", pvals_sim = NULL,
  k = 10000, nbins = 30, hist = TRUE, col = "red", fill = "green",
  hist_fill = 0.2, linecol = "blue")
X |
OTU table, where taxa are columns and samples are rows of the table. It should be in data frame format, with column names corresponding to taxa names. |
infocol |
Index vector of the metadata columns. We assume the user supplies only a taxa table; if sample metadata are included as columns of the input, this option must be specified. |
Order |
Taxa ordering. The default ordering is the number of occurrences (NP) of the taxa in all samples.
Other types of order are p-value ordering, number of connected taxa and weighted number of connected taxa,
denoted as |
Order.user |
User's taxa ordering. This argument takes a character vector of ordered taxa names. |
normalize |
Taxa count normalization. The default option does not normalize taxa counts,
but the user can convert the OTU table into a proportion table using the option |
algorithm |
Algorithm speed. The default speed is "fast" |
center |
Centering OTU table. The default option does not center the OTU table. |
quant |
Quantile values used to fit the distribution to log DFL values. The number of quantile values corresponds to the number of parameters in the distribution the data is fitted to. Assuming that at least 50% of taxa are not informative, we suggest fitting the log Skew-Normal distribution by matching the 10%, 25% and 50% percentiles of the log-transformed samples to the Skew-Normal distribution. |
distr |
The type of distribution to fit to the log DFL values. We suggest the Skew-Normal distribution, which is the default, but other choices are available.
|
alpha |
Test level alpha, set to 0.1 by default. |
rollmean |
Binary TRUE/FALSE value. If TRUE, rolling average (moving mean) of p-values will be calculated, with the lag window set to 3 by default. |
direction |
Character specifying whether the index of the result should be left- or right-aligned or centered compared to the rolling window of observations, set to "left" by default. |
pvals_sim |
Object resulting from simultaneous PERFect with taxa abundance ordering, allowing the user to perform simultaneous PERFect with p-value ordering. Be aware that the choice of distribution must be the same for both methods. |
k |
The number of permutations, set to 10000 by default. |
nbins |
Number of bins used to visualize the histogram of log DFL values, set to 30 by default. |
hist |
Binary TRUE/FALSE value. If TRUE, the function builds histograms for each taxon. |
col |
Graphical parameter for color of histogram bars border, set to "red" by default. |
fill |
Graphical parameter for color of histogram fill, set to "green" by default. |
hist_fill |
Graphical parameter for intensity of histogram fill, set to 0.2 by default. |
linecol |
Graphical parameter for the color of the fitted distribution density, set to "blue" by default. |
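The rollmean and direction arguments describe a moving average of p-values over a window of 3. As a rough illustration only, the base R sketch below shows how the alignment choice changes which position each window mean is assigned to. The helper roll_mean3() and the p-values are hypothetical and are not part of PERFect; padding incomplete windows with NA is a choice made for this sketch, and the package's own smoothing may behave differently.

```r
# Hypothetical helper (not part of PERFect): rolling mean over `window` values.
# `align` mirrors the documented `direction` argument ("left", "center", "right").
roll_mean3 <- function(x, align = c("left", "center", "right"), window = 3) {
  align <- match.arg(align)
  n <- length(x)
  if (n < window) return(numeric(0))
  # Mean of every complete window of consecutive values
  means <- vapply(seq_len(n - window + 1),
                  function(i) mean(x[i:(i + window - 1)]),
                  numeric(1))
  # Index shift that places each window mean at the left, center or right
  # position of its window
  offset <- switch(align,
                   left = 0,
                   center = (window - 1) %/% 2,
                   right = window - 1)
  out <- rep(NA_real_, n)  # positions without a complete window stay NA
  out[seq_along(means) + offset] <- means
  out
}

# Made-up p-values listed in taxa order, for illustration only
pvals <- c(0.90, 0.60, 0.30, 0.06, 0.03)
left_aligned  <- roll_mean3(pvals, "left")
right_aligned <- roll_mean3(pvals, "right")
```

With left alignment the smoothed value at a taxon averages that taxon and the two that follow it, so the trailing positions have no complete window; with right alignment the leading positions are the ones left empty.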
Filtering is the process of identifying and removing a subset of taxa according to a particular criterion. As opposed to the simultaneous filtering approach, we do not assume that the distributions for all sets of taxa are identical and equal to the distribution used in simultaneous filtering. The function PERFect_perm() filters the provided OTU table X and outputs a filtered table that contains signal taxa. PERFect_perm() calculates the differences in filtering loss (DFL) for each taxon according to the given taxa order. By default, the function fits the Skew-Normal distribution to the log-differences in filtering loss, but the Normal, t, or Cauchy distributions can also be used.
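To make the idea of fitting a distribution by matching quantiles more concrete, here is a minimal base R sketch for the two-parameter Normal case, where two quantiles determine the mean and standard deviation. Everything in it is an assumption for illustration: the helper fit_normal_by_quantiles() is hypothetical, the log DFL values are simulated, and the upper-tail p-value is only one plausible convention. It is not the fitting routine used inside PERFect_perm().

```r
# Hypothetical helper (not part of PERFect): fit a Normal distribution by
# matching two sample quantiles, one per parameter.
fit_normal_by_quantiles <- function(x, probs = c(0.25, 0.5)) {
  stopifnot(length(probs) == 2, probs[1] != probs[2])
  q <- as.numeric(quantile(x, probs = probs))  # sample quantiles
  z <- qnorm(probs)                            # standard Normal quantiles
  # Solve mu + sigma * z[i] = q[i] for i = 1, 2
  sigma <- (q[2] - q[1]) / (z[2] - z[1])
  mu <- q[1] - sigma * z[1]
  c(mean = mu, sd = sigma)
}

set.seed(1)
# Simulated stand-in for permuted log DFL values (not real data)
log_dfl <- rnorm(1000, mean = -8, sd = 1.5)
est <- fit_normal_by_quantiles(log_dfl)

# Assumed convention: upper-tail probability of an observed log DFL value
# under the fitted distribution
observed <- -4
p_value <- pnorm(observed, mean = est[["mean"]], sd = est[["sd"]],
                 lower.tail = FALSE)
```

The Skew-Normal case follows the same principle with three quantiles for three parameters, which is consistent with the default quant = c(0.1, 0.25, 0.5), but it has no closed-form solution and needs a numerical solver.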
If algorithm = "full" is chosen, a list is returned containing:
filtX |
Filtered OTU table. |
pvals |
P-values of the test. |
DFL |
Differences in filtering loss values. |
fit |
Fitted values and further goodness of fit details passed from the |
hist |
Histogram of log differences in filtering loss. |
est |
Estimated distribution parameters. |
dfl_distr |
Plot of differences in filtering loss values. |
If algorithm = "fast" is chosen, fit, hist, est and dfl_distr will not be returned.
Ekaterina Smirnova
Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2), 159-188.
Smirnova, E., Huzurbazar, H., Jafari, F. "PERFect: permutation filtration of microbiome data", to be submitted.
library(PERFect)
data(mock2)
# Proportion data matrix
Prop <- mock2$Prop
# Counts data matrix
Counts <- mock2$Counts
# Perform simultaneous filtering of the data
res_sim <- PERFect_sim(X = Counts)
# Order taxa according to p-values
pvals_sim <- pvals_Order(Counts, res_sim)
#### Uncomment to run the algorithm with parallel processing with more than 2 cores
# # Obtain permutation PERFect results using the user-supplied p-value taxa ordering
# res_perm <- PERFect_perm(X = Prop, Order.user = pvals_sim, algorithm = "fast")
# # Permutation PERFect p-values colored by FLu values
# pvals_Plots(PERFect = res_perm, X = Counts, quantiles = c(0.25, 0.5, 0.8, 0.9), alpha = 0.05)