PERFect_perm: Permutation PERFect filtering for microbiome data

View source: R/PERFect_perm.R

Description

Permutation filtering of the provided OTU table X at test level alpha. The significance of each set of j taxa is evaluated by fitting a Skew-Normal, Normal, t or Cauchy distribution to the sampling distribution obtained by permuting taxa labels.

Usage

PERFect_perm(X, infocol = NULL, Order = "NP", Order.user = NULL, normalize = "counts",
    algorithm = "fast", center = FALSE, quant = c(0.1, 0.25, 0.5),
    distr = "sn", alpha = 0.1, rollmean = TRUE, direction = "left", pvals_sim = NULL,
    k = 10000, nbins = 30, hist = TRUE, col = "red", fill = "green",
    hist_fill = 0.2, linecol = "blue")

Arguments

X

OTU table, where taxa are columns and samples are rows. It should be in data frame format, with column names corresponding to taxa names.

infocol

Index vector of the metadata columns. By default the input is assumed to contain only the taxa table; if sample metadata are included as columns of the input, this option must be specified.

Order

Taxa ordering. The default ordering is the number of occurrences (NP) of the taxa in all samples. Other orderings are p-value ordering, number of connected taxa and weighted number of connected taxa, denoted "pvals", "NC" and "NCw" respectively. Taxa ordering is described in more detail in Smirnova et al. The user can also supply a preferred order with Order.user.

Order.user

User's taxa ordering. This argument takes a character vector of ordered taxa names.
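
As a rough illustration of the default "NP" ordering and of the format Order.user expects, the sketch below ranks taxa by the number of samples in which they occur. The toy table and the ascending direction of the sort are assumptions made for illustration; they are not taken from the package source.

```r
# Toy OTU table (hypothetical data): samples are rows, taxa are columns
counts <- data.frame(
  taxonA = c(10, 0, 5, 0),
  taxonB = c(0, 0, 15, 0),
  taxonC = c(30, 20, 0, 7),
  taxonD = c(1, 2, 3, 4)
)

# Number of samples in which each taxon is present
np <- colSums(counts > 0)

# Character vector of taxa names, least prevalent first (assumed direction)
taxa_order <- names(sort(np))
```

A character vector such as taxa_order is the kind of value Order.user takes.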

normalize

Normalization of taxa counts. The default option does not normalize the counts, but the user can convert the OTU table into a proportion table with the option "prop" or into a presence/absence table with "pres".
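
The two alternative normalizations can be pictured with a few lines of base R. This is only a sketch of what "prop" and "pres" plausibly correspond to, assuming proportions are taken within each sample (row); the toy data are hypothetical and the package's internal code may differ.

```r
# Toy count matrix (hypothetical data): samples are rows, taxa are columns
counts <- matrix(
  c(10, 0, 5,
     0, 0, 15,
    30, 20, 0),
  nrow = 3,
  dimnames = list(paste0("sample", 1:3), c("taxonA", "taxonB", "taxonC"))
)

# Proportion table: divide each row by its total (assumed per-sample scaling)
prop <- counts / rowSums(counts)

# Presence/absence table: 1 where a taxon was observed, 0 otherwise
pres <- (counts > 0) * 1
```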

algorithm

Algorithm speed. The default is "fast", which allows the program to efficiently search for significant taxa without computing all the p-values. The fast algorithm must be run with hist = FALSE. The alternative setting is "full", which computes the p-values for all taxa.

center

Centering OTU table. The default option does not center the OTU table.

quant

Quantile values used to fit the distribution to log DFL values. The number of quantile values corresponds to the number of parameters in the distribution the data is fitted to. Assuming that at least 50% of taxa are not informative, we suggest fitting the log Skew-Normal distribution by matching the 10%, 25% and 50% percentiles of the log-transformed samples to the Skew-Normal distribution.

distr

The type of distribution to fit to the log DFL values. We suggest the Skew-Normal distribution, which is the default, but other choices are available.

"sn"

Skew-Normal distribution with 3 parameters: location xi, scale omega^2 and shape alpha

"norm"

Normal distribution with 2 parameters: mean and standard deviation sd
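
To make the link between quant and the number of distribution parameters concrete, the sketch below does quantile matching for the two-parameter Normal case, where it has a closed form. The simulated values stand in for log DFL values and are purely hypothetical; the package's own fitting routine may work differently, and the three-parameter Skew-Normal case needs a numerical solver.

```r
set.seed(1)

# Hypothetical stand-in for log DFL values
log_dfl <- rnorm(500, mean = -8, sd = 1.5)

# Two parameters (mean, sd), so two quantiles are matched
probs <- c(0.1, 0.25)
q <- quantile(log_dfl, probs, names = FALSE)  # empirical quantiles
z <- qnorm(probs)                             # standard Normal quantiles

# Solve mean + sd * z = q for the two unknowns
sd_hat <- (q[2] - q[1]) / (z[2] - z[1])
mean_hat <- q[1] - sd_hat * z[1]
```

By construction, qnorm(probs, mean_hat, sd_hat) reproduces the empirical quantiles q.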

alpha

Test level alpha, set to 0.1 by default.

rollmean

Binary TRUE/FALSE value. If TRUE, rolling average (moving mean) of p-values will be calculated, with the lag window set to 3 by default.

direction

Character specifying whether the index of the result should be left- or right-aligned or centered compared to the rolling window of observations, set to "left" by default.
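
A left-aligned rolling mean with a window of 3 can be sketched in base R as follows. The helper roll_mean_left and the example p-values are hypothetical and only illustrate what rollmean = TRUE with direction = "left" means; the package may rely on a different implementation.

```r
# Left-aligned rolling mean: element i averages x[i], ..., x[i + width - 1]
roll_mean_left <- function(x, width = 3) {
  n <- max(length(x) - width + 1, 0)
  vapply(seq_len(n), function(i) mean(x[i:(i + width - 1)]), numeric(1))
}

# Hypothetical p-values along the taxa ordering
pvals <- c(0.90, 0.60, 0.30, 0.06, 0.03, 0.00)
smoothed <- roll_mean_left(pvals)
```

With left alignment, each smoothed value is indexed by the first taxon in its window, so the result is shorter than the input by width - 1 entries.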

pvals_sim

Object resulting from simultaneous PERFect with taxa abundance ordering, allowing user to perform Simultaneous PERFect with p-values ordering. Be aware that the choice of distribution for both methods must be the same.

k

The number of permutations, set to 10000 by default.

nbins

Number of bins used to visualize the histogram of log DFL values, set to 30 by default.

hist

Binary TRUE/FALSE value. If TRUE, the function builds histograms for each taxon.

col

Graphical parameter for color of histogram bars border, set to "red" by default.

fill

Graphical parameter for color of histogram fill, set to "green" by default.

hist_fill

Graphical parameter for intensity of histogram fill, set to 0.2 by default.

linecol

Graphical parameter for the color of the fitted distribution density, set to "blue" by default.

Details

Filtering is the process of identifying and removing a subset of taxa according to a particular criterion. As opposed to the simultaneous filtering approach, we do not assume that the distributions for each set of taxa are identical and equal to the distribution used in simultaneous filtering. The function PERFect_perm() filters the provided OTU table X and outputs a filtered table that contains signal taxa. PERFect_perm() calculates differences in filtering loss (DFL) for each taxon according to the given taxa order. By default, the function fits a Skew-Normal distribution to the log-differences in filtering loss, but Normal, t, or Cauchy distributions can also be used.
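
The differences in filtering loss can be illustrated on a toy table. The sketch assumes the filtering loss of a removed set J is 1 minus the ratio of squared Frobenius norms of the cross-product matrices with and without J, which is our reading of the PERFect paper and is not stated on this page. The helper filt_loss and the data are hypothetical.

```r
# Assumed filtering loss: 1 - ||X_kept' X_kept||_F^2 / ||X' X||_F^2
filt_loss <- function(X, removed) {
  kept <- setdiff(seq_len(ncol(X)), removed)
  full <- sum(crossprod(X)^2)
  part <- sum(crossprod(X[, kept, drop = FALSE])^2)
  1 - part / full
}

# Toy counts (hypothetical), columns already arranged in the taxa order
X <- cbind(
  taxonB = c(0, 0, 15, 0),
  taxonA = c(10, 0, 5, 0),
  taxonC = c(30, 20, 0, 7),
  taxonD = c(1, 2, 3, 4)
)

# Loss after removing the first j taxa, for j = 1, ..., ncol(X)
fl <- vapply(seq_len(ncol(X)), function(j) filt_loss(X, seq_len(j)), numeric(1))

# Difference in filtering loss contributed by each successive taxon
dfl <- diff(c(0, fl))
```

Under this assumed definition, each DFL value is non-negative for count data and the values sum to 1 once every taxon has been removed.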

Value

If algorithm = "full" is chosen, a list is returned containing:

filtX

Filtered OTU table.

pvals

P-values of the test.

DFL

Differences in filtering loss values.

fit

Fitted values and further goodness of fit details passed from the fitdistr() function.

hist

Histogram of log differences in filtering loss.

est

Estimated distribution parameters.

dfl_distr

Plot of differences in filtering loss values.

If algorithm = "fast" is chosen, fit, hist, est and dfl_distr are not returned.

Author(s)

Ekaterina Smirnova

References

Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2), 159-188.

Smirnova, E., Huzurbazar, H., Jafari, F. "PERFect: permutation filtration of microbiome data", to be submitted.

See Also

PERFect_sim

Examples

data(mock2)

# Proportion data matrix
Prop <- mock2$Prop

# Counts data matrix
Counts <- mock2$Counts

# Perform simultaneous filtering of the data
res_sim <- PERFect_sim(X=Counts)

#order according to p-values
pvals_sim <- pvals_Order(Counts, res_sim)

#### Uncomment to run algorithm with parallel processing with more than 2 cores
# #obtain permutation PERFect results using the p-value taxa ordering from pvals_sim
# res_perm <- PERFect_perm(X = Prop, Order.user = pvals_sim, algorithm = "fast")

# #permutation perfect colored by FLu values
# pvals_Plots(PERFect = res_perm, X = Counts, quantiles = c(0.25, 0.5, 0.8, 0.9), alpha=0.05)

katiasmirn/PERFect documentation built on Sept. 17, 2019, 11:54 a.m.