PERFect_sim: Simultaneous PERFect filtering for microbiome data


View source: R/PERFect_sim.R

Description

Simultaneous filtering of the provided OTU table X at test level alpha. A single distribution is fitted to all taxa simultaneously.

Usage

PERFect_sim(X, infocol = NULL, Order = "NP", Order.user = NULL, normalize = "counts",
         center = FALSE, quant = c(0.1, 0.25, 0.5), distr = "sn",
         alpha = 0.1, rollmean = TRUE, direction = "left", pvals_sim = NULL,
         nbins = 30, col = "red", fill = "green", hist_fill = 0.2,
         linecol = "blue")

Arguments

X

OTU table, where taxa are columns and samples are rows. It should be a data frame whose column names are the taxa names. It may also contain metadata columns.

infocol

Index vector of the metadata columns. By default the input is assumed to be a taxa table only; if sample metadata are included among the columns of the input, this option must be specified.
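To illustrate what an index vector of metadata columns looks like, the following base R sketch builds a small table and separates the taxa from the metadata. The table, its column names, and the chosen indices are hypothetical and serve only as an example.

```r
# Hypothetical toy table: two metadata columns followed by three taxa.
X <- data.frame(
  SampleID = c("s1", "s2", "s3"),
  Site     = c("gut", "gut", "oral"),
  OTU_1    = c(10, 0, 3),
  OTU_2    = c(0, 0, 7),
  OTU_3    = c(5, 2, 0)
)

# Index vector of the metadata columns, in the form infocol expects.
infocol <- c(1, 2)

# Taxa-only part of the table, i.e. the columns that are candidates for filtering.
taxa <- X[, -infocol, drop = FALSE]
colnames(taxa)
```

With such a table, the corresponding call would take the form PERFect_sim(X = X, infocol = infocol).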

Order

Taxa ordering. The default ordering is by the number of occurrences (NP) of each taxon across all samples. Other available orderings are p-value ordering, number of connected taxa, and weighted number of connected taxa, denoted "pvals", "NC", and "NCw" respectively. Taxa ordering is described in more detail in Smirnova et al. Users can also supply their own preferred ordering through Order.user.

Order.user

User's taxa ordering. This argument takes a character vector of ordered taxa names.
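As a rough illustration of an occurrence-based ordering and of the kind of character vector Order.user accepts, the base R sketch below counts the samples in which each taxon is present and sorts the taxa names by that count. The toy data are hypothetical, and sorting from least to most frequently observed is an assumption of this sketch, not a statement about the package internals.

```r
# Hypothetical counts: 4 samples (rows) by 3 taxa (columns).
Counts <- data.frame(
  OTU_1 = c(10, 0, 3, 1),
  OTU_2 = c(0, 0, 7, 0),
  OTU_3 = c(5, 2, 0, 0)
)

# Number of samples in which each taxon has a non-zero count.
n_present <- colSums(Counts != 0)

# Taxa names sorted from least to most frequently observed
# (the ascending direction is an assumption of this sketch).
my_order <- names(sort(n_present))
my_order
```

A vector such as my_order is an ordered character vector of taxa names, which is the form Order.user is documented to take.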

normalize

Normalization of taxa counts. The default option, "counts", does not normalize the taxa counts; the user can convert the OTU table into a proportion table with the option "prop", or into a presence/absence table with "pres".
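The three representations can be pictured with a short base R sketch. The toy matrix is hypothetical, and computing proportions within each sample (row) is an assumption made here because samples are rows of the table; the package may handle details differently.

```r
# Hypothetical counts: 3 samples (rows) by 3 taxa (columns), filled column-wise.
Counts <- matrix(
  c(10, 0, 3,   # OTU_1
    0, 0, 7,    # OTU_2
    5, 2, 0),   # OTU_3
  nrow = 3,
  dimnames = list(c("s1", "s2", "s3"), c("OTU_1", "OTU_2", "OTU_3"))
)

# Proportion table: each count divided by its sample (row) total.
Prop <- sweep(Counts, 1, rowSums(Counts), "/")

# Presence/absence table: 1 where a taxon was observed, 0 otherwise.
Pres <- (Counts > 0) * 1

rowSums(Prop)  # each sample's proportions sum to one
Pres
```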

center

Centering OTU table. The default option does not center the OTU table.

quant

Quantile values used to fit the distribution to log DFL values. The number of quantile values corresponds to the number of parameters in the distribution the data is fitted to. Assuming that at least 50% of taxa are not informative, we suggest fitting the log Skew-Normal distribution by matching the 10%, 25% and 50% percentiles of the log-transformed samples to the Skew-Normal distribution.

distr

The type of distribution to fit to the log DFL values. The Skew-Normal distribution is recommended and is the default, but other choices are available:

"sn"

Skew-Normal distribution with 3 parameters: location xi, scale omega^2 and shape alpha

"norm"

Normal distribution with 2 parameters: mean and standard deviation sd

"t"

Student t-distribution with 2 parameters: n degrees of freedom and noncentrality ncp

"cauchy"

Cauchy distribution with 2 parameters: location and scale
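The idea that the number of quantiles matches the number of distribution parameters can be seen most easily in the two-parameter normal case, where quantile matching has a closed form. The base R sketch below is illustrative only: the simulated values stand in for log DFL values, the variable names are hypothetical, and it is not the fitting routine used by the package.

```r
set.seed(1)

# Hypothetical stand-in for log DFL values.
lfl <- rnorm(500, mean = -8, sd = 2)

# Two probabilities for the two parameters (mean, sd) of the normal.
p <- c(0.25, 0.5)
q <- quantile(lfl, probs = p, names = FALSE)

# Solve q_i = mean + sd * qnorm(p_i), i = 1, 2, for mean and sd.
sd_hat   <- (q[2] - q[1]) / (qnorm(p[2]) - qnorm(p[1]))
mean_hat <- q[1] - sd_hat * qnorm(p[1])

# The fitted normal reproduces the matched sample quantiles.
rbind(sample = q, fitted = qnorm(p, mean = mean_hat, sd = sd_hat))
```

For the three-parameter Skew-Normal the same principle applies with three quantiles, but the equations have to be solved numerically.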

alpha

Test level alpha, set to 0.1 by default.

rollmean

Logical value (TRUE/FALSE). If TRUE, a rolling average (moving mean) of the p-values is calculated, with the window width set to 3 by default.

direction

Character specifying whether the index of the result should be left- or right-aligned or centered compared to the rolling window of observations, set to "left" by default.
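A left-aligned rolling mean with a window of 3 can be sketched in base R as follows. The p-values are hypothetical, and "left-aligned" is taken here to mean that the i-th result averages the i-th value and the two values after it; how the package treats the final positions, where a full window is not available, is not covered by this sketch.

```r
# Hypothetical p-values, listed in taxa order.
pvals <- c(0.90, 0.60, 0.30, 0.06, 0.03, 0.03)
k <- 3  # window width

# embed() puts each run of k consecutive values in one row,
# so the row means are the rolling means.
roll <- rowMeans(embed(pvals, k))

# Left alignment: roll[i] is attached to position i and
# averages pvals[i], pvals[i + 1] and pvals[i + 2].
data.frame(position = seq_along(roll), rolling_mean = roll)
```

Averaging in this way smooths isolated small or large p-values before they are compared with the test level alpha.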

pvals_sim

Object returned by a previous run of simultaneous PERFect with taxa abundance ordering. Supplying it allows the user to run simultaneous PERFect with p-value ordering. Note that the same distribution must be chosen for both runs.

nbins

Number of bins used to visualize the histogram of log DFL values, set to 30 by default.

col

Graphical parameter for color of histogram bars border, set to "red" by default.

fill

Graphical parameter for color of histogram fill, set to "green" by default.

hist_fill

Graphical parameter for intensity of histogram fill, set to 0.2 by default.

linecol

Graphical parameter for the color of the fitted distribution density, set to "blue" by default.

Details

Filtering is the process of identifying and removing a subset of taxa according to a particular criterion. The function PERFect_sim() filters the provided OTU table X and outputs a filtered table that contains the signal taxa. PERFect_sim() calculates the difference in filtering loss (DFL) for each taxon according to the given taxa order. By default, the function fits a Skew-Normal distribution to the log differences in filtering loss, but Normal, t, or Cauchy distributions can also be used. This is an implementation of Algorithm 1 described in Smirnova et al.
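The notion of differences in filtering loss along a taxa order can be sketched in base R. This page does not define filtering loss, so the definition used below (one minus the ratio of squared Frobenius norms of the cross-product matrices after and before removing taxa) is an assumption of this sketch, as are the simulated data, the helper frob2(), and the assumption that columns are already ordered from least to most important. It is not the package's code.

```r
set.seed(2)

# Hypothetical counts: 20 samples by 5 taxa with increasing abundance.
X <- matrix(rpois(20 * 5, lambda = c(1, 2, 5, 20, 50)),
            nrow = 20, byrow = TRUE,
            dimnames = list(NULL, paste0("OTU_", 1:5)))

# Squared Frobenius norm of the cross-product matrix M'M (assumed loss ingredient).
frob2 <- function(M) sum(crossprod(M)^2)
total <- frob2(X)

# Assumed taxa order: columns already sorted from least to most important.
taxa_order <- colnames(X)

# Filtering loss after removing the first j taxa in that order.
FL <- sapply(seq_len(ncol(X) - 1), function(j) {
  kept <- taxa_order[-seq_len(j)]
  1 - frob2(X[, kept, drop = FALSE]) / total
})

# Differences in filtering loss between successive removals, and their logs,
# which are the values a distribution would be fitted to.
DFL <- diff(c(0, FL))
log(DFL)
```

Under this assumed definition, removing a low-abundance taxon changes the loss very little, so its log DFL value is strongly negative, while removing an abundant taxon produces a much larger difference.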

Value

A list is returned containing:

filtX

Filtered OTU table.

pvals

P-values of the test.

DFL

Differences in filtering loss values.

fit

Fitted values and further goodness of fit details passed from the fitdistr() function.

hist

Histogram of log differences in filtering loss.

est

Estimated distribution parameters.

pDFL

Plot of differences in filtering loss values.

Author(s)

Ekaterina Smirnova

References

Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2), 159-188.

Smirnova, E., Huzurbazar, H., Jafari, F. "PERFect: permutation filtration of microbiome data", to be submitted.

See Also

PERFect_perm

Examples

data(mock2)
# Proportion data matrix
Prop <- mock2$Prop

# Counts data matrix
Counts <- mock2$Counts
dim(Counts) # 240x46

# Perform simultaneous filtering of the data
res_sim <- PERFect_sim(X=Counts)
dim(res_sim$filtX)      # 240x10, removing 36 taxa
colnames(res_sim$filtX) # signal taxa

# Plot the simultaneous PERFect p-values, colored by FLu values
pvals_Plots(PERFect = res_sim, X = Counts, quantiles = c(0.25, 0.5, 0.8, 0.9), alpha=0.05)

katiasmirn/PERFect documentation built on Sept. 17, 2019, 11:54 a.m.