PERFect_sim: Simultaneous PERFect filtering for microbiome data
In katiasmirn/PERFect: Permutation filtration for microbiome data


Simultaneous filtering of the provided OTU table X at a test level alpha. A single distribution is fit to all taxa simultaneously.

PERFect_sim(X, infocol = NULL, Order = "NP", Order.user = NULL, normalize = "counts",
         center = FALSE, quant = c(0.1, 0.25, 0.5), distr = "sn",
         alpha = 0.1, rollmean = TRUE, direction = "left", pvals_sim = NULL,
         nbins = 30, col = "red", fill = "green", hist_fill = 0.2,
         linecol = "blue")

`X`	OTU table, where taxa are columns and samples are rows. It should be a data frame whose column names are the taxa names. It may also contain columns of metadata.
`infocol`	Index vector of the metadata columns. By default the input is assumed to contain only taxa; if sample metadata is included among the columns of the input, this argument must be specified.
`Order`	Taxa ordering. The default ordering is by the number of occurrences (`"NP"`) of the taxa across all samples. Other orderings are p-value ordering, number of connected taxa, and weighted number of connected taxa, denoted `"pvals"`, `"NC"`, and `"NCw"` respectively. Taxa ordering is described in more detail in Smirnova et al. Users can also supply their own ordering through `Order.user`.
`Order.user`	User's taxa ordering. This argument takes a character vector of ordered taxa names.
`normalize`	Normalization of taxa counts. The default, `"counts"`, leaves the counts unnormalized; `"prop"` converts the OTU table into a proportion table and `"pres"` converts it into a presence/absence table.
`center`	Centering OTU table. The default option does not center the OTU table.
`quant`	Quantile values used to fit the distribution to log DFL values. The number of quantile values corresponds to the number of parameters in the distribution the data is fitted to. Assuming that at least 50% of taxa are not informative, we suggest fitting the log Skew-Normal distribution by matching the 10%, 25% and 50% percentiles of the log-transformed samples to the Skew-Normal distribution.
`distr`	The type of distribution to fit to the log DFL values. The Skew-Normal distribution is recommended and is the default, but other choices are available:
	`"sn"` Skew-Normal distribution with 3 parameters: location xi, scale omega^2 and shape alpha
	`"norm"` Normal distribution with 2 parameters: mean and standard deviation sd
	`"t"` Student t-distribution with 2 parameters: n degrees of freedom and noncentrality ncp
	`"cauchy"` Cauchy distribution with 2 parameters: location and scale
`alpha`	Test level alpha, set to 0.1 by default.
`rollmean`	Binary TRUE/FALSE value. If TRUE, rolling average (moving mean) of p-values will be calculated, with the lag window set to 3 by default.
`direction`	Character specifying whether the index of the result should be left- or right-aligned or centered compared to the rolling window of observations, set to "left" by default.
`pvals_sim`	Object returned by a previous run of simultaneous PERFect with taxa abundance ordering; supplying it lets the user perform simultaneous PERFect with p-value ordering. The distribution chosen must be the same in both runs.
`nbins`	Number of bins used to visualize the histogram of log DFL values, set to 30 by default.
`col`	Graphical parameter for color of histogram bars border, set to "red" by default.
`fill`	Graphical parameter for color of histogram fill, set to "green" by default.
`hist_fill`	Graphical parameter for intensity of histogram fill, set to 0.2 by default.
`linecol`	Graphical parameter for the color of the fitted distribution density, set to "blue" by default.
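
The quantile-matching idea behind `quant` and `distr` can be sketched in base R for the two-parameter Normal case. This is only an illustration under stated assumptions: the vector `log_dfl` is simulated stand-in data, the choice of the 25% and 50% quantiles is hypothetical, and the closed-form solution below is not taken from the package, whose own fitting routine may work differently (for example, numerically).

```r
set.seed(42)

# Stand-in for log DFL values (simulated; not produced by PERFect_sim).
log_dfl <- rnorm(200, mean = -8, sd = 1.5)

# Two quantile levels, because the Normal distribution has two parameters.
probs <- c(0.25, 0.5)
q <- quantile(log_dfl, probs = probs, names = FALSE)

# Solve mean + sd * qnorm(p) = q for the two chosen quantile levels.
z <- qnorm(probs)
sd_hat <- (q[2] - q[1]) / (z[2] - z[1])
mean_hat <- q[1] - sd_hat * z[1]

# The fitted Normal reproduces the two empirical quantiles.
fitted_q <- qnorm(probs, mean = mean_hat, sd = sd_hat)
print(rbind(empirical = q, fitted = fitted_q))
```

With three quantile levels, as in the default `quant = c(0.1, 0.25, 0.5)`, the same matching principle applies to the three Skew-Normal parameters, although no simple closed form is assumed here.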

Filtering is the process of identifying and removing a subset of taxa according to a particular criterion. The function PERFect_sim() filters the provided OTU table X and outputs a filtered table that contains the signal taxa. PERFect_sim() calculates the differences in filtering loss (DFL) for each taxon according to the given taxa order. By default, the function fits a Skew-Normal distribution to the log-differences in filtering loss, but the Normal, t, or Cauchy distributions can also be used. This is an implementation of Algorithm 1 described in Smirnova et al.
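
As a rough illustration of what "differences in filtering loss" means, the sketch below computes them in base R on a made-up count matrix. It assumes the filtering loss of removing a set of taxa J is 1 - ||X_{-J}' X_{-J}||_F^2 / ||X' X||_F^2, which is not stated on this page, and it orders taxa by number of occurrences in ascending order. The helper names `frob2` and `filt_loss` are hypothetical and are not part of the package.

```r
set.seed(1)

# Toy count table: 6 samples (rows) by 5 taxa (columns); values are made up.
X <- matrix(rpois(30, lambda = c(1, 2, 5, 20, 50)), nrow = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("taxon", 1:5)))

# Squared Frobenius norm of M'M (hypothetical helper).
frob2 <- function(M) sum(crossprod(M)^2)

# Assumed filtering loss after removing the taxa named in `removed`.
filt_loss <- function(X, removed) {
  kept <- X[, setdiff(colnames(X), removed), drop = FALSE]
  1 - frob2(kept) / frob2(X)
}

# Order taxa by the number of samples in which they occur, rarest first.
ord <- names(sort(colSums(X > 0)))

# Filtering loss after removing the first j taxa, for j = 1, ..., p - 1.
FL <- sapply(seq_len(length(ord) - 1),
             function(j) filt_loss(X, ord[seq_len(j)]))

# Difference in filtering loss attributed to each successive taxon.
DFL <- diff(c(0, FL))
names(DFL) <- ord[seq_along(DFL)]
print(DFL)
```

Under this assumed definition the loss never decreases as more taxa are removed, so every DFL value is non-negative, and taxa whose removal barely changes the loss get small values.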

A list is returned containing:

`filtX`	Filtered OTU table.
`pvals`	P-values of the test.
`DFL`	Differences in filtering loss values.
`fit`	Fitted values and further goodness of fit details passed from the `fitdistr()` function.
`hist`	Histogram of log differences in filtering loss.
`est`	Estimated distribution parameters.
`pDFL`	Plot of differences in filtering loss values.
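
The effect of `rollmean`, `direction = "left"`, and `alpha` on a vector of p-values such as `pvals` can be sketched in base R. The p-values below are made up, the window of 3 follows the default described for `rollmean`, and the rule "take the first averaged p-value at or below alpha" is an assumption for illustration only; how the package derives `filtX` from its p-values is not shown here.

```r
# Toy p-values, one per taxon in filtering order (made-up numbers).
pvals <- c(0.90, 0.75, 0.60, 0.40, 0.08, 0.02, 0.01)
alpha <- 0.1
window <- 3

# Left-aligned rolling mean: entry i averages pvals[i], pvals[i + 1], pvals[i + 2].
roll <- rowMeans(embed(pvals, window))

# First position whose averaged p-value is at or below alpha (assumed cut-off rule).
first_signal <- which(roll <= alpha)[1]

print(round(roll, 4))
print(first_signal)
```

Averaging over a short window smooths isolated small p-values, so a single noisy taxon is less likely to trigger the cut-off on its own.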

Ekaterina Smirnova

Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2), 159-188.

Smirnova, E., Huzurbazar, H., Jafari, F. "PERFect: permutation filtration of microbiome data", to be submitted.

PERFect_perm

library(PERFect)
data(mock2)
# Proportion data matrix
Prop <- mock2$Prop

# Counts data matrix
Counts <- mock2$Counts
dim(Counts) # 240x46

# Perform simultaneous filtering of the data
res_sim <- PERFect_sim(X=Counts)
dim(res_sim$filtX)      # 240x10, removing 36 taxa
colnames(res_sim$filtX) # signal taxa

# Simultaneous PERFect p-values colored by FLu values
pvals_Plots(PERFect = res_sim, X = Counts, quantiles = c(0.25, 0.5, 0.8, 0.9), alpha=0.05)