phylter | R Documentation |
Detection and filtering out of outliers in a list of trees or a list of distance matrices.
phylter(
X,
bvalue = 0,
distance = "patristic",
k = 3,
k2 = k,
Norm = "median",
Norm.cutoff = 0.001,
gene.names = NULL,
test.island = TRUE,
verbose = TRUE,
stop.criteria = 1e-05,
InitialOnly = FALSE,
normalizeby = "row",
parallel = TRUE
)
X |
A list of phylogenetic trees (phylo objects) or a list of distance matrices. Trees can have different numbers of leaves and matrices can have different dimensions. If this is the case, missing values are imputed. |
bvalue |
If X is a list of trees, nodes with a support below 'bvalue' will be collapsed prior to the outlier detection. |
distance |
If X is a list of trees, type of distance used to compute the pairwise matrices for each tree. Can be "patristic" (sum of branch lengths separating tips, the default) or "nodal" (number of nodes separating tips). The "nodal" option should only be used if all species are present in all genes. |
k |
Strength of outlier detection. The higher this value, the fewer outliers are detected (see details). |
k2 |
Same as k, but for the detection of complete gene outliers. To prevent complete genes from being discarded, k2 can be increased. By default, k2 = k (see above). |
Norm |
Whether and how the matrices should be normalized prior to the complete analysis. If "median" (the default), matrices are divided by their median; if "mean", they are divided by their mean; if "none", no normalization is performed. Normalizing ensures that fast-evolving (and slow-evolving) genes are not treated as outliers. Normalization by median is the better choice, as it is less sensitive to outlier values. |
Norm.cutoff |
Value of the median (if |
gene.names |
List of gene names used to rename elements in X. If NULL (the default), elements are named 1, 2, ..., length(X). |
test.island |
If TRUE (the default), only the highest value in an 'island' of outliers is considered an outlier. This prevents non-outliers hitchhiked by outliers from being considered outliers themselves. |
verbose |
If TRUE (the default), messages are written during the filtering process to report what is happening. |
stop.criteria |
The optimisation stops when the gain in concordance between matrices between round |
InitialOnly |
Logical. If TRUE, only the Initial state of the data is computed. |
normalizeby |
Should the gene x species matrix be normalized prior to outlier detection, and how. |
parallel |
Should the computations be parallelized when possible? Defaults to TRUE. Note that the number of threads cannot be set by the user when 'parallel=TRUE': all available cores on the machine are used. |
A list of class 'phylter' with the 'Initial' (before filtering) and 'Final' (after filtering) states, or a list of class 'phylterinitial' only, if InitialOnly=TRUE. The function also returns the list of DiscardedGenes, if any.
data(carnivora)
# using default parameters
res <- phylter(carnivora, parallel = FALSE) # perform the phylter analysis
res # brief summary of the analysis
res$DiscardedGenes # list of genes discarded prior to the analysis
res$Initial # See all elements prior to the analysis
res$Final # See all elements at the end of the analysis
res$Final$Outliers # Print all outliers detected
# Change the call to phylter to use nodal distances, instead of patristic:
res <- phylter(carnivora, distance = "nodal")
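The claim under Norm that median normalization is less sensitive to outlier values than mean normalization can be illustrated with a small base-R sketch. This is not phylter's internal code: `normalize_matrix` is a hypothetical helper, the distance values are made up, and the choice to compute the median or mean over off-diagonal entries only is an assumption made for the illustration.

```r
# Toy pairwise distance matrix for one gene (values are made up).
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, byrow = TRUE)

# Same matrix with one aberrant distance between species 1 and 3.
m_out <- m
m_out[1, 3] <- m_out[3, 1] <- 200

# Hypothetical helper (not part of phylter): divide a matrix by the median
# or mean of its off-diagonal distances. Assumption: the diagonal is ignored.
normalize_matrix <- function(mat, method = c("median", "mean", "none")) {
  method <- match.arg(method)
  d <- mat[upper.tri(mat)]
  switch(method,
         median = mat / median(d),
         mean   = mat / mean(d),
         none   = mat)
}

# Scaled distance between species 1 and 2 (raw value 1 in both matrices).
normalize_matrix(m, "median")[1, 2]      # 1 / 2
normalize_matrix(m_out, "median")[1, 2]  # 1 / 3
normalize_matrix(m, "mean")[1, 2]        # 1 / 2
normalize_matrix(m_out, "mean")[1, 2]    # 1 / 68
```

In this toy case a single aberrant distance shifts the median-scaled value only slightly, while the mean-scaled value shrinks by more than an order of magnitude, which is the behavior the Norm description alludes to.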