cladeFilter: Filter rows of a table of relative abundances

Description

Provides several options for non-specific feature reduction of a table of microbial ("bug") relative abundances. Function by Levi Waldron.

Usage

cladeFilter(obj, terminal.nodes.only = FALSE, clustering.reduction = FALSE, 
    cor.options = list(method = "pearson"), cutree.options = list(h = 0.1), 
    clusterSelectFun = mean, genus.or.family.only = FALSE, remove.unclassified = TRUE, 
    remove.unnamed.genus.or.higher = TRUE, required.level = "p__", 
    discretize.cutpoints = NULL, discretize.labels = NULL, min.abd = 1e-04, 
    min.samp = 0.1, asinsqrt = TRUE)

Arguments

obj

Relative abundance table with features as rows, samples as columns.

terminal.nodes.only

Keep terminal nodes only? Terminal nodes have no child nodes present in the table.
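
The definition of a terminal node can be illustrated on a few clade names. This is a minimal sketch, not the package's own code: it assumes pipe-delimited clade names (as suggested by the |c__, |o__ patterns on this page), and the names and the helper vector is.terminal are hypothetical.

```r
# Hypothetical pipe-delimited clade names (assumed naming style).
clades <- c("k__Bacteria",
            "k__Bacteria|p__Firmicutes",
            "k__Bacteria|p__Firmicutes|c__Bacilli",
            "k__Bacteria|p__Bacteroidetes")

# A node is terminal if no other name in the table starts with "<name>|",
# i.e. none of its children are present.
is.terminal <- vapply(clades, function(x) {
  !any(startsWith(clades, paste0(x, "|")))
}, logical(1))

terminal.clades <- clades[is.terminal]
print(terminal.clades)
```

Here only the Bacilli and Bacteroidetes rows are terminal, because the kingdom and Firmicutes rows each have a child in the table.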

clustering.reduction

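Use clustering to reduce dimensionality? Features are clustered hierarchically on the distance 1 - correlation, i.e. cutree(hclust(as.dist(1 - cor(t(obj), cor.options))), cutree.options), where cor.options and cutree.options are lists of additional arguments for cor() and cutree().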

cor.options

If clustering is used to reduce the features, these arguments are passed to cor().

cutree.options

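If clustering is used to reduce the features, these arguments are passed to stats::cutree(). For example, the default h=0.1 cuts the tree at a distance of 0.1 (correlation 0.9), so features correlated more strongly than that tend to be merged into one cluster and reduced to a single representative. Alternatively, k=20 could be specified to always return 20 features.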

clusterSelectFun

If clustering is used to reduce the features, the feature with the maximum value of this function is kept from each cluster.
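
The clustering-based reduction described by clustering.reduction, cor.options, cutree.options and clusterSelectFun can be sketched with base R as follows. This is an illustration under assumptions, not the function's actual source: the toy matrix, the use of do.call() to pass the option lists, and the variable names are all made up for the example.

```r
# Toy table: 3 features (rows) x 6 samples (columns).
# f2 is a scaled copy of f1, so their Pearson correlation is 1.
f1 <- c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60)
obj <- rbind(f1 = f1,
             f2 = f1 * 0.5,
             f3 = c(0.60, 0.10, 0.50, 0.20, 0.40, 0.30))

cor.options      <- list(method = "pearson")  # defaults from the usage above
cutree.options   <- list(h = 0.1)
clusterSelectFun <- mean

# Distance between features is 1 - correlation (features are rows, so transpose).
d <- as.dist(1 - do.call(cor, c(list(x = t(obj)), cor.options)))
clusters <- do.call(cutree, c(list(tree = hclust(d)), cutree.options))

# Within each cluster, keep the feature that maximizes clusterSelectFun.
score <- apply(obj, 1, clusterSelectFun)
keep <- unname(vapply(split(names(score), clusters), function(members) {
  members[which.max(score[members])]
}, character(1)))

reduced <- obj[keep, , drop = FALSE]
print(rownames(reduced))
```

In this toy case f1 and f2 fall into one cluster and f1 is kept because it has the higher mean, while f3 forms its own cluster.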

genus.or.family.only

Keep only genus- or family-level features; nothing higher, nothing lower.

remove.unclassified

Remove anything labelled "Unclassified" at any level.

remove.unnamed.genus.or.higher

If TRUE, remove features with an unnamed rank, such as |c__, |o__, |f__, |g__ (unnamed class, order, family, genus...).

required.level

Keep only rows containing this string in the name. The default requires at least phylum-level resolution.
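
The name-based filters (remove.unclassified, remove.unnamed.genus.or.higher, required.level) amount to pattern matching on row names. The sketch below shows one plausible way to express them with grepl(). The clade names are hypothetical and the exact patterns used inside cladeFilter are an assumption.

```r
# Hypothetical clade names in the assumed pipe-delimited style.
clades <- c("k__Bacteria",
            "k__Bacteria|p__Firmicutes",
            "k__Bacteria|p__Firmicutes|c__Bacilli|o__Unclassified",
            "k__Bacteria|p__Firmicutes|c__",
            "k__Bacteria|p__Firmicutes|c__Clostridia")

# required.level = "p__": the name must contain the phylum prefix.
has.level <- grepl("p__", clades, fixed = TRUE)

# remove.unclassified: drop names containing "Unclassified" at any level.
unclassified <- grepl("Unclassified", clades, fixed = TRUE)

# remove.unnamed.genus.or.higher: a rank prefix (class, order, family, genus)
# followed directly by "|" or the end of the name carries no taxon name.
unnamed <- grepl("\\|[cofg]__(\\||$)", clades)

kept <- clades[has.level & !unclassified & !unnamed]
print(kept)
```

Only the Firmicutes and Clostridia rows survive: the kingdom row lacks phylum-level resolution, one row is unclassified at the order level, and one has an unnamed class.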

discretize.cutpoints

If discretize.cutpoints is a numeric vector, then bug abundances will be discretized at these values. A sensible setting, if you want to try this, is c(0, 1e-100, 1e-4, 0.01, 0.25), with discretize.labels equal to c("zero", "very low", "low", "medium", "high").

discretize.labels

A vector of labels for discretized data, with length 1 less than the length of discretize.cutpoints. A sensible setting is discretize.cutpoints = c(0, 1e-100, 1e-4, 0.01, 0.25), discretize.labels = c("zero", "very low", "low", "medium", "high").
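
To show what discretization at the suggested cutpoints produces, here is a sketch using base R's cut(). How cladeFilter maps cutpoints to labels internally is not stated on this page. Since cut() needs one more break than labels, this example appends Inf as an upper bound and uses left-closed intervals, both of which are assumptions made only so that the five suggested labels line up with the five suggested cutpoints.

```r
cutpoints  <- c(0, 1e-100, 1e-4, 0.01, 0.25)
bin.labels <- c("zero", "very low", "low", "medium", "high")

# Hypothetical relative abundances for one bug across five samples.
abd <- c(0, 5e-5, 0.002, 0.1, 0.6)

# Assumption: Inf closes the last interval; right = FALSE gives [a, b) bins,
# so an exact zero lands in the "zero" bin.
binned <- cut(abd, breaks = c(cutpoints, Inf), labels = bin.labels,
              right = FALSE)
print(as.character(binned))
```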

min.abd

Minimum abundance a bug must reach in at least a fraction min.samp of the samples.

min.samp

Minimum fraction of samples in which a bug must reach an abundance of min.abd.
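
The min.abd and min.samp arguments together define a prevalence filter. A minimal sketch with the default values is shown below. The toy matrix is hypothetical, and whether the function itself compares with > or >= is an assumption.

```r
# Toy table: 3 bugs (rows) x 4 samples (columns).
obj <- rbind(common = c(0.20, 0.30, 0.25, 0.10),
             rare   = c(0.00, 0.00, 0.00, 0.002),
             absent = c(0.00, 0.00, 0.00, 0.00))

min.abd  <- 1e-04  # defaults from the usage above
min.samp <- 0.1

# Fraction of samples in which each bug reaches min.abd.
frac <- rowMeans(obj >= min.abd)

# Keep bugs that reach min.abd in at least a fraction min.samp of samples.
filtered <- obj[frac >= min.samp, , drop = FALSE]
print(rownames(filtered))
```

With four samples, a bug present in a single sample has a fraction of 0.25 and passes the default min.samp of 0.1, so only the all-zero row is dropped.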

asinsqrt

Apply the arcsine square-root transform, asin(sqrt(obj))?
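
For reference, the arcsine square-root transform maps proportions in [0, 1] onto [0, pi/2] and is commonly used to stabilize the variance of proportion data. A small sketch on hypothetical values:

```r
# Hypothetical relative abundances (proportions between 0 and 1).
p <- c(0, 0.01, 0.25, 1)

transformed <- asin(sqrt(p))
print(transformed)
```

The transform is only defined for values between 0 and 1, so the input should be relative abundances expressed as proportions rather than percentages.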

Value

A cleaned-up version of the input matrix of bug abundances.

Author(s)

Levi Waldron and Markus Riester


lwaldron/LeviRmisc documentation built on May 21, 2019, 8:59 a.m.