The enterotype method was introduced by http://enterotype.embl.de. Besides the original functions copied from the website, we also add a few more analyses to extend this method, such as the correlation between enterotypes and given (known) groups.
enterotypePipeline(taxa.assign, n.max = 20, k = NULL, fig.path = NA,
percent = 0.01, validate = TRUE, ...)
getRelativeAbundance(taxa.assign)
getJSD(data)
getClusters(data, data.dist, n.max = 20, ...)
plotOptClusters(nclusters, fig.path = NA, n.max = 20)
getDataCluster(data.dist, k = NULL, nclusters = NULL, as.vector = TRUE,
validate = TRUE)
noise.removal(data, percent = 0.01)
plotEnterotypes(data, data.dist, data.cluster, attr.data = data.frame(),
fig.path = NA, percent = 0, addLabel = TRUE, text.colour.id = NULL,
palette = "Set1", postfix = "", plot.clusters = TRUE, plot.bca = TRUE,
plot.pcoa = TRUE, cl.width = 9, cl.height = 6, width = 8,
height = 8, verbose = TRUE, ...)
plotClusterAbundence(data, data.cluster, fig.path = NA,
cluster.colours = c(), min.median = 0, x.lab = "",
y.lab = "Relative abundence", width = 9, height = 6)
corrEnterotypeToGroup(data.dist, k, attr.data, group.id,
simulate.p.value = TRUE)
taxa.assign: The data frame of taxonomic assignments with abundance at the
n.max: The maximum number of clusters to test. Default to 20.
k: The number of clusters chosen for the optimised solution. If NULL (the default), the number of clusters with the largest CH index is used.
fig.path: The folder path in which to save figures. If NA (the default), no figures are plotted.
percent: The percentage threshold used to remove noise (i.e. low-abundance genera) prior to the analysis. Default to 0.01 in enterotypePipeline and noise.removal, and 0 in plotEnterotypes.
validate: Logical; whether to validate the result. Default to TRUE.
data: The matrix of normalized probability distributions of the abundance matrix, also called abundance distributions. Rows are taxa at that rank and columns are samples.
data.dist: A distance
nclusters: The vector of cluster validity indices, such as the Calinski-Harabasz (CH) index, used to find the optimised solution.
as.vector: Logical; if TRUE (the default), return a vector, otherwise return a
data.cluster: The clusters assigned by partitioning clustering
attr.data: A data frame providing additional attributes for visualization. Default to an empty data frame, which adds nothing.
addLabel: Logical; whether to add labels to points. Default to TRUE.
text.colour.id: The column name in
group.id: The column name in
simulate.p.value: A logical indicating whether to compute p-values by Monte Carlo simulation. Default to TRUE.
enterotypePipeline is a pipeline summarised from
http://enterotype.embl.de/enterotypes_tutorial.sanger.R.
The steps of this pipeline are:
1. getRelativeAbundance turns taxa.assign into normalized probability distributions;
2. getJSD calculates the Jensen-Shannon divergence dissimilarity;
3. getClusters lists the Calinski-Harabasz (CH) indices for 2 to n.max clusters;
4. getDataCluster picks the optimised or selected k-cluster final result;
5. plotOptClusters and plotEnterotypes make the plots, including BCA and PCoA.
—————————————————————————————
getRelativeAbundance returns a matrix of normalized probability distributions of the abundance matrix, namely the relative abundance, which is the input of getJSD.
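As a minimal sketch of what this normalisation amounts to, assuming rows are taxa and columns are samples (as described for the data argument), each column can be divided by its own total. The helper to.relative.abundance and the toy counts are hypothetical; getRelativeAbundance itself takes the taxa.assign data frame and may prepare it differently.

```r
# Hypothetical helper: column-wise normalisation of a taxa x samples count matrix.
to.relative.abundance <- function(counts) {
  # divide every column (sample) by its total so that each sample sums to 1
  sweep(counts, 2, colSums(counts), "/")
}

# Toy counts (hypothetical): 3 genera x 2 samples, filled column by column
toy.counts <- matrix(c(10, 30, 60,
                        5,  5, 90),
                     nrow = 3,
                     dimnames = list(c("Bacteroides", "Prevotella", "Ruminococcus"),
                                     c("S1", "S2")))

relative.abund <- to.relative.abundance(toy.counts)
print(colSums(relative.abund))  # each sample now sums to 1
```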
getJSD returns a distance (dist) object calculated with the Jensen-Shannon divergence (JSD) metric.
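The sketch below shows one way to turn relative abundances into a JSD-based dist object. It assumes, following the public enterotype tutorial, that zeros are replaced by a small pseudocount, natural logarithms are used, and the distance is the square root of the divergence; whether getJSD uses exactly these choices is not stated here, and jsd.distance is a hypothetical name.

```r
# Hypothetical helper: pairwise Jensen-Shannon distance between samples (columns).
jsd.distance <- function(p.mat, pseudocount = 1e-6) {
  p.mat[p.mat == 0] <- pseudocount                 # avoid log(0)
  kld <- function(x, y) sum(x * log(x / y))        # Kullback-Leibler divergence
  jsd <- function(x, y) {
    m <- (x + y) / 2                               # mixture distribution
    sqrt(0.5 * kld(x, m) + 0.5 * kld(y, m))        # square root of the JSD
  }
  n <- ncol(p.mat)
  d <- matrix(0, n, n, dimnames = list(colnames(p.mat), colnames(p.mat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- jsd(p.mat[, i], p.mat[, j])
    }
  }
  as.dist(d)                                       # keep the lower triangle as a dist object
}

# Toy relative abundances: S1 and S2 are identical, S3 shares no taxa with them
p <- cbind(S1 = c(0.5, 0.5, 0), S2 = c(0.5, 0.5, 0), S3 = c(0, 0, 1))
jsd.dist <- jsd.distance(p)
print(round(as.matrix(jsd.dist), 3))
```

For samples with no taxa in common the distance approaches sqrt(log(2)), the upper bound of this metric with natural logarithms.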
getClusters returns the Calinski-Harabasz (CH) index used to choose the number of clusters, from 2 to n.max.
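To illustrate how a CH index can be computed for each k from 2 to n.max, the sketch below clusters a distance matrix with cluster::pam and scores each solution with a medoid-based variant of the CH index: the between-cluster term uses distances from each cluster medoid to the overall medoid, and the within-cluster term uses distances from members to their medoid. The exact formula behind getClusters is not given in this page, so treat ch.index.medoids and ch.for.k.range as hypothetical helpers, not the package's implementation.

```r
library(cluster)  # pam(); cluster is a recommended package shipped with R

# Hypothetical medoid-based CH index for one clustering of a dist object.
ch.index.medoids <- function(d, clustering) {
  dm <- as.matrix(d)
  n <- nrow(dm)
  k <- length(unique(clustering))
  # overall medoid: the sample with the smallest summed distance to all others
  overall <- which.min(rowSums(dm))
  within <- 0
  between <- 0
  for (cl in unique(clustering)) {
    idx <- which(clustering == cl)
    # cluster medoid: member with the smallest summed distance to the other members
    med <- idx[which.min(rowSums(dm[idx, idx, drop = FALSE]))]
    within <- within + sum(dm[idx, med]^2)
    between <- between + length(idx) * dm[med, overall]^2
  }
  (between / (k - 1)) / (within / (n - k))
}

# Hypothetical helper: CH index for k = 2..n.max, named by k.
ch.for.k.range <- function(d, n.max = 20) {
  ks <- 2:n.max
  ch <- sapply(ks, function(k) {
    ch.index.medoids(d, pam(d, k, diss = TRUE)$clustering)
  })
  setNames(ch, ks)
}

# Toy data: two well-separated groups of 10 points each
set.seed(1)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 5), ncol = 2))
d <- dist(pts)
ch <- ch.for.k.range(d, n.max = 6)
best.k <- as.integer(names(which.max(ch)))
print(round(ch, 1))
```

Naming the vector by k avoids any ambiguity about whether the first element corresponds to k = 1 or k = 2.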
plotOptClusters plots the Calinski-Harabasz (CH) index used to choose the number of clusters, from 2 to n.max.
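A minimal base-graphics sketch of such a plot follows. It mirrors the documented fig.path behaviour (nothing is drawn when fig.path is NA) and highlights the k with the largest index. The function draw.ch.index, the output file name and the CH values are hypothetical; plotOptClusters may use a different layout and file name.

```r
# Hypothetical helper: plot CH index against k and save it as a PDF in fig.path.
draw.ch.index <- function(ch, fig.path = NA) {
  if (is.na(fig.path)) return(invisible(NULL))       # no folder given: skip plotting
  ks <- as.integer(names(ch))
  out.file <- file.path(fig.path, "ch-index-sketch.pdf")  # hypothetical file name
  pdf(out.file, width = 6, height = 6)
  plot(ks, ch, type = "h", lwd = 4,
       xlab = "Number of clusters k", ylab = "Calinski-Harabasz index")
  points(ks[which.max(ch)], max(ch), pch = 19, col = "red")  # mark the best k
  dev.off()
  invisible(out.file)
}

toy.ch <- c("2" = 250, "3" = 140, "4" = 120, "5" = 95)  # hypothetical CH values
f <- draw.ch.index(toy.ch, fig.path = tempdir())
```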
getDataCluster returns the optimised or selected solution with k clusters assigned by pam (partitioning around medoids).
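The selection logic can be sketched as follows, assuming nclusters is a vector of CH values named by the number of clusters: when k is NULL, the k with the largest index is taken, and pam is then run on the distance object. The function pick.clusters and the CH values are hypothetical; the as.vector switch imitates the documented argument of the same name.

```r
library(cluster)  # pam()

# Hypothetical helper imitating the k / nclusters / as.vector behaviour.
pick.clusters <- function(d, k = NULL, nclusters = NULL, as.vector = TRUE) {
  if (is.null(k)) {
    # assumption: nclusters is named by k, e.g. c("2" = ..., "3" = ...)
    k <- as.integer(names(which.max(nclusters)))
  }
  fit <- pam(d, k, diss = TRUE)        # partitioning around medoids on the distances
  if (as.vector) fit$clustering else fit
}

# Toy data: two well-separated groups of 10 points each
set.seed(1)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 5), ncol = 2))
d <- dist(pts)
toy.ch <- c("2" = 250, "3" = 140, "4" = 120)  # hypothetical CH values
cl <- pick.clusters(d, nclusters = toy.ch)
print(table(cl))
```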
noise.removal removes any row whose sum is smaller than the given percentage of the total sum of the matrix.
It is advisable to apply this function to data generated with short-read sequencing technologies, such as Illumina or SOLiD.
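A minimal sketch of this filter is shown below. It assumes percent is expressed in percent units (so 0.01 means 0.01% of the grand total), as in the public enterotype tutorial; drop.rare.rows is a hypothetical name and the toy matrix is made up for illustration.

```r
# Hypothetical helper: drop rows (taxa) whose share of the grand total is below `percent`.
drop.rare.rows <- function(mat, percent = 0.01) {
  row.share <- rowSums(mat) * 100 / sum(mat)   # each row's share of the total, in percent
  mat[row.share >= percent, , drop = FALSE]
}

# Toy matrix: row totals are 500, 499 and 1 out of 1000 (50%, 49.9% and 0.1%)
toy <- matrix(c(250, 250,
                249, 250,
                  1,   0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("S1", "S2")))
kept <- drop.rare.rows(toy, percent = 0.5)   # row C (0.1%) is removed
print(kept)
```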
plotEnterotypes is a combined function that plots the clusters, the BCA (bca) and the PCoA (dudi.pco) of the optimised or selected solution.
Between-class analysis (BCA) is performed to support the clustering and to identify the drivers of the enterotypes.
It is only available when k > 2.
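For readers without ade4 at hand, the PCoA part can be approximated with base R's classical multidimensional scaling, stats::cmdscale, which is a swapped-in technique rather than the dudi.pco call the function uses; BCA is not reproduced here. The toy points and cluster labels below are hypothetical, and the output file goes to a temporary folder.

```r
# PCoA sketch with stats::cmdscale (a stand-in for ade4::dudi.pco).
set.seed(2)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 4), ncol = 2))
d <- dist(pts)
clusters <- rep(1:2, each = 10)   # the toy samples were generated as two groups

pcoa <- cmdscale(d, k = 2, eig = TRUE)   # first two principal coordinates
var.explained <- pcoa$eig[1:2] / sum(pcoa$eig[pcoa$eig > 0])

out.file <- file.path(tempdir(), "pcoa-sketch.pdf")
pdf(out.file, width = 8, height = 8)
plot(pcoa$points, col = clusters, pch = 19,
     xlab = sprintf("PCoA 1 (%.1f%%)", 100 * var.explained[1]),
     ylab = sprintf("PCoA 2 (%.1f%%)", 100 * var.explained[2]))
legend("topright", legend = paste("Cluster", 1:2), col = 1:2, pch = 19)
invisible(dev.off())
```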
plotClusterAbundence returns a list of ggplot2 objects showing the relative abundance distribution in each cluster.
corrEnterotypeToGroup calculates Cramér's V between the enterotypes and the known groups (two categorical variables) from the same data set, using the chi-squared test (chisq.test).
See http://en.wikipedia.org/wiki/Cramér%27s_V.
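Cramér's V is sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the Pearson chi-squared statistic of the r x c contingency table and n is the number of samples. The sketch below computes it from two label vectors with stats::chisq.test. The helper cramers.v, its inputs and the toy labels are hypothetical; corrEnterotypeToGroup instead takes data.dist, k, attr.data and group.id and derives the enterotypes itself.

```r
# Hypothetical helper: Cramér's V between two categorical vectors.
cramers.v <- function(x, y, simulate.p.value = TRUE) {
  tab <- table(x, y)
  # Pearson chi-squared statistic without Yates' continuity correction
  test <- chisq.test(tab, correct = FALSE, simulate.p.value = simulate.p.value)
  n <- sum(tab)
  v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))
  list(cramers.v = v, p.value = test$p.value)
}

set.seed(42)  # the Monte Carlo p-value depends on the random seed
enterotype <- rep(c("ET1", "ET2"), each = 5)
land.use <- rep(c("forest", "farm"), each = 5)
strong <- cramers.v(enterotype, land.use)       # perfectly associated labels

unrelated <- cramers.v(rep(c("ET1", "ET2"), each = 4),
                       rep(c("forest", "farm"), times = 4))  # balanced, no association
print(c(strong = strong$cramers.v, unrelated = unrelated$cramers.v))
```

V ranges from 0 (no association) to 1 (each enterotype maps to exactly one group).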
# taxa.assign (taxonomic abundance table) and env (sample attributes) are user-supplied
enterotypePipeline(taxa.assign, n.max=20, fig.path="./figures", percent=0)
# or run the steps one by one
relative.abund <- getRelativeAbundance(taxa.assign)
jsd.dist <- getJSD(relative.abund)
nclusters <- getClusters(relative.abund, jsd.dist)
plotOptClusters(nclusters, fig.path="./figures", n.max=10)
data.cluster <- getDataCluster(jsd.dist, nclusters=nclusters)
# remove low-abundance taxa before plotting their abundance per cluster
data <- noise.removal(relative.abund, percent=0.1)
plotEnterotypes(relative.abund, jsd.dist, data.cluster)
p.list <- plotClusterAbundence(data, data.cluster)
# Cramér's V between 5 enterotypes and the land.use column of env
chi <- corrEnterotypeToGroup(jsd.dist, k=5, attr.data=env, group.id="land.use")