enterotype: Enterotype


Description

The enterotype method was introduced at http://enterotype.embl.de. Besides the original functions copied from that website, we add a few more analyses to extend the method, such as the correlation between enterotypes and given (known) groups.

Usage

enterotypePipeline(taxa.assign, n.max = 20, k = NULL, fig.path = NA,
  percent = 0.01, validate = TRUE, ...)

getRelativeAbundance(taxa.assign)

getJSD(data)

getClusters(data, data.dist, n.max = 20, ...)

plotOptClusters(nclusters, fig.path = NA, n.max = 20)

getDataCluster(data.dist, k = NULL, nclusters = NULL, as.vector = TRUE,
  validate = TRUE)

noise.removal(data, percent = 0.01)

plotEnterotypes(data, data.dist, data.cluster, attr.data = data.frame(),
  fig.path = NA, percent = 0, addLabel = TRUE, text.colour.id = NULL,
  palette = "Set1", postfix = "", plot.clusters = TRUE, plot.bca = TRUE,
  plot.pcoa = TRUE, cl.width = 9, cl.height = 6, width = 8,
  height = 8, verbose = TRUE, ...)

plotClusterAbundence(data, data.cluster, fig.path = NA,
  cluster.colours = c(), min.median = 0, x.lab = "",
  y.lab = "Relative abundence", width = 9, height = 6)

corrEnterotypeToGroup(data.dist, k, attr.data, group.id,
  simulate.p.value = TRUE)

Arguments

taxa.assign

A data frame of taxonomic assignments with abundances at a given rank, where row names are the taxa at that rank and columns are the samples. It can be one element of the list generated by assignTaxaByRank.

n.max

The maximum number of clusters to evaluate. Defaults to 20.

k

The number of clusters chosen for the optimised solution. If NULL (the default), the number of clusters with the largest CH Index is used.

fig.path

The folder path in which to save figures. If NA (the default), no figure is plotted.

percent

The percentage threshold used to remove noise (i.e. low-abundance genera) prior to the analysis. Defaults to 0.01 in enterotypePipeline and noise.removal, and to 0 in plotEnterotypes.

validate

Logical; whether validation is performed. Defaults to TRUE.

data

The matrix of normalized probability distributions of the abundance matrix, also called abundance distributions. Rows are taxonomy at that rank and columns are samples.

data.dist

A dist object of distances calculated with the Jensen-Shannon divergence (JSD) metric.

nclusters

A vector of scores, such as the Calinski-Harabasz (CH) Index, one per candidate number of clusters, used to find the optimised solution.

as.vector

Logical; if TRUE (the default), return a vector, otherwise return a pam.object representing the clustering.

data.cluster

The cluster assignments produced by the partitioning clustering method pam.

attr.data

A data frame providing additional attributes for visualization. Defaults to an empty data frame, which adds nothing.

addLabel

Logical; whether to add labels to the points. Defaults to TRUE.

text.colour.id

The name of the column in attr.data used to colour the text labels of the points.

group.id

The name of the column in attr.data that contains the known groups to compare with the enterotypes.

simulate.p.value

A logical indicating whether to compute p-values by Monte Carlo simulation. Defaults to TRUE in corrEnterotypeToGroup (the underlying chisq.test defaults to FALSE).

Details

enterotypePipeline is a pipeline summarised from http://enterotype.embl.de/enterotypes_tutorial.sanger.R. The steps of this pipeline are described below:

1. getRelativeAbundance turns taxa.assign into normalized probability distributions;

2. getJSD calculates Jensen-Shannon divergence dissimilarity;

3. getClusters lists the Calinski-Harabasz (CH) Index for each candidate number of clusters up to n.max;

4. getDataCluster picks the optimised or selected k-cluster final result (data.cluster);

5. plotOptClusters and plotEnterotypes make plots including BCA and PCoA.

—————————————————————————————

getRelativeAbundance returns the relative abundance, that is, a matrix of normalized probability distributions of the abundance matrix, which serves as the input of getJSD.
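
The normalization described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy matrix counts and the name rel_abund are made up for illustration, and the sketch only assumes that each sample (column) is divided by its own total.

```r
# Hypothetical toy counts: rows are taxa, columns are samples.
counts <- matrix(c(10, 30, 60,
                    5,  5, 90,
                   40, 40, 20),
                 nrow = 3,
                 dimnames = list(c("Bacteroides", "Prevotella", "Ruminococcus"),
                                 c("s1", "s2", "s3")))

# Divide every column by its total so that each sample sums to 1.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

print(rel_abund)
print(colSums(rel_abund))
```

Each column of rel_abund can then be read as a probability distribution over taxa, which is the form a divergence between samples needs.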

getJSD returns a dist object of distances calculated with the Jensen-Shannon divergence (JSD) metric.
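
As a rough illustration of a JSD-based distance between samples, here is a base R sketch. It is an assumption-laden stand-in rather than the code behind getJSD: the helper names kld, jsd and jsd_dist are hypothetical, zero entries are handled by the usual 0 * log(0) = 0 convention instead of a pseudocount, and the square root of the divergence is returned because that quantity is a proper metric.

```r
# Kullback-Leibler divergence; terms with x == 0 contribute 0 by convention.
kld <- function(x, y) sum(ifelse(x > 0, x * log(x / y), 0))

# Square root of the Jensen-Shannon divergence between two distributions.
jsd <- function(x, y) {
  m <- (x + y) / 2
  sqrt(max(0, 0.5 * kld(x, m) + 0.5 * kld(y, m)))
}

# Pairwise distances between the columns (samples) of p, as a dist object.
jsd_dist <- function(p) {
  n <- ncol(p)
  d <- matrix(0, n, n, dimnames = list(colnames(p), colnames(p)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- jsd(p[, i], p[, j])
    }
  }
  as.dist(d)
}

# Hypothetical relative abundances: a and c are identical, b shares no taxon with a.
p <- cbind(a = c(1, 0), b = c(0, 1), c = c(1, 0))
d <- jsd_dist(p)
print(d)
```

Identical samples are at distance 0, and samples with no taxon in common are at the maximum distance, sqrt(log(2)) with natural logarithms.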

getClusters returns the Calinski-Harabasz (CH) Index for choosing a number of clusters from 2 to n.max.

plotOptClusters plots the Calinski-Harabasz (CH) Index for choosing a number of clusters from 2 to n.max.
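
To show how a CH Index can guide the choice of the number of clusters, the sketch below computes it directly from a distance matrix in base R. Several things here are assumptions and not the package's method: the helper ch_index is hypothetical, it uses the sum-of-squares identities that hold for Euclidean distances, the toy points are made up, and hclust with cutree is swapped in for pam only to keep the example free of extra packages.

```r
# CH Index from a dist object and a vector of cluster labels.
ch_index <- function(d, cluster) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  groups <- unique(cluster)
  k <- length(groups)
  # Sum of squares from pairwise squared distances (each pair appears twice).
  total_ss <- sum(d2) / (2 * n)
  within_ss <- sum(sapply(groups, function(g) {
    idx <- cluster == g
    sum(d2[idx, idx]) / (2 * sum(idx))
  }))
  between_ss <- total_ss - within_ss
  (between_ss / (k - 1)) / (within_ss / (n - k))
}

# Hypothetical 2-D points forming three well separated groups.
pts <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10),
             c(20, 0), c(20, 1), c(21, 0))
d <- dist(pts)

# Stand-in clustering: average-linkage hierarchical clustering, not pam.
tree <- hclust(d, method = "average")
ks <- 2:5
ch <- sapply(ks, function(k) ch_index(d, cutree(tree, k)))
names(ch) <- ks

best_k <- ks[which.max(ch)]
print(ch)
print(best_k)
```

Picking the k with the largest score mirrors the rule described for k = NULL; on this toy data the three-group solution scores highest.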

getDataCluster returns the optimised or selected solution having k clusters assigned by pam.

noise.removal removes any row whose sum is smaller than the given percentage of the total sum of the matrix. It is advisable to apply this function to data generated using short-read sequencing technologies, such as Illumina or SOLiD.
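
The filtering rule above can be sketched in a few lines of base R. The function remove_low_abundance and the toy matrix are hypothetical, and the sketch assumes that percent is given in percentage points (so 0.01 means 0.01%); if the package treats it as a fraction, the factor of 100 would not apply.

```r
# Keep only rows whose share of the grand total exceeds `percent` (in percent).
remove_low_abundance <- function(x, percent = 0.01) {
  row_share <- rowSums(x) * 100 / sum(x)
  x[row_share > percent, , drop = FALSE]
}

# Hypothetical abundance matrix: the third taxon holds 0.1% of the total.
abund <- matrix(c(250, 249, 1,
                  250, 250, 0),
                nrow = 3,
                dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                c("s1", "s2")))

filtered <- remove_low_abundance(abund, percent = 0.5)
print(filtered)
```

Using drop = FALSE keeps the result a matrix even when only one row survives the filter.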

plotEnterotypes is a mixed function to plot clusters, BCA (bca) and PCoA (dudi.pco) from the optimised or selected solution.

Between-class analysis (BCA) is performed to support the clustering and to identify the drivers of the enterotypes. It is only available when k > 2.

plotClusterAbundence returns a list of ggplot2 objects showing the relative abundance distribution in each cluster.

corrEnterotypeToGroup calculates Cramér's V between the enterotypes and the known groups (two categorical variables) from the same data set, using the chi-squared test chisq.test. See http://en.wikipedia.org/wiki/Cramér%27s_V.
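
For reference, Cramér's V can be derived from a chi-squared statistic as sqrt(chi2 / (n * (min(r, c) - 1))), where n is the number of observations and r and c are the dimensions of the contingency table. The base R sketch below applies that formula; cramers_v and the toy vectors are hypothetical, and the return format is an assumption that need not match what corrEnterotypeToGroup returns.

```r
# Cramér's V between two categorical vectors, via chisq.test.
cramers_v <- function(x, y, simulate.p.value = FALSE) {
  tab <- table(x, y)
  test <- chisq.test(tab, correct = FALSE,
                     simulate.p.value = simulate.p.value)
  n <- sum(tab)
  v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))
  list(v = v, p.value = test$p.value)
}

# Hypothetical data: enterotypes that coincide exactly with the known groups.
enterotype <- rep(1:3, each = 20)
land_use <- rep(c("forest", "pasture", "crop"), each = 20)
perfect <- cramers_v(enterotype, land_use)

# Hypothetical data: enterotypes unrelated to the known groups.
enterotype2 <- rep(1:2, each = 20)
group2 <- rep(c("a", "b"), times = 20)
none <- cramers_v(enterotype2, group2)

print(perfect$v)
print(none$v)
```

V ranges from 0 (no association) to 1 (each enterotype maps to exactly one group). Setting simulate.p.value = TRUE changes only how the p-value is obtained, not the statistic that V is built on.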

Examples

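# taxa.assign (taxa-by-sample abundances) and env (sample attributes)
# are assumed to exist already.

# Run the whole pipeline in one call
enterotypePipeline(taxa.assign, n.max=20, fig.path="./figures", percent=0)

# Or run the steps one by one
relative.abund <- getRelativeAbundance(taxa.assign)

jsd.dist <- getJSD(relative.abund)

nclusters <- getClusters(relative.abund, jsd.dist)

plotOptClusters(nclusters, fig.path="./figures", n.max=10)

data.cluster <- getDataCluster(jsd.dist, nclusters=nclusters)

# Drop low-abundance taxa before plotting
data <- noise.removal(relative.abund, percent=0.1)

plotEnterotypes(relative.abund, jsd.dist, data.cluster)

p.list <- plotClusterAbundence(data, data.cluster)

# Compare enterotypes with the known groups in env$land.use
chi <- corrEnterotypeToGroup(jsd.dist, k=5, attr.data=env, group.id="land.use")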

walterxie/ComMA documentation built on May 3, 2019, 11:51 p.m.