GraphMetagen: Graphical model construction for metagenomic data


Description

Given a matrix of metagenomic counts or a matrix of dissimilarities between samples, the function constructs multidimensional scaling (MDS) models to visualize the samples in Euclidean space. Clustering methods such as partitioning around medoids (PAM) are then used to assign the samples to clusters so that similarity among samples can be studied.

Usage

GraphMetagen(MDSdata)

Arguments

MDSdata

A list containing the following items:

  • Contents - A matrix of metagenomic counts, or a dissimilarity matrix.

  • Clust - (Optional) Vector of cluster memberships of the samples, determined using other data attributes.

  • DataType - Determines whether the values in Contents are counts or distances. Takes values in c("Counts", "Distance").

  • DistType - Determines the measure of dissimilarity to be used when Contents contains counts. Takes values in c("Kendall's tau-distance", "UniFrac").

  • PhyTree - A phylogenetic tree of class phylo. Required when DistType is set to "UniFrac".

  • GUnifType - Type of generalized UniFrac distance to be calculated. Takes values in c("Unweighted", "Variance Adjusted", "Generalized").

  • GUnifWeight - The weight parameter used in the calculation of the generalized UniFrac distance. Takes values between 0 and 1. For more details, see Chen et al. (2012).

  • Dimensions - Integer-valued variable giving the dimensions for which multidimensional scaling models should be constructed.

  • Norms - The norm to be used for the construction of metric multidimensional scaling models. Takes positive integer values.

  • Penalty - A number between zero and one that sets the penalty for ties when calculating Kendall's τ-distance.

  • MinClust - Minimum number of clusters to be considered when estimating the optimal number of clusters. Takes integer values of 2 (the default) or greater.
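The role of Penalty can be illustrated with a small sketch. The helper below, kendall_tau_dist, is hypothetical and not part of GrammR. It assumes a common convention in which a discordant pair of taxa contributes 1, a pair tied in exactly one of the two samples contributes the penalty, and every other pair contributes 0. The package's own computation may differ.

```r
# Hypothetical helper (not from GrammR): Kendall's tau-distance between two
# count vectors, with a penalty in [0, 1] for pairs tied in only one vector.
kendall_tau_dist <- function(x, y, penalty = 0.5) {
  stopifnot(length(x) == length(y), length(x) >= 2,
            penalty >= 0, penalty <= 1)
  idx <- utils::combn(length(x), 2)          # all index pairs (i, j), i < j
  dx <- sign(x[idx[1, ]] - x[idx[2, ]])      # ordering of each pair in x
  dy <- sign(y[idx[1, ]] - y[idx[2, ]])      # ordering of each pair in y
  discordant <- dx * dy < 0                  # pair ordered oppositely
  tied_once  <- xor(dx == 0, dy == 0)        # tied in exactly one vector
  sum(discordant) + penalty * sum(tied_once)
}

kendall_tau_dist(c(1, 2, 3), c(3, 2, 1))                 # 3: every pair is discordant
kendall_tau_dist(c(1, 2, 3), c(1, 1, 3), penalty = 0.5)  # 0.5: one pair tied in y only
```

With penalty = 0 ties are ignored entirely, while penalty = 1 treats a one-sided tie as a full disagreement.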

Value

Name

Name of the model constructed.

Coords

An N × p matrix containing the coordinates of the samples obtained by MDS methods, where N is the number of samples and p is the dimension of the model.

ClusMem

A vector which gives the cluster membership of the samples determined using PAM.

TrueMem

The vector of true cluster memberships provided to the function through Clust. If true cluster memberships are not provided, NULL is returned.

OptimClust

An integer giving the optimal number of clusters, as determined by OptimClusts.

SilPlot

A vector of length 2√N − 1, where N is the number of samples. It contains the average silhouette width for each number of clusters between 2 and 2√N.
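How values of this shape arise can be sketched with base R and the cluster package. This is an illustration under stated assumptions, not the GrammR implementation: the toy data are simulated, classical MDS (cmdscale) stands in for the package's metric MDS models, and the optimal number of clusters is taken to be the one with the largest average silhouette width, which is a common convention that OptimClusts may or may not follow.

```r
library(cluster)  # recommended package shipped with R; provides pam()

set.seed(1)
# Hypothetical toy data: 40 samples in two well-separated groups.
raw <- rbind(matrix(rnorm(100, mean = 0), ncol = 5),
             matrix(rnorm(100, mean = 5), ncol = 5))
d <- dist(raw)                 # stand-in for a "Distance" type input
n <- nrow(raw)                 # N, the number of samples
p <- 2                         # dimension of the MDS model

# N x p coordinate matrix, analogous to Coords.
coords <- cmdscale(d, k = p)

# Average silhouette width for k = 2, ..., floor(2 * sqrt(N)),
# analogous to SilPlot.
ks  <- 2:floor(2 * sqrt(n))
sil <- sapply(ks, function(k) pam(coords, k)$silinfo$avg.width)

# Assumed selection rule: the k with the largest average width.
optim_k  <- ks[which.max(sil)]
clus_mem <- pam(coords, optim_k)$clustering   # analogous to ClusMem
```

Plotting sil against ks gives a quick visual check of how sharply the preferred number of clusters stands out.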

Author(s)

Deepak Nag Ayyala <deepaknagayyala@gmail.com>

References

Chen, J., et al. (2012). Associating microbiome composition with environmental covariates using generalized UniFrac distance. Bioinformatics, 28(16).

See Also

GrammRServ

Examples

## Not run:
data(metagencounts)
X <- list(Contents = metagencounts$Counts, Clust = metagencounts$CommMemshp,
          DataType = "Counts", DistType = "Kendall's tau-distance",
          Dimensions = c(2, 3, 4), Norms = c(1, 2), Penalty = 0.5, MinClust = 2)
GraphMetagen(X)
## End(Not run)

GrammR documentation built on May 1, 2019, 8:46 p.m.