getDissimilarity: Calculate dissimilarities

getDissimilarityR Documentation

Calculate dissimilarities

Description

These functions are designed to calculate dissimilarities on data stored within a TreeSummarizedExperiment object. For overlap, Unifrac, and Jensen-Shannon Divergence (JSD) dissimilarities, the functions use mia internal functions, while for other types of dissimilarities, they rely on vegdist by default.

Usage

addDissimilarity(x, method, ...)

## S4 method for signature 'SummarizedExperiment'
addDissimilarity(x, method = "bray", name = method, ...)

getDissimilarity(x, method, ...)

## S4 method for signature 'SummarizedExperiment'
getDissimilarity(
  x,
  method = "bray",
  assay.type = "counts",
  niter = NULL,
  transposed = FALSE,
  ...
)

## S4 method for signature 'TreeSummarizedExperiment'
getDissimilarity(
  x,
  method = "bray",
  assay.type = "counts",
  niter = NULL,
  transposed = FALSE,
  ...
)

## S4 method for signature 'ANY'
getDissimilarity(x, method = "bray", niter = NULL, ...)

Arguments

x

TreeSummarizedExperiment or matrix.

method

Character scalar. Specifies which dissimilarity to calculate. (Default: "bray")

...

other arguments passed into avgdist, vegdist, or into mia internal functions:

  • sample: The sampling depth in rarefaction. (Default: min(rowSums2(x)))

  • dis.fun: Character scalar. Specifies the dissimilarity function to be used.

  • tree.name: (Unifrac) Character scalar. Specifies the name of the tree from rowTree(x) that is used in calculation. Disabled when tree is specified. (Default: "phylo")

  • tree: (Unifrac) phylo. A phylogenetic tree used in calculation. (Default: NULL)

  • weighted: (Unifrac) Logical scalar. Should use weighted-Unifrac calculation? Weighted-Unifrac takes into account the relative abundance of species/taxa shared between samples, whereas unweighted-Unifrac only considers presence/absence. Default is FALSE, meaning the unweighted-Unifrac dissimilarity is calculated for all pairs of samples. (Default: FALSE)

  • node.label (Unifrac) character vector. Used only if x is a matrix. Specifies links between rows/columns and tips of tree. The length must equal the number of rows/columns of x. Furthermore, all the node labs must be present in tree.

  • chunkSize: (JSD) Integer scalar. Defines the size of data send to the individual worker. Only has an effect, if BPPARAM defines more than one worker. (Default: nrow(x))

  • BPPARAM: (JSD) BiocParallelParam. Specifies whether the calculation should be parallelized.

  • detection: (Overlap) Numeric scalar. Defines detection threshold for absence/presence of features. Feature that has abundance under threshold in either of samples, will be discarded when evaluating overlap between samples. (Default: 0)

name

Character scalar. The name to be used to store the result in metadata of the output. (Default: method)

assay.type

Character scalar. Specifies which assay to use for calculation. (Default: "counts")

niter

The number of iterations performed. If NULL, rarefaction is disabled. (Default: NULL)

transposed

Logical scalar. Specifies if x is transposed with cells in rows. (Default: FALSE)

Details

Overlap reflects similarity between sample-pairs. When overlap is calculated using relative abundances, the higher the value the higher the similarity is. When using relative abundances, overlap value 1 means that all the abundances of features are equal between two samples, and 0 means that samples have completely different relative abundances.

Unifrac is calculated with rbiom:unifrac().

If rarefaction is enabled, vegan:avgdist() is utilized.

For JSD implementation: Susan Holmes susan@stat.stanford.edu. Adapted for phyloseq by Paul J. McMurdie. Adapted for mia by Felix G.M. Ernst

Value

getDissimilarity returns a sample-by-sample dissimilarity matrix.

addDissimilarity returns x that includes dissimilarity matrix in its metadata.

References

For unifrac dissimilarity: http://bmf.colorado.edu/unifrac/

See also additional descriptions of Unifrac in the following articles:

Lozupone, Hamady and Knight, “Unifrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.”, BMC Bioinformatics 2006, 7:371

Lozupone, Hamady, Kelley and Knight, “Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities.” Appl Environ Microbiol. 2007

Lozupone C, Knight R. “Unifrac: a new phylogenetic method for comparing microbial communities.” Appl Environ Microbiol. 2005 71 (12):8228-35.

For JSD dissimilarity: Jensen-Shannon Divergence and Hilbert space embedding. Bent Fuglede and Flemming Topsoe University of Copenhagen, Department of Mathematics http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf

See Also

http://en.wikipedia.org/wiki/Jensen-Shannon_divergence

Examples

library(mia)
library(scater)

# load dataset
data(GlobalPatterns)
tse <- GlobalPatterns

### Overlap dissimilarity

tse <- addDissimilarity(tse, method = "overlap", detection = 0.25)
metadata(tse)[["overlap"]][1:6, 1:6]

### JSD dissimilarity

tse <- addDissimilarity(tse, method = "jsd")
metadata(tse)[["jsd"]][1:6, 1:6]

# Multi Dimensional Scaling applied to JSD dissimilarity matrix
tse <- runMDS(tse, FUN = getDissimilarity, method = "overlap", 
              assay.type = "counts")
metadata(tse)[["MDS"]][1:6, ]
              
### Unifrac dissimilarity

res <- getDissimilarity(tse, method = "unifrac", weighted = FALSE)
dim(as.matrix((res)))

tse <- addDissimilarity(tse, method = "unifrac", weighted = TRUE)
metadata(tse)[["unifrac"]][1:6, 1:6]


FelixErnst/mia documentation built on Nov. 18, 2024, 5:02 a.m.