dismay: Calculate distance or similarity measures on a matrix

Description

dismay provides a single interface for calculating several measures of distance or similarity between all pairs of features in an input matrix, where rows correspond to samples and columns correspond to biological features (e.g., genes, proteins, or metabolites).

Usage

dismay(mat, metric = c("pearson", "spearman", "kendall", "bicor",
  "zi_kendall", "binomial", "MI", "cosine", "jaccard", "canberra",
  "euclidean", "manhattan", "RA", "weighted_rank", "hamming"), ...)
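
A default that lists every allowed value as a character vector is the usual R idiom for an argument resolved with match.arg(). Whether dismay resolves metric this way internally is an assumption; the sketch below uses a hypothetical helper, pick_metric, with a shortened list of choices to show how the idiom behaves.

```r
# Hypothetical helper illustrating the match.arg() idiom.
# This is not the dismay source; the choice list is shortened for brevity.
pick_metric <- function(metric = c("pearson", "spearman", "kendall")) {
  # With no argument supplied, match.arg() returns the first choice;
  # otherwise it validates the value against the allowed choices.
  match.arg(metric)
}

default_choice  <- pick_metric()            # "pearson"
explicit_choice <- pick_metric("spearman")  # "spearman"

# An unknown value raises an error instead of being silently accepted
bad_choice <- tryCatch(pick_metric("not_a_metric"),
                       error = function(e) "error")
```

Under that assumption, passing the metric name explicitly (as in the Examples) is the safest way to call the function.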

Arguments

mat

the matrix of interest, with samples in rows and biological features in columns

metric

the measure of distance or similarity to calculate

...

other arguments passed on to the function that implements the chosen metric

Details

Details about the implementation of each distance/similarity metric are as follows:

  1. Pearson correlation: uses the fast cor function from WGCNA, adapted to handle missing data

  2. Spearman correlation: uses the base R cor method

  3. Kendall correlation: uses the fast calculation of Kendall's tau implemented by pcaPP in the cor.fk function

  4. Biweight midcorrelation: uses the fast implementation in the bicor function from WGCNA

  5. Zero-inflated Kendall correlation: uses the estimator of Kendall's tau adapted to zero-inflated count data, described by Pimentel et al.

  6. Binomial: calculates the negative log10 of the binomial distribution P-values between genes based on presence/absence across cells, as proposed by Mohammadi et al., using an implementation specific to dismay

  7. Mutual information: uses the WGCNA implementation in the mutualInfoAdjacency function

  8. Cosine similarity: uses the cosine function in lsa

  9. Jaccard index: calculates the Jaccard index between genes based on presence/absence across cells, using a custom implementation

  10. Euclidean distance: uses the base R dist method

  11. Canberra distance: uses the base R dist method

  12. Manhattan distance: uses the base R dist method

  13. Weighted rank correlation: implements weighted rank correlation as described in Zar, "Biostatistical Analysis", 5th ed.

  14. Hamming distance: calculates the Hamming distance between genes based on presence/absence across cells, using a custom implementation

  15. Sorensen-Dice coefficient: uses the implementation within the dissimilarity function from the arules package

  16. phi_s: calculates the symmetric version of the measure of proportionality phi from the propr package, implemented in the proportionality function

  17. rho_p: calculates the symmetric version of the measure of proportionality rho from the propr package, implemented in the proportionality function
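
Several of the measures above (Jaccard index, Hamming distance) operate on presence/absence rather than on the raw values. The following is a minimal sketch of that idea in base R, assuming a feature counts as "present" in a sample when its value is greater than zero; it is an illustration only, not the custom implementation used by dismay, and the toy matrix is made up.

```r
# Toy matrix: 4 samples (rows) x 3 features (columns). Illustrative data only.
mat <- cbind(geneA = c(1, 0, 2, 0),
             geneB = c(3, 0, 0, 0),
             geneC = c(0, 5, 0, 1))

# Assumption: a feature is "present" in a sample when its value is > 0
present <- (mat > 0) * 1

# Number of samples in which both features of a pair are present
both <- crossprod(present)

# Number of samples in which at least one feature of the pair is present
n_present <- colSums(present)
either <- outer(n_present, n_present, "+") - both

# Jaccard index: intersection over union
# (NaN for a pair of features that are never present)
jaccard <- both / either

# Hamming distance: number of samples where presence/absence differs
hamming <- either - both
```

Here geneA and geneB share one of the two samples in which either is present, so their Jaccard index is 0.5, while geneA and geneC never co-occur and differ in all four samples.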

Distance metrics (Euclidean, Canberra, and Manhattan distances, and the phi_s measure of proportionality) are multiplied by -1 so that, across all measures of association, higher values indicate greater similarity.
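
The effect of the sign flip can be sketched with the base R dist function, which the list above names for Euclidean distance. The transpose and the conversion to a full matrix are assumptions about how a column-wise comparison would be set up, not a description of the dismay internals.

```r
set.seed(42)
mat <- matrix(rnorm(40), ncol = 4,
              dimnames = list(NULL, paste0("gene", 1:4)))

# dist() compares rows, so transpose to compare features (columns)
euclid <- as.matrix(dist(t(mat), method = "euclidean"))

# Multiply by -1 so that larger values mean greater similarity
similarity <- -euclid

# The closest partner of gene1 is the same under either convention
nearest_by_distance   <- names(which.min(euclid["gene1", -1]))
nearest_by_similarity <- names(which.max(similarity["gene1", -1]))
```

After negation every value is at most zero, with zero on the diagonal, so such a matrix can be ranked in the same direction as a correlation matrix.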

Value

A square matrix of pairwise similarities between all columns (features) of the input matrix.
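
To give a sense of the shape of the result, the sketch below uses the base R cor function (which the Details name as the basis of the Spearman option) as a stand-in for the package call. That dismay returns the same dimensions and dimnames is an assumption, not something verified here.

```r
set.seed(1)
mat <- matrix(rnorm(50), ncol = 5,
              dimnames = list(paste0("sample", 1:10), paste0("gene", 1:5)))

# Stand-in for dismay(mat, "spearman"): pairwise Spearman correlation of columns
sim <- cor(mat, method = "spearman")

dim(sim)          # one row and one column per feature
isSymmetric(sim)  # similarity of (i, j) equals similarity of (j, i)
rownames(sim)     # feature names are carried over from colnames(mat)
```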

References

\insertRef{christensen2005}{dismay}

\insertRef{langfelder2012}{dismay}

\insertRef{pimentel2015}{dismay}

\insertRef{mohammadi2018}{dismay}

Examples

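library(dismay)
# simulate 10 samples x 10 genes, then zero out negative values
mat <- matrix(rnorm(100), ncol = 10, dimnames = list(paste("sample", 1:10),
  paste("gene", letters[1:10])))
mat[mat < 0] <- 0
# pairwise similarity between genes (columns)
jac <- dismay(mat, 'jaccard')
cos_sim <- dismay(mat, 'cosine')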

skinnider/dismay documentation built on May 6, 2019, 12:21 p.m.