dismay: Calculate distance or similarity measures on a matrix


dismay provides a single interface for calculating several measures of distance or similarity between all pairs of features in an input matrix, where rows correspond to samples and columns correspond to biological features (e.g., genes, proteins, or metabolites).

dismay(mat, metric = c("pearson", "spearman", "kendall", "bicor",
  "zi_kendall", "binomial", "MI", "cosine", "jaccard", "canberra",
  "euclidean", "manhattan", "RA", "weighted_rank", "hamming"), ...)

`mat`	the matrix of interest, with samples in rows and biological features in columns
`metric`	the measure of distance or similarity to calculate
`...`	additional arguments passed to the underlying function for the chosen metric

Details about the implementation of each distance/similarity metric are as follows:

Pearson correlation: uses the fast `cor` function from WGCNA, adapted to handle missing data
Spearman correlation: uses the base R `cor` method
Kendall correlation: uses the fast calculation of Kendall's tau implemented in the `cor.fk` function from pcaPP
Biweight midcorrelation: uses the fast implementation in the `bicor` function from WGCNA
Zero-inflated Kendall correlation: uses the estimator of Kendall's tau adapted to zero-inflated count data, described by Pimentel et al.
Binomial: calculates the negative log10 of the binomial distribution P-values between genes based on presence/absence across cells, as proposed by Mohammadi et al., using an implementation specific to dismay
Mutual information: uses the WGCNA implementation in the `mutualInfoAdjacency` function
Cosine similarity: uses the `cosine` function in lsa
Jaccard index: calculates the Jaccard index between genes based on presence/absence across cells, using a custom implementation
Euclidean distance: uses the base R `dist` method
Canberra distance: uses the base R `dist` method
Manhattan distance: uses the base R `dist` method
Weighted rank correlation: implements weighted rank correlation as described in Zar, "Biostatistical Analysis", 5th ed.
Hamming distance: calculates the Hamming distance between genes based on presence/absence across cells, using a custom implementation
Sorensen-Dice coefficient: uses the implementation within the `dissimilarity` function from the arules package
phi_s: calculates the symmetric version of the measure of proportionality phi from the propr package, implemented in the `proportionality` function
rho_p: calculates the symmetric version of the measure of proportionality rho from the propr package, implemented in the `proportionality` function
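
The presence/absence measures in the list above can be illustrated in base R. The following is a minimal sketch, not the code used inside dismay: it assumes that a feature counts as present in a sample when its value is greater than zero, and the toy matrix and variable names are hypothetical.

```r
# Toy matrix: samples (cells) in rows, features (genes) in columns
mat <- cbind(gene_a = c(1, 0, 2, 0, 3),
             gene_b = c(0, 0, 1, 0, 4),
             gene_c = c(0, 5, 0, 1, 0))
rownames(mat) <- paste0("cell", 1:5)

# Binarise to presence/absence (assumption: present means value > 0)
present <- (mat > 0) * 1

# both[i, j]: number of samples in which features i and j are both present
both <- crossprod(present)
n_present <- colSums(present)

# either[i, j]: number of samples in which at least one of i, j is present
either <- outer(n_present, n_present, "+") - both

# Jaccard index: size of the intersection over size of the union
jaccard <- both / either

# Hamming distance: number of samples in which presence/absence differs
hamming <- either - both
```

In this sketch, a feature that is absent from every sample has an empty union with itself, so its Jaccard entries come out as `NaN`; how dismay treats that case is not stated here.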

Distance metrics (Euclidean, Canberra, and Manhattan distances, and the phi_s measure of proportionality) are multiplied by -1 for consistency, so that higher values indicate greater similarity across all measures of association.
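
The sign convention can be sketched with the base R `dist` function. This is an illustration under stated assumptions rather than dismay's internal code: `dist` compares rows, so the matrix is transposed to compare features, and the variable names are hypothetical.

```r
set.seed(42)
mat <- matrix(rnorm(50), ncol = 5,
              dimnames = list(paste0("sample", 1:10), paste0("gene", 1:5)))

# dist() works on rows, so transpose to get distances between columns
d <- as.matrix(dist(t(mat), method = "euclidean"))

# Multiply by -1 so that larger values mean greater similarity
sim <- -d
```

After negation, every entry is at most zero, and the diagonal (each feature compared with itself) holds the maximum value of zero.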

dismay returns the similarity matrix between all pairs of columns (features) in the input matrix.
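
As a rough illustration of the shape of such a result, the sketch below uses base R `cor` as a stand-in for a call such as `dismay(mat, "spearman")`, then reshapes the square matrix into one row per unique feature pair. The reshaping step and the names `pairs`, `feature_1`, and `feature_2` are assumptions made for this example, not part of dismay.

```r
set.seed(7)
mat <- matrix(rnorm(60), ncol = 6,
              dimnames = list(paste0("sample", 1:10),
                              paste0("gene_", letters[1:6])))

# Stand-in for the package call: one row and one column per feature
sim <- cor(mat, method = "spearman")

# The matrix is symmetric, so each unique pair appears once in the
# upper triangle
idx <- which(upper.tri(sim), arr.ind = TRUE)
pairs <- data.frame(
  feature_1 = rownames(sim)[idx[, "row"]],
  feature_2 = colnames(sim)[idx[, "col"]],
  similarity = sim[idx]
)
```

For p features this gives a p x p matrix and p * (p - 1) / 2 unique pairs.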

\insertRef{christensen2005}{dismay}

\insertRef{langfelder2012}{dismay}

\insertRef{pimentel2015}{dismay}

\insertRef{mohammadi2018}{dismay}

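library(dismay)
# simulate a 10 x 10 matrix with samples in rows and genes in columns
mat = matrix(rnorm(100), ncol = 10, dimnames = list(paste("sample", 1:10),
  paste("gene", letters[1:10])))
# set negative values to zero so that each gene is absent from some samples
mat[mat < 0] = 0
# similarity between all pairs of genes (cos_sim avoids masking base::cos)
jac = dismay(mat, 'jaccard')
cos_sim = dismay(mat, 'cosine')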