| dist_binary | R Documentation |
Internal helper function to compute pairwise distances between binary vectors
using standard binary distance/similarity measures. Delegates to
ade4::dist.binary when available for performance.
dist_binary(x, method)
| x | A numeric matrix or data frame of binary values (0/1, TRUE/FALSE, or NA) |
| method | A character string specifying the binary distance measure to use. |
Supported methods (for two binary vectors x_i and x_j):
"jaccard":
d = 1 - \frac{a}{a + b + c}
"dice":
d = 1 - \frac{2a}{2a + b + c}
"sokal_michener":
d = 1 - \frac{a + d}{a + b + c + d}
"russell_rao":
d = 1 - \frac{a}{a + b + c + d}
"sokal_sneath":
d = 1 - \frac{a}{a + 2(b + c)}
"kulczynski":
d = 1 - \frac{1}{2}\left(\frac{a}{a+b} + \frac{a}{a+c}\right)
"hamming":
d = 1 - \frac{a + d}{a + b + c + d}
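Where (the d on the left-hand side of each formula is the resulting distance; on the right-hand side, d is the count defined below):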
a = number of positions where both vectors are 1
b = number of positions where x_i = 1 and x_j = 0
c = number of positions where x_i = 0 and x_j = 1
d = number of positions where both vectors are 0
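As a minimal sketch of how the four counts feed the formulas above, the snippet below tabulates a, b, c and d for two vectors in base R. The helper `binary_counts()` is hypothetical and not part of dbrobust; it assumes complete 0/1 input and applies the formulas exactly as written here, so it illustrates the definitions rather than the package's internals.

```r
# Hypothetical helper (not part of dbrobust): tabulate the four
# agreement/disagreement counts for two complete 0/1 vectors.
binary_counts <- function(xi, xj) {
  xi <- xi == 1
  xj <- xj == 1
  c(
    a = sum(xi & xj),    # both 1
    b = sum(xi & !xj),   # x_i = 1, x_j = 0
    c = sum(!xi & xj),   # x_i = 0, x_j = 1
    d = sum(!xi & !xj)   # both 0
  )
}

xi <- c(1, 0, 1)
xj <- c(1, 1, 0)
n <- binary_counts(xi, xj)
total <- sum(n)

# Distances computed directly from the formulas as written above
jaccard        <- 1 - n[["a"]] / (n[["a"]] + n[["b"]] + n[["c"]])
dice           <- 1 - 2 * n[["a"]] / (2 * n[["a"]] + n[["b"]] + n[["c"]])
sokal_michener <- 1 - (n[["a"]] + n[["d"]]) / total
russell_rao    <- 1 - n[["a"]] / total
```

For these two vectors a = b = c = 1 and d = 0, so the Jaccard formula gives 2/3 and the Dice formula gives 1/2.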
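The Sokal-Michener distance (one minus the simple matching coefficient) equals the normalized Hamming distance, (b + c)/(a + b + c + d), which is why "sokal_michener" and "hamming" share the same formula.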
Factors or character columns are converted to numeric 0/1.
Missing values (NA) are ignored pairwise: positions where either vector is NA are dropped for that pair. If no positions remain, the distance is NA.
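The pairwise NA rule can be sketched as follows for the normalized Hamming distance. The function `hamming_pairwise_na()` is a hypothetical illustration, not the dbrobust source; it assumes "ignored pairwise" means dropping every position where at least one of the two vectors is NA.

```r
# Hypothetical illustration (not the dbrobust source) of pairwise NA handling.
hamming_pairwise_na <- function(xi, xj) {
  # Keep only positions observed in both vectors
  keep <- !is.na(xi) & !is.na(xj)
  # No valid comparisons left: the distance is undefined
  if (!any(keep)) {
    return(NA_real_)
  }
  # Share of retained positions that disagree, i.e. (b + c) / (a + b + c + d)
  mean(xi[keep] != xj[keep])
}

d_partial <- hamming_pairwise_na(c(1, NA, 0, 1), c(1, 0, NA, 0))
d_empty   <- hamming_pairwise_na(c(NA, 1), c(0, NA))
```

In the first call only positions 1 and 4 are compared, and they disagree once, so the result is 0.5. In the second call no position is observed in both vectors, so the result is NA.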
Methods supported by ade4 (for example Jaccard, Dice, and Sokal-Michener) are
computed via ade4::dist.binary for efficiency.
Manual computations are implemented for Hamming and Kulczynski if ade4 is unavailable.
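A manual computation of this kind might look like the sketch below, which builds the full symmetric matrix row pair by row pair. `manual_binary_dist()` is a hypothetical name and this is not the package's actual fallback code; it assumes a complete 0/1 matrix with no NA values and uses the Hamming and Kulczynski formulas as written above.

```r
# Hypothetical fallback sketch (not the dbrobust source).
# Assumes x is a complete 0/1 matrix with observations in rows.
manual_binary_dist <- function(x, method = c("hamming", "kulczynski")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  n <- nrow(x)
  out <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (j <= i) next  # fill the upper triangle, mirror below
      xi <- x[i, ] == 1
      xj <- x[j, ] == 1
      a  <- sum(xi & xj)
      b  <- sum(xi & !xj)
      cc <- sum(!xi & xj)   # named cc to avoid masking base::c
      d  <- sum(!xi & !xj)
      val <- if (method == "hamming") {
        (b + cc) / (a + b + cc + d)
      } else {
        # Yields NaN when a row contains no 1s (0/0 in a ratio)
        1 - 0.5 * (a / (a + b) + a / (a + cc))
      }
      out[i, j] <- val
      out[j, i] <- val
    }
  }
  out
}

mat <- matrix(c(
  1, 0, 1,
  1, 1, 0,
  0, 1, 1
), nrow = 3, byrow = TRUE)

ham <- manual_binary_dist(mat, method = "hamming")
kul <- manual_binary_dist(mat, method = "kulczynski")
```

Every pair of rows in this matrix has a = b = c = 1 and d = 0, so each off-diagonal Hamming entry is 2/3 and each off-diagonal Kulczynski entry is 1/2.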
A symmetric numeric matrix of pairwise distances. NA is returned for pairs with no valid comparisons (all NA entries).
# Small example with binary matrix
mat <- matrix(c(
1, 0, 1,
1, 1, 0,
0, 1, 1
), nrow = 3, byrow = TRUE)
# Example with Jaccard
dbrobust::dist_binary(mat, method = "jaccard")
# Example with Hamming
dbrobust::dist_binary(mat, method = "hamming")