dist_binary: Compute pairwise binary distances

View source: R/dist_binary.R

Description

Internal helper function that computes pairwise distances between the rows of a binary matrix using standard binary distance/similarity measures. When the ade4 package is available, supported methods are delegated to ade4::dist.binary for performance.

Usage

dist_binary(x, method)

Arguments

x

A numeric matrix or data frame of binary values (0/1, TRUE/FALSE, or NA)

method

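A character string specifying the binary distance measure to use: one of "jaccard", "dice", "sokal_michener", "russell_rao", "sokal_sneath", "kulczynski", or "hamming".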

Details

Supported methods (for two binary vectors x_i and x_j):

  • "jaccard":

    d(x_i, x_j) = 1 - \frac{a}{a + b + c}

  • "dice":

    d(x_i, x_j) = 1 - \frac{2a}{2a + b + c}

  • "sokal_michener":

    d(x_i, x_j) = 1 - \frac{a + d}{a + b + c + d}

  • "russell_rao":

    d(x_i, x_j) = 1 - \frac{a}{a + b + c + d}

  • "sokal_sneath":

    d(x_i, x_j) = 1 - \frac{a}{a + 2(b + c)}

  • "kulczynski":

    d(x_i, x_j) = 1 - \frac{1}{2}\left(\frac{a}{a + b} + \frac{a}{a + c}\right)

  • "hamming":

    d(x_i, x_j) = \frac{b + c}{a + b + c + d}

Where:

  • a = number of positions where both vectors are 1

  • b = number of positions where x_i = 1 and x_j = 0

  • c = number of positions where x_i = 0 and x_j = 1

  • d = number of positions where both vectors are 0
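The counts a, b, c and d can be turned into a per-pair calculation directly. The R sketch below is illustrative only: binary_counts() and binary_dist_pair() are hypothetical helpers, not functions exported by dbrobust. They assume NA positions are dropped pairwise before counting, return 1 - s as in the formulas (no square-root rescaling), and leave Sokal-Sneath out for brevity.

```r
# Hypothetical helper (not part of dbrobust): count the 2x2 cells a, b, c, d
# for two binary vectors, skipping positions where either value is NA.
binary_counts <- function(xi, xj) {
  ok <- !is.na(xi) & !is.na(xj)   # positions observed in both vectors
  xi <- as.numeric(xi[ok])        # TRUE/FALSE -> 1/0
  xj <- as.numeric(xj[ok])
  c(a = sum(xi == 1 & xj == 1),   # both 1
    b = sum(xi == 1 & xj == 0),   # 1 in x_i, 0 in x_j
    c = sum(xi == 0 & xj == 1),   # 0 in x_i, 1 in x_j
    d = sum(xi == 0 & xj == 0))   # both 0
}

# Hypothetical helper: distance for one pair, following the Details formulas.
binary_dist_pair <- function(xi, xj, method) {
  k <- binary_counts(xi, xj)
  # Local names mirror the notation a, b, c, d used in Details.
  a <- k[["a"]]; b <- k[["b"]]; c <- k[["c"]]; d <- k[["d"]]
  n <- a + b + c + d
  if (n == 0) return(NA_real_)    # no jointly observed positions
  switch(method,
    jaccard        = 1 - a / (a + b + c),
    dice           = 1 - 2 * a / (2 * a + b + c),
    sokal_michener = 1 - (a + d) / n,
    russell_rao    = 1 - a / n,
    kulczynski     = 1 - 0.5 * (a / (a + b) + a / (a + c)),
    hamming        = (b + c) / n,
    stop("unknown method: ", method))
}

binary_dist_pair(c(1, 0, 1), c(1, 1, 0), "jaccard")  # 0.6666667
```

Here a = b = c = 1 and d = 0, so the Jaccard distance is 1 - 1/3. Pairs where a ratio's denominator is zero (for example two all-zero vectors under Jaccard) yield NaN in this sketch.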

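The Sokal-Michener distance (one minus the simple matching coefficient) is identical to the normalized Hamming distance: both equal the fraction of positions where the two vectors differ, (b + c)/(a + b + c + d).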
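A quick numerical check of this equivalence, using two made-up vectors without missing values and base R only (no dbrobust call):

```r
# Two illustrative binary vectors (no NAs).
xi <- c(1, 0, 1, 1, 0, 0)
xj <- c(1, 1, 0, 1, 0, 1)
n  <- length(xi)

a <- sum(xi == 1 & xj == 1)            # both 1
d <- sum(xi == 0 & xj == 0)            # both 0

sokal_michener <- 1 - (a + d) / n      # 1 - simple matching coefficient
hamming_norm   <- sum(xi != xj) / n    # share of mismatching positions
all.equal(sokal_michener, hamming_norm)  # TRUE
```

With a = 2 and d = 1, three of the six positions match, so both quantities are 0.5.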

Input handling and implementation notes:

  • Factors or character columns are converted to numeric 0/1.

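  • Missing values (NA) are handled pairwise: for each pair of vectors, positions where either value is NA are excluded before a, b, c and d are counted. If no position is observed in both vectors, the distance is NA.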

  • Methods supported by ade4 (e.g., Jaccard, Dice, Sokal-Michener) are computed via ade4::dist.binary for efficiency when ade4 is installed.

  • Manual computations are implemented for Hamming and Kulczynski if ade4 is unavailable.
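A manual route for a method such as Kulczynski can be sketched as a double loop over row pairs with pairwise NA deletion. This is a hypothetical illustration, not dbrobust's code: kulczynski_matrix() is an invented name, it assumes numeric or logical input (no factor/character conversion), and it fills the diagonal with 0 by assumption.

```r
# Hypothetical manual fallback: Kulczynski distances between all row pairs,
# excluding, for each pair, the columns where either row is NA.
kulczynski_matrix <- function(x) {
  x <- as.matrix(x)
  storage.mode(x) <- "numeric"          # TRUE/FALSE -> 1/0
  n <- nrow(x)
  # Diagonal left at 0 (assumption); off-diagonal cells filled below.
  out <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      ok <- !is.na(x[i, ]) & !is.na(x[j, ])  # jointly observed columns
      if (!any(ok)) {
        out[i, j] <- out[j, i] <- NA_real_   # no valid comparisons
        next
      }
      xi <- x[i, ok]
      xj <- x[j, ok]
      a  <- sum(xi == 1 & xj == 1)
      b  <- sum(xi == 1 & xj == 0)
      c0 <- sum(xi == 0 & xj == 1)           # "c" in the Details notation
      s  <- 0.5 * (a / (a + b) + a / (a + c0))
      out[i, j] <- out[j, i] <- 1 - s        # keep the matrix symmetric
    }
  }
  out
}

m <- rbind(c(1, 0, 1), c(1, 1, 0), c(NA, NA, NA))
out <- kulczynski_matrix(m)
out
```

Rows 1 and 2 give 0.5, while every pair involving the all-NA third row is NA. A pair with a + b = 0 or a + c = 0 yields NaN, because the Kulczynski ratio is undefined there.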

Value

A symmetric numeric matrix of pairwise distances. NA is returned for pairs with no valid comparisons, i.e. no position where both values are non-missing.

Examples

# Small example with binary matrix
mat <- matrix(c(
  1, 0, 1,
  1, 1, 0,
  0, 1, 1
), nrow = 3, byrow = TRUE)

# Example with Jaccard
dbrobust::dist_binary(mat, method = "jaccard")

# Example with Hamming
dbrobust::dist_binary(mat, method = "hamming")
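As a cross-check of the Jaccard formula in Details, the same quantity can be evaluated for the example matrix with base R: tcrossprod(mat) gives a for every pair of rows, and a + b + c is the number of positions where at least one row is 1. This follows the formula as written and makes no claim about the exact values printed by dist_binary; for comparison, ade4::dist.binary itself reports sqrt(1 - s) rather than 1 - s.

```r
mat <- matrix(c(
  1, 0, 1,
  1, 1, 0,
  0, 1, 1
), nrow = 3, byrow = TRUE)

a <- tcrossprod(mat)                     # a[i, j]: positions where both rows are 1
r <- rowSums(mat)                        # number of 1s in each row
a_plus_b_plus_c <- outer(r, r, "+") - a  # positions where at least one row is 1
jaccard_formula <- 1 - a / a_plus_b_plus_c
round(jaccard_formula, 3)
```

Every off-diagonal entry equals 2/3 here, because each pair of rows shares exactly one 1 and differs in the other two positions; the diagonal is 0.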


dbrobust documentation built on Nov. 5, 2025, 6:24 p.m.