dot-prCalus: Calus-style recursive clustering of individuals using an OH...

.prCalus R Documentation

Calus-style recursive clustering of individuals using an OH matrix

Description

Recursively applies Ward hierarchical clustering to an opposing-homozygotes (OH) matrix, splitting individuals into two groups at each step. Splitting of a group stops once its within-group OH values no longer exceed a threshold derived from allele frequencies estimated from the genotype matrix, or once the group has two or fewer members.

Usage

.prCalus(oh, genotype)

Arguments

oh

A numeric matrix representing the opposing-homozygotes (OH) counts between individuals. Row and column names should be individual IDs. The matrix is expected to be square and symmetric.

genotype

A numeric genotype matrix of dimension n × m (individuals × SNPs), coded as 0 (AA), 1 (AB), 2 (BB), and 9 for missing values (as used in hsphase).
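
To make the relationship between the two arguments concrete, the following is a minimal base-R sketch of how a matrix of OH counts could be derived from a genotype matrix in the coding described above. The toy data and the variable names (is_aa, is_bb) are illustrative assumptions; the package may compute its OH matrix differently.

```r
# Toy genotype matrix: 4 individuals x 5 SNPs,
# coded 0 (AA), 1 (AB), 2 (BB), 9 (missing).
genotype <- matrix(
  c(0, 2, 1, 0, 9,
    2, 0, 1, 0, 2,
    0, 2, 2, 1, 0,
    2, 2, 0, 9, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:5))
)

# Indicator matrices for the two homozygous classes. Codes 1 and 9 map to 0
# in both, so heterozygous and missing calls never contribute to a count.
is_aa <- (genotype == 0) * 1
is_bb <- (genotype == 2) * 1

# oh[i, j] = number of SNPs where i is AA and j is BB, or the reverse.
oh <- is_aa %*% t(is_bb) + is_bb %*% t(is_aa)
```

The result is square and symmetric, has a zero diagonal, and carries the individual IDs as row and column names, which is the shape expected for oh.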

Details

The function returns a two-column data frame containing individual IDs and an assigned group code. Group codes are generated randomly (via rnorm()) and therefore are not stable across runs.

The threshold maxsnpnooh is computed from per-SNP minor allele frequencies (.maf) and then reduced by 10%. The recursion proceeds as:

  1. Compute pairwise distances from oh using .fastdist and convert to a dist object.

  2. Apply hierarchical clustering (hclust with method = "ward.D").

  3. Cut the dendrogram into k = 2 groups.

  4. For each group, compute the maximum within-group OH value; if it exceeds maxsnpnooh and the group has more than 2 members, recurse into that group. Otherwise, write the group's assignments to a temporary file and stop recursing on that group.
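
The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: split_by_oh is a hypothetical name, stats::dist stands in for the internal .fastdist, the threshold is passed in by the caller rather than computed from .maf, and groups are returned as a list instead of being written to a temporary file.

```r
# Hypothetical helper, not the package's implementation.
# `oh` is a square symmetric OH matrix with individual IDs as dimnames;
# `threshold` stands in for maxsnpnooh and is supplied by the caller.
split_by_oh <- function(oh, threshold) {
  ids <- rownames(oh)
  # Step 1 (stand-in for .fastdist): Euclidean distance between OH rows.
  d <- stats::dist(oh)
  # Step 2: Ward hierarchical clustering.
  tree <- stats::hclust(d, method = "ward.D")
  # Step 3: cut the dendrogram into two groups.
  membership <- stats::cutree(tree, k = 2)
  groups <- list()
  for (g in 1:2) {
    members <- ids[membership == g]
    sub <- oh[members, members, drop = FALSE]
    # Step 4: recurse only while the group is large and still heterogeneous.
    if (max(sub) > threshold && length(members) > 2) {
      groups <- c(groups, split_by_oh(sub, threshold))
    } else {
      groups <- c(groups, list(members))
    }
  }
  groups
}

# Two toy families: OH is 0 within a family and 5 between families.
ids <- paste0("ind", 1:6)
oh <- matrix(5, 6, 6, dimnames = list(ids, ids))
oh[1:3, 1:3] <- 0
oh[4:6, 4:6] <- 0
groups <- split_by_oh(oh, threshold = 1)
```

Each call splits into two non-empty groups, so every recursive call works on a strictly smaller matrix and the recursion terminates.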

Value

A data.frame with columns:

id

Individual ID (character).

group

An integer-like group code (generated randomly; not reproducible).
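
Because the group codes are arbitrary, callers that need comparable labels may want to relabel them. A minimal sketch, using a made-up result shaped like the documented return value (the data and the group_stable column are assumptions for illustration):

```r
# Hypothetical result shaped like the documented return value:
# an `id` column and an arbitrary `group` code.
res <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4", "ind5"),
  group = c(83412, 83412, -1207, 83412, -1207),
  stringsAsFactors = FALSE
)

# Relabel groups 1, 2, ... in order of first appearance, so the labels depend
# only on the partition and the row order, not on the random codes.
res$group_stable <- match(res$group, unique(res$group))
```

If the row order is not guaranteed to be the same between runs, sorting by id before relabelling keeps the labels comparable.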

Side effects

This function writes to and reads from a file named "temp.txt" in the current working directory, and then deletes it.
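
One way to keep this side effect away from an existing "temp.txt" is to call the function from a private scratch directory. The wrapper below is a hypothetical sketch (with_scratch_dir is not part of hsphase), and the stand-in function writes_temp only mimics the file write so that the example runs without the package.

```r
# Hypothetical wrapper: evaluate `fun(...)` with a private scratch directory
# as the working directory, then restore the original working directory and
# remove the scratch directory, even if `fun` fails.
with_scratch_dir <- function(fun, ...) {
  scratch <- tempfile("prcalus_")
  dir.create(scratch)
  old <- setwd(scratch)
  on.exit({
    setwd(old)
    unlink(scratch, recursive = TRUE)
  }, add = TRUE)
  fun(...)
}

# Stand-in for a function that writes "temp.txt" to the working directory.
writes_temp <- function() {
  writeLines("x", "temp.txt")
  normalizePath(getwd())
}

before <- getwd()
used <- with_scratch_dir(writes_temp)

# Assumed usage with the internal function (requires hsphase; not run here):
# res <- with_scratch_dir(hsphase:::.prCalus, oh, genotype)
```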

See Also

hclust, cutree, as.dist


hsphase documentation built on Feb. 17, 2026, 5:07 p.m.