.rpohhsphase    R Documentation

Reconstruct half-sib groups by recursive clustering with a recombination stop rule

Description

Internal helper used by rpoh (reconstruction pedigree of half-sib families). This function recursively splits individuals into two clusters by hierarchical clustering on a distance derived from the supplied opposing homozygote (OH) matrix. After each split, it decides whether a cluster should be split further by checking the maximum number of recombination events inferred within that cluster.

Usage

.rpohhsphase(
  genotypeMatrix,
  oh,
  forwardVectorSize = 30,
  excludeFP = TRUE,
  nsap = 3,
  maxRec = 15
)

Arguments

genotypeMatrix

Numeric genotype matrix (individuals in rows, SNPs in columns) coded as '0', '1', '2' (and typically '9' for missing), as used by hsphase. This matrix is subset recursively when splitting clusters.

oh

A square opposing-homozygote matrix for the same individuals as genotypeMatrix (rownames/colnames are individual IDs). Typically produced by ohg. This matrix is subset recursively along with genotypeMatrix.

forwardVectorSize

Integer. Passed to bmh when computing recombination blocks inside each candidate cluster.

excludeFP

Logical. Passed to bmh.

nsap

Integer. Passed to bmh.

maxRec

Integer. Maximum allowed recombination count (within a cluster) before the cluster is recursively split again.
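For orientation on the oh argument, the following is a minimal sketch of how a pairwise opposing-homozygote count can be computed in base R. It assumes the usual definition (the number of SNPs at which one individual is coded '0' and the other '2'); oh_counts and the toy genotypes are hypothetical and are not part of hsphase, where ohg is the function normally used for this.

```r
# Toy genotype matrix: 4 individuals x 5 SNPs, coded 0/1/2 with 9 = missing.
# The values are made up for illustration only.
geno <- matrix(
  c(0, 2, 1, 0, 9,
    2, 2, 1, 0, 0,
    0, 0, 2, 2, 2,
    2, 1, 0, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:5))
)

# Hypothetical helper (not part of hsphase): count opposing homozygotes
# for every pair of individuals.
oh_counts <- function(g) {
  is0 <- (g == 0) * 1  # indicator: homozygous, coded 0
  is2 <- (g == 2) * 1  # indicator: homozygous, coded 2
  # tcrossprod(is0, is2)[i, j] counts SNPs where i is 0 and j is 2
  m <- tcrossprod(is0, is2)
  m + t(m)             # add the reverse direction (i is 2, j is 0)
}

oh_toy <- oh_counts(geno)
```

The result is a square, symmetric matrix with a zero diagonal whose row and column names are the individual IDs, which is the shape the oh argument expects. Missing genotypes ('9') and heterozygotes ('1') contribute nothing to the count in this sketch.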

Details

The recursive splitting stops for a cluster when the maximum recombination count in that cluster is <= maxRec. Final group assignments are written to a temporary file and then read back as a two-column data frame.

The algorithm:

  1. Converts oh to a distance object via as.dist(.fastdist(oh)) and performs hierarchical clustering (hclust, Ward method).

  2. Splits into k = 2 clusters via cutree.

  3. For each cluster with at least 4 individuals, computes recombination counts as recombinations(bmh(subGenotype, ...)) and uses the maximum recombination count as a stop/split criterion.

  4. If max(recombinations) > maxRec, the cluster is split again recursively; otherwise, individuals in that cluster are assigned a new group label and written to a temporary file.
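The control flow of the steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: split_recursive and score_fun are hypothetical names, score_fun stands in for max(recombinations(bmh(subGenotype, ...))), dist() stands in for the internal .fastdist, "ward.D" is an assumed choice of Ward linkage, clusters smaller than min_size are simply kept as final groups, and groups are returned as a list rather than written to a file.

```r
# Hypothetical sketch of recursive two-way splitting with a stop rule.
split_recursive <- function(geno, oh, score_fun, max_score, min_size = 4) {
  ids <- rownames(oh)
  if (length(ids) < 2) return(list(ids))  # nothing left to split

  # Steps 1-2: cluster on a distance derived from oh, cut into two clusters
  tree  <- hclust(dist(oh), method = "ward.D")
  parts <- split(ids, cutree(tree, k = 2))

  groups <- list()
  for (members in parts) {
    sub_geno <- geno[members, , drop = FALSE]
    # Step 3: only clusters with enough individuals are scored
    needs_split <- length(members) >= min_size &&
      score_fun(sub_geno) > max_score
    if (needs_split) {
      # Step 4: score above the threshold, so split this cluster again
      sub_oh <- oh[members, members, drop = FALSE]
      groups <- c(groups,
                  split_recursive(sub_geno, sub_oh, score_fun,
                                  max_score, min_size))
    } else {
      groups <- c(groups, list(members))  # final group
    }
  }
  groups
}

# Toy data (made up): three families of four individuals each
fam <- rep(c("a", "b", "c"), each = 4)
ids <- paste0(fam, 1:4)
between <- matrix(c(1, 5, 20,  5, 1, 20,  20, 20, 1), 3, 3,
                  dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
oh_toy <- between[fam, fam]
diag(oh_toy) <- 0
dimnames(oh_toy) <- list(ids, ids)
geno_toy <- matrix(rep(c(0, 1, 2), each = 4), ncol = 1,
                   dimnames = list(ids, "snp1"))

# Stand-in score: spread of the first SNP column within the cluster
score_fun <- function(g) diff(range(g[, 1]))

groups <- split_recursive(geno_toy, oh_toy, score_fun, max_score = 0)
result <- data.frame(id = unlist(groups),
                     group = rep(seq_along(groups), lengths(groups)))
```

In the toy data the first cut separates family c from families a and b; the combined a/b cluster exceeds the threshold and is split once more, giving three final groups. The sequential labels used here are a choice made for the sketch only.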

Value

A data.frame with two columns:

  • id: individual IDs

  • group: an integer-like group label assigned by the recursive procedure
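A minimal sketch of working with a result of this shape is shown below. The data frame is made up for illustration, and the family column is a hypothetical addition that maps whatever labels were assigned to consecutive integers.

```r
# Made-up result with the documented two-column shape
res <- data.frame(
  id    = c("ind1", "ind2", "ind3", "ind4", "ind5"),
  group = c(7, 7, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Number of individuals per reconstructed group
group_sizes <- table(res$group)

# Member IDs per group, as a named list
families <- split(res$id, res$group)

# Relabel groups as 1, 2, ... in order of first appearance
res$family <- match(res$group, unique(res$group))
```

Relabelling in this way keeps the grouping itself unchanged while making the labels easier to compare between runs.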

Implementation notes

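  • This function uses a fixed temporary filename "temp.txt" in the current working directory and deletes it at the end. Concurrent calls that share a working directory would therefore use the same file, so the function is not safe under parallel execution, and it also cannot work if the working directory is not writable.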

  • Group labels are generated using rnorm(), so results are not deterministic unless a seed is set and the recursion order remains identical.
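For comparison, the sketch below shows one way the same write-then-read pattern could avoid a fixed filename, using base R's tempfile() and on.exit(). It is a possible alternative under stated assumptions, not how the function is currently implemented; write_and_read_groups is a hypothetical name and sequential integer labels replace the rnorm()-based ones. The last lines show how set.seed() makes rnorm() draws repeatable.

```r
# Hypothetical helper: write group assignments to a unique temporary file
# and read them back as a two-column data frame.
write_and_read_groups <- function(groups) {
  tmp <- tempfile(pattern = "groups_", fileext = ".txt")  # unique path
  on.exit(unlink(tmp), add = TRUE)  # clean up even if an error occurs
  for (g in seq_along(groups)) {
    write.table(
      data.frame(id = groups[[g]], group = g),
      file = tmp, append = g > 1,
      col.names = FALSE, row.names = FALSE, quote = FALSE
    )
  }
  read.table(tmp, col.names = c("id", "group"), stringsAsFactors = FALSE)
}

out <- write_and_read_groups(list(c("a1", "a2"), "b1"))

# Reproducible random draws: the same seed gives the same values
set.seed(1)
first_draw <- rnorm(2)
set.seed(1)
second_draw <- rnorm(2)
same_draws <- identical(first_draw, second_draw)
```

Because tempfile() returns a path inside the session's temporary directory, separate R processes do not share the file, and nothing is written to the working directory.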

See Also

rpoh, ohg, bmh, recombinations, .fastdist


hsphase documentation built on Feb. 17, 2026, 5:07 p.m.