tidyped: Tidy and prepare a pedigree

tidyped R Documentation

Tidy and prepare a pedigree

Description

This function standardizes pedigree records, checks for duplicate IDs and incompatible parental roles, detects pedigree loops, injects missing founders, assigns generation numbers, sorts the pedigree, and optionally traces the pedigree of specified candidates. If the cand parameter contains individual IDs, only those individuals and their ancestors or descendants are retained. Tracing direction and the number of generations can be specified using the trace and tracegen parameters.

Usage

tidyped(
  ped,
  cand = NULL,
  trace = "up",
  tracegen = NULL,
  addgen = TRUE,
  addnum = TRUE,
  inbreed = FALSE,
  selfing = FALSE,
  genmethod = "top",
  ...
)

Arguments

ped

A data.table or data frame containing the pedigree. The first three columns must be individual, sire, and dam IDs, in that order. Additional columns, such as sex or generation, can be included. Column names can be customized, but the order of the first three columns must remain unchanged. Individual IDs should not be coded as "", " ", "0", "*", or "NA"; records with such IDs are removed. Missing parents should be denoted by "NA", "0", or "*". Spaces and empty strings ("") are also treated as missing parents but are not recommended.

cand

A character vector of individual IDs, or NULL. If provided, only the candidates and their ancestors, descendants, or both (depending on trace) are retained.

trace

A character value specifying the tracing direction: "up", "down", or "all". "up" traces ancestors; "down" traces descendants; "all" traces the union of ancestors and descendants. This parameter is only used if cand is not NULL. Default is "up".

tracegen

An integer specifying the number of generations to trace. This parameter is only used if cand is not NULL. If NULL or 0, all available generations are traced.

addgen

A logical value indicating whether to generate generation numbers. Default is TRUE, which adds a Gen column to the output.

addnum

A logical value indicating whether to generate a numeric pedigree. Default is TRUE, which adds IndNum, SireNum, and DamNum columns to the output.

inbreed

A logical value indicating whether to calculate inbreeding coefficients. Default is FALSE. If TRUE, an f column is added to the output. This uses the same optimized engine as pedmat(..., method = "f").

selfing

A logical value indicating whether to allow the same individual to appear as both sire and dam. This is common in plant breeding (monoecious species) where the same plant can serve as both male and female parent. If TRUE, individuals appearing in both the Sire and Dam columns will be assigned Sex = "monoecious" instead of triggering an error. Default is FALSE.

genmethod

A character value specifying the generation assignment method: "top" or "bottom". "top" (top-aligned) assigns generations from parents to offspring, starting founders at Gen 1. "bottom" (bottom-aligned) assigns generations from offspring to parents, aligning terminal nodes at the bottom. Default is "top".

...

Additional arguments passed to the inbreed function.

Details

Compared to the legacy version, this function reports cyclic pedigrees more clearly and uses a mixed implementation. There are two candidate-tracing paths: when the input is a raw pedigree, igraph is used for loop checking, candidate tracing, and topological sorting; when the input is an already validated tidyped object and cand is supplied, tracing and topological sorting use integer-indexed C++ routines. Generation assignment can be performed using either a top-down approach (default, aligning founders at the top) or a bottom-up approach (aligning terminal nodes at the bottom).
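
The top-aligned idea can be illustrated with a minimal base R sketch. This is not the package's implementation: assign_gen_top is a hypothetical helper, and the rule it applies (founders at Gen 1, every other individual one generation below its deeper parent) is an assumed reading of "top". It also assumes every parent appears as an individual, that is, missing founders have already been injected.

```r
# Hypothetical helper, not part of visPedigree.
# Assumed rule: founders get Gen 1; everyone else gets
# 1 + the larger of the known parents' generation numbers.
assign_gen_top <- function(ind, sire, dam) {
  gen <- setNames(rep(NA_integer_, length(ind)), ind)
  # Founders have no known parent.
  gen[is.na(sire) & is.na(dam)] <- 1L
  # Sweep repeatedly until no further individual can be resolved.
  repeat {
    changed <- FALSE
    for (i in which(is.na(gen))) {
      parents <- c(sire[i], dam[i])
      parents <- parents[!is.na(parents)]
      pg <- gen[parents]
      # Resolve an individual only once all of its known parents are resolved.
      if (!anyNA(pg)) {
        gen[i] <- max(pg) + 1L
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  gen
}

# Illustrative five-animal pedigree (made-up IDs).
ind  <- c("A", "B", "C", "D", "E")
sire <- c(NA,  NA,  "A", "A", "C")
dam  <- c(NA,  NA,  "B", NA,  "D")
gen_top <- assign_gen_top(ind, sire, dam)
print(gen_top)  # A = 1, B = 1, C = 2, D = 2, E = 3
```

Individuals caught in a loop never have all parents resolved, so this sketch leaves them as NA rather than raising an error.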

Value

A tidyped object (which inherits from data.table). Individual, sire, and dam ID columns are renamed to Ind, Sire, and Dam. Missing parents are replaced with NA. The Sex column contains "male", "female", "monoecious", or NA. The Cand column is included if cand is not NULL. The Gen column is included if addgen is TRUE. The IndNum, SireNum, and DamNum columns are included if addnum is TRUE. The Family and FamilySize columns identify full-sibling families (for example, "AxB" for offspring of sire A and dam B). The f column is included if inbreed is TRUE.

See Also

summary.tidyped for summarizing tidyped objects
visped for visualizing pedigree structure
pedmat for computing relationship matrices
vismat for visualizing relationship matrices
splitped for splitting pedigree into connected components
inbreed for calculating inbreeding coefficients

Examples

library(visPedigree)
library(data.table)

# Tidy a simple pedigree
tidy_ped <- tidyped(simple_ped)
head(tidy_ped)

# Trace ancestors of a specific individual within 2 generations
tidy_ped_tracegen <- tidyped(simple_ped, cand = "J5X804", trace = "up", tracegen = 2)
head(tidy_ped_tracegen)

# Trace both ancestors and descendants for multiple candidates
# This is highly optimized and works quickly even on 100k+ individuals
cand_list <- c("J5X804", "J3Y620")
tidy_ped_all <- tidyped(simple_ped, cand = cand_list, trace = "all")

# Check for loops (will error if loops exist)
try(tidyped(loop_ped))
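
# A hand-built pedigree (illustrative IDs, not a packaged dataset).
# Missing parents are coded as "0" or NA, as described under ped.
# Parents A and B have no records of their own, so this relies on the
# founder injection described above.
small_ped <- data.frame(
  Ind  = c("C", "D", "E"),
  Sire = c("A", "A", "C"),
  Dam  = c("B", "0", "D"),
  stringsAsFactors = FALSE
)

# Request inbreeding coefficients and bottom-aligned generation numbers
tidy_small <- tidyped(small_ped, inbreed = TRUE, genmethod = "bottom")
tidy_small

# Selfing: the same plant is both sire and dam (errors unless selfing = TRUE)
self_ped <- data.frame(
  Ind  = "S1",
  Sire = "P1",
  Dam  = "P1",
  stringsAsFactors = FALSE
)
tidy_self <- tidyped(self_ped, selfing = TRUE)
tidy_self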

# Example with a large pedigree: extract 2 generations of ancestors for 2007 candidates
cand_2007 <- big_family_size_ped[Year == 2007, Ind]

tidy_big <- tidyped(big_family_size_ped, cand = cand_2007, trace = "up", tracegen = 2)
summary(tidy_big)



visPedigree documentation built on March 30, 2026, 9:07 a.m.