knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6.5,
  fig.height = 6.5,
  dpi = 96,
  out.width = "100%"
)
library(visPedigree)
Pedigrees are fundamental to both animal and plant breeding. They are used to improve the accuracy of breeding value estimation, to monitor and manage inbreeding, and to support a wide range of downstream analyses based on ancestry and relatedness. In applied settings, however, pedigree data are often incomplete or poorly ordered: founders may be omitted, parents may appear after offspring, and sex annotations may be missing or inconsistent. In addition, pedigrees are commonly stored in a simple three-column form (individual, sire, dam), which is convenient for storage but less convenient for checking structure, tracing relatives, or producing readable graphical displays.
Several software tools address parts of this workflow. On Windows, pedigraph and pedigree viewer provide facilities for pedigree trimming and display. Within R, packages such as pedigree, nadiv, and optiSel support pedigree preparation and analysis, while kinship2 can be used to draw pedigree trees. Nevertheless, large pedigrees continue to pose practical difficulties for data cleaning, loop detection, multi-generation tracing, and graphical display with limited node overlap.
The visPedigree package was developed to provide an integrated workflow for pedigree tidying, analysis, and visualization. Built on data.table, C++, and igraph, it supports pedigree standardization, loop detection, candidate tracing, integer pedigree construction, generation assignment, and optional inbreeding calculation, together with scalable pedigree visualization. The package is designed for both animal and plant pedigrees, including selfing or monoecious mating systems. This guide focuses on the tidying workflow and introduces the main arguments and outputs of tidyped().
The visPedigree package can be installed from CRAN:
install.packages("visPedigree")
Or from GitHub:
# install.packages("devtools")
devtools::install_github("luansheng/visPedigree")
The first three columns of pedigree data must be in the order of individual, sire, and dam IDs. The column names can be customized, but their order must remain unchanged. Individual IDs should not be coded as "", " ", "0", "*", or NA; records with such IDs are removed from the pedigree. Missing parents should be denoted by NA, 0, or *. Spaces and empty strings ("") are also treated as missing parents, though this is not recommended. Additional columns, such as sex and generation, can also be included.
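As a rough illustration of the coding rules above, the following base R sketch builds a toy three-column pedigree and maps the listed missing-parent codes to NA. The data, the column names, and the helper normalize_parent() are hypothetical and are not part of visPedigree; tidyped() performs its own handling internally.

```r
# Hypothetical toy pedigree: individual, sire, dam (in that order).
raw_ped <- data.frame(
  Animal = c("K3", "K4", "K5", "K6"),
  Father = c("0", "*", "K3", "K3"),
  Mother = c(NA, "", "K4", " "),
  stringsAsFactors = FALSE
)

# Codes that the text above lists as "missing parent".
missing_codes <- c("", " ", "0", "*")

# Illustrative helper (an assumption, not a package function):
# replace every missing-parent code with NA.
normalize_parent <- function(x) {
  x <- as.character(x)
  x[x %in% missing_codes] <- NA_character_
  x
}

raw_ped$Father <- normalize_parent(raw_ped$Father)
raw_ped$Mother <- normalize_parent(raw_ped$Mother)
print(raw_ped)
```

Using NA consistently in the input avoids relying on the less recommended blank or empty-string codes.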
The pedigree can be checked and tidied through the tidyped() function.
This function takes a pedigree, checks for duplicates and bisexual individuals, detects loops, adds missing founders, sorts the pedigree, and traces candidate pedigrees.
If the cand parameter is provided, only those individuals and their ancestors or descendants are retained.
Tracing direction and the number of generations can be specified using the trace and tracegen parameters.
Virtual generations are inferred and assigned when addgen = TRUE.
A numeric pedigree is generated when addnum = TRUE.
If a Sex column is present, its values should be coded as 'male', 'female', or NA (unknown). Missing sex information is inferred from the pedigree structure where possible.
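One way to picture sex inference from pedigree structure is sketched below in base R: an ID that appears in the sire column is taken as male, one that appears in the dam column as female, and one that appears in both is flagged as a conflict. This is only an assumption about the general idea for a two-sex system; it is not the package's implementation, and the toy data are made up.

```r
# Hypothetical toy pedigree without a Sex column.
ped <- data.frame(
  Ind  = c("A", "B", "C", "D"),
  Sire = c(NA, NA, "A", "A"),
  Dam  = c(NA, NA, "B", "C"),
  stringsAsFactors = FALSE
)

# IDs used as sire or as dam anywhere in the pedigree.
sires <- unique(ped$Sire[!is.na(ped$Sire)])
dams  <- unique(ped$Dam[!is.na(ped$Dam)])

# IDs used in both roles cannot be assigned a single sex.
conflicts <- intersect(sires, dams)

# Individuals that never appear as a parent stay NA (unknown).
sex <- rep(NA_character_, nrow(ped))
sex[ped$Ind %in% sires] <- "male"
sex[ped$Ind %in% dams]  <- "female"
sex[ped$Ind %in% conflicts] <- NA_character_

data.frame(Ind = ped$Ind, Sex = sex)
```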
The visPedigree package ships with several example datasets. You can list them with the following command.
data(package="visPedigree")
The following code displays the simple_ped dataset, which contains four columns: individual, sire, dam, and sex. Missing parents are denoted by 'NA', '0', or '*'. Founders are not explicitly listed, and some parents appear after their offspring in the original data.
head(simple_ped)
tail(simple_ped)
# The number of individuals in the pedigree dataset
nrow(simple_ped)
# Individual records with missing parents
simple_ped[Sire %in% c("0", "*", "NA", NA) | Dam %in% c("0", "*", "NA", NA)]
Example: If we incorrectly set the female J0Z167 as the sire of J2F588, tidyped() will detect this bisexual conflict.
x <- data.table::copy(simple_ped)
x[ID == "J2F588", Sire := "J0Z167"]
y <- tidyped(x)
The tidyped() function sorts the pedigree, replaces missing parents with NA, ensures parents precede their offspring, and adds missing founders.
tidy_simple_ped <- tidyped(simple_ped)
head(tidy_simple_ped)
tail(tidy_simple_ped)
nrow(tidy_simple_ped)
In the resulting tidy_simple_ped, founders are added with their inferred sex, and parents are sorted before their offspring. The number of individuals increases from 31 to 59. The columns are renamed to Ind, Sire, and Dam. Missing parents are uniformly replaced with NA, and tidyped() provides informative messages during processing. By default, tidy_simple_ped includes new columns: Gen, IndNum, SireNum, and DamNum. These can be disabled by setting addgen = FALSE and addnum = FALSE.
If the input dataset lacks a Sex column, it will be automatically added to the tidied output.
tidy_simple_ped_no_gen_num <- tidyped(simple_ped, addgen = FALSE, addnum = FALSE)
head(tidy_simple_ped_no_gen_num)
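The founder-adding step described above can be sketched in base R: any ID that appears as a parent but has no record of its own is appended as a founder with unknown parents. The toy data are hypothetical and the code only illustrates the idea; it is not how tidyped() is implemented.

```r
# Hypothetical pedigree in which founders X1 and X2 have no own record.
ped <- data.frame(
  Ind  = c("X3", "X4"),
  Sire = c("X1", "X3"),
  Dam  = c("X2", NA),
  stringsAsFactors = FALSE
)

# Parents that are named but never listed as individuals.
parents <- unique(c(ped$Sire, ped$Dam))
missing_founders <- setdiff(parents[!is.na(parents)], ped$Ind)

# Give each missing founder a record with unknown parents.
founder_rows <- data.frame(
  Ind  = missing_founders,
  Sire = rep(NA_character_, length(missing_founders)),
  Dam  = rep(NA_character_, length(missing_founders)),
  stringsAsFactors = FALSE
)

# Placing founders first keeps parents ahead of their offspring here.
full_ped <- rbind(founder_rows, ped)
print(full_ped)
```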
Once tidied, you can use data.table::fwrite() to export the pedigree for genetic evaluation software like ASReml.
A pedigree loop occurs when an individual is its own ancestor (e.g., A is the parent of B, B is the parent of C, and C is the parent of A). This is a biological impossibility and a serious error in pedigree records. The tidyped() function automatically detects these cycles using graph theory algorithms. If a loop is detected, the function will stop and provide information about the individuals involved in the loop.
The following code demonstrates what happens when a pedigree with loops is processed:
# loop_ped contains cycles (e.g., V -> T -> R -> P -> M -> V)
# Attempting to tidy it will result in an error
try(tidyped(loop_ped))
Detecting loops early is crucial for ensuring the integrity of genetic evaluations.
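To make the idea of loop detection concrete, here is a minimal base R sketch that repeatedly "peels off" individuals whose parents are already settled. If some individuals can never be peeled, they are part of a loop or descend from one. The helper peel_pedigree() is hypothetical and swaps in a simple topological-ordering technique; the package itself relies on graph algorithms whose details are not shown in this guide.

```r
# Hypothetical helper, not part of visPedigree.
peel_pedigree <- function(ind, sire, dam) {
  resolved  <- character(0)
  remaining <- ind
  # A parent is "settled" if it is missing, unlisted, or already resolved.
  settled <- function(p) is.na(p) | !(p %in% ind) | p %in% resolved
  repeat {
    idx   <- match(remaining, ind)
    ready <- remaining[settled(sire[idx]) & settled(dam[idx])]
    if (length(ready) == 0) break
    resolved  <- c(resolved, ready)
    remaining <- setdiff(remaining, ready)
  }
  # 'order' lists parents before offspring; 'unresolved' is non-empty
  # only when the pedigree contains a loop.
  list(order = resolved, unresolved = remaining)
}

# The loop from the text: A is a parent of B, B of C, and C of A.
no_dam <- rep(NA_character_, 4)
loop <- peel_pedigree(c("A", "B", "C", "D"), c("C", "A", "B", NA), no_dam)
loop$unresolved

# A valid pedigree given in the wrong order is fully resolved.
ok <- peel_pedigree(c("C", "A", "B"), c("A", NA, NA), c("B", NA, NA))
ok$order
```

A side effect of this approach is that the resolved order already places every parent before its offspring.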
When saving the pedigree, missing parents should typically be replaced with 0.
saved_ped <- data.table::copy(tidy_simple_ped)
saved_ped[is.na(Sire), Sire := "0"]
saved_ped[is.na(Dam), Dam := "0"]
data.table::fwrite(
  x = saved_ped,
  file = tempfile(fileext = ".csv"),
  sep = ",",
  quote = FALSE
)
To trace the pedigree of specific individuals, use the cand parameter. This adds a Cand column where TRUE identifies the specified candidates. If cand is provided, only the candidates and their ancestors/descendants are retained.
tidy_simple_ped_J5X804_ancestors <- tidyped(ped = tidy_simple_ped_no_gen_num, cand = "J5X804")
tail(tidy_simple_ped_J5X804_ancestors)
By default, the function traces ancestors. You can limit the number of generations using tracegen. If tracegen is NULL, all available generations are traced.
tidy_simple_ped_J5X804_ancestors_2 <- tidyped(ped = tidy_simple_ped_no_gen_num, cand = "J5X804", tracegen = 2)
print(tidy_simple_ped_J5X804_ancestors_2)
The code above traces the ancestors of J5X804 back two generations.
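The effect of limiting the number of traced generations can be illustrated with a small base R sketch. The function trace_up() is hypothetical and assumes that one traced generation means one step from offspring to parents; it is not the package's implementation.

```r
# Hypothetical helper: collect a candidate's ancestors, optionally
# stopping after 'tracegen' parent-lookup steps.
trace_up <- function(ind, sire, dam, cand, tracegen = NULL) {
  kept     <- cand
  frontier <- cand
  gen      <- 0
  while (length(frontier) > 0 && (is.null(tracegen) || gen < tracegen)) {
    idx      <- match(frontier, ind)
    parents  <- c(sire[idx], dam[idx])
    parents  <- unique(parents[!is.na(parents)])
    frontier <- setdiff(parents, kept)  # only newly found ancestors
    kept     <- c(kept, frontier)
    gen      <- gen + 1
  }
  kept
}

# Toy pedigree (made up for illustration).
ind  <- c("A", "B", "C", "D", "E")
sire <- c(NA, NA, "A", "C", "C")
dam  <- c(NA, NA, "B", "B", "D")

one_gen <- trace_up(ind, sire, dam, cand = "E", tracegen = 1)  # E and its parents
all_gen <- trace_up(ind, sire, dam, cand = "E")                # every ancestor
one_gen
all_gen
```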
To trace descendants, set trace = 'down'.
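There are three options for the trace parameter:
- "up": trace the candidates' ancestors (the default).
- "down": trace the candidates' descendants.
- "all": trace both the ancestors and the descendants of the candidates.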
tidy_simple_ped_J0Z990_offspring <- tidyped(ped = tidy_simple_ped_no_gen_num, cand = "J0Z990", trace = "down")
print(tidy_simple_ped_J0Z990_offspring)
Tracing the descendants of J0Z990 reveals a total of 5 individuals.
Certain genetic evaluation programs require integer-coded pedigrees, where individuals are numbered consecutively to facilitate the calculation of the additive genetic relationship matrix.
By default, tidyped() adds IndNum, SireNum, and DamNum columns. This can be disabled with addnum = FALSE.
tidy_simple_ped_with_int <- tidyped(ped = tidy_simple_ped_no_gen_num, addnum = TRUE)
head(tidy_simple_ped_with_int)
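Conceptually, integer coding replaces each ID with its row position in the sorted pedigree. The base R sketch below shows the idea with match(); coding unknown parents as 0 is an assumption made for this illustration and may differ from the values that tidyped() writes to SireNum and DamNum.

```r
# Toy pedigree, already sorted so that parents precede offspring.
ind  <- c("A", "B", "C", "D")
sire <- c(NA, NA, "A", "A")
dam  <- c(NA, NA, "B", "C")

# Consecutive integer code for each individual.
ind_num <- seq_along(ind)

# Look up each parent's row position; unknown parents become 0
# (an assumption for this sketch).
sire_num <- match(sire, ind, nomatch = 0L)
dam_num  <- match(dam, ind, nomatch = 0L)

data.frame(Ind = ind, IndNum = ind_num, SireNum = sire_num, DamNum = dam_num)
```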
The inbreeding coefficient (F) of each individual can be calculated with either tidyped() or inbreed(). There are two ways to add inbreeding coefficients to a tidied pedigree:
- Set inbreed = TRUE in the tidyped() function. This calculates the inbreeding coefficients using an optimized C++ implementation of the Sargolzaei & Iwaisaki (2005) LAP bucket algorithm and adds an f column to the tidied pedigree.
- Call inbreed() directly on a tidied pedigree to add the f column.
Both options use the same high-performance engine as pedmat(method = "f"), ensuring consistent results across the package.
# Create a simple inbred pedigree
library(data.table)
test_ped <- data.table(
  Ind = c("A", "B", "C", "D", "E"),
  Sire = c(NA, NA, "A", "C", "C"),
  Dam = c(NA, NA, "B", "B", "D"),
  Sex = c("male", "female", "male", "female", "male")
)
# Option 1: Calculate during tidying
tidy_test <- tidyped(test_ped, inbreed = TRUE)
head(tidy_test)
# Option 2: Calculate after tidying
tidy_test <- inbreed(tidyped(test_ped))
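For a pedigree this small, the inbreeding coefficients can be checked by hand. The base R sketch below applies the classic tabular method to the same five individuals; note that this is a different and much slower technique than the Sargolzaei & Iwaisaki algorithm used by the package, and it is shown only as an independent cross-check. It assumes parents are listed before their offspring.

```r
# Same structure as test_ped: C = A x B, D = C x B, E = C x D.
ind  <- c("A", "B", "C", "D", "E")
sire <- c(NA, NA, "A", "C", "C")
dam  <- c(NA, NA, "B", "B", "D")

n <- length(ind)
s <- match(sire, ind)
d <- match(dam, ind)

# Additive relationship matrix, filled row by row (tabular method).
A <- matrix(0, n, n, dimnames = list(ind, ind))
for (i in seq_len(n)) {
  for (j in seq_len(i - 1)) {
    a_js <- if (is.na(s[i])) 0 else A[j, s[i]]
    a_jd <- if (is.na(d[i])) 0 else A[j, d[i]]
    A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
  }
  # The diagonal is 1 + F_i, with F_i half the parents' relationship.
  f_i <- if (is.na(s[i]) || is.na(d[i])) 0 else 0.5 * A[s[i], d[i]]
  A[i, i] <- 1 + f_i
}

f <- diag(A) - 1
f
# A = 0, B = 0, C = 0, D = 0.25, E = 0.375
```

D is inbred because its sire C is a son of its dam B, and E is more inbred still because its dam D is a daughter of its sire C.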
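Generation inference is essential for pedigree visualization. tidyped() provides two methods for assigning generation numbers via the genmethod parameter: "top" (top-down, the default) and "bottom" (bottom-up). The two methods can place the same individual in different generations.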
# Default behavior (Top-Down): J2Y434 is at Gen 3
tidy_top <- tidyped(simple_ped, genmethod = "top")
tidy_top[Ind == "J2Y434"]
# Bottom-Up behavior: J2Y434 is at Gen 6
tidy_bottom <- tidyped(simple_ped, genmethod = "bottom")
tidy_bottom[Ind == "J2Y434"]
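A simple way to think about top-down generation assignment is sketched below in base R: founders are placed in generation 1 and every other individual one generation below its deepest parent. This rule is an assumption chosen for illustration; the exact rules behind genmethod = "top" and genmethod = "bottom" in tidyped() may differ. The sketch also assumes parents are listed before their offspring.

```r
# Toy pedigree (made up), sorted so that parents precede offspring.
ind  <- c("A", "B", "C", "D", "E")
sire <- c(NA, NA, "A", "C", "C")
dam  <- c(NA, NA, "B", "B", "D")

s <- match(sire, ind)
d <- match(dam, ind)

gen <- integer(length(ind))
for (i in seq_along(ind)) {
  # Generations of the known parents (unknown parents are dropped).
  parent_gen <- c(gen[s[i]], gen[d[i]])
  parent_gen <- parent_gen[!is.na(parent_gen)]
  gen[i] <- if (length(parent_gen) == 0) 1L else max(parent_gen) + 1L
}

data.frame(Ind = ind, Gen = gen)
```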
The summary() method provides a quick overview of the pedigree statistics, including the number of individuals, sex distribution, founders, and isolated individuals. If inbreeding coefficients have been calculated (column f), the summary will also include descriptive statistics of inbreeding.
# Summarize the tidied pedigree
summary(tidy_simple_ped)
For extremely large pedigrees, it is sometimes useful to split them into disconnected subsets or "sub-pedigrees". The splitped() function automatically detects disconnected components (families that share no ancestors) and splits the pedigree into a list of smaller tidyped objects.
# Split the pedigree into components
sub_pedigrees <- splitped(tidy_simple_ped)
# View summary of the split result
summary(sub_pedigrees)
# Access a specific sub-pedigree
# first_sub <- sub_pedigrees[[1]]
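The notion of disconnected components can be illustrated with a short base R sketch that propagates a shared label across each individual and its parents until nothing changes. This label-propagation approach is an assumption made for illustration and is not how splitped() is implemented; the toy data are made up.

```r
# Two unrelated toy families: C = A x B and F = D x E.
ind  <- c("A", "B", "C", "D", "E", "F")
sire <- c(NA, NA, "A", NA, NA, "D")
dam  <- c(NA, NA, "B", NA, NA, "E")

s <- match(sire, ind)
d <- match(dam, ind)

# Start with one label per individual, then merge labels within
# each (individual, sire, dam) trio until the labels are stable.
comp <- seq_along(ind)
repeat {
  old <- comp
  for (i in seq_along(ind)) {
    members <- c(i, s[i], d[i])
    members <- members[!is.na(members)]
    comp[members] <- min(comp[members])
  }
  if (identical(comp, old)) break
}

# Individuals grouped by component label.
families <- split(ind, comp)
families
```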
See Also:
- vignette("draw-pedigree", package = "visPedigree")
- vignette("relationship-matrix", package = "visPedigree")