In luansheng/visPedigree: Tidying and visualization for animal pedigree

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

suppressPackageStartupMessages(is_installed <- require(devtools))
if (!is_installed) {
  install.packages("devtools")
  suppressPackageStartupMessages(require(devtools))
}

suppressPackageStartupMessages(is_installed <- require(visPedigree))
if (!is_installed) {
  install_github("luansheng/visPedigree")  
  suppressPackageStartupMessages(require(visPedigree))
}

Drawing the pedigree graph
4.1 A simple pedigree graph
4.2 A reduced pedigree graph
4.3 An outlined pedigree graph
4.4 How to use this package in a selective breeding program
4.4.1 Analysis of founders for an individual
4.4.2 The contribution of different families in a selective breeding program

4 Drawing the pedigree

visped function takes a pedigree tidied by the tidyped function, outputs a hierarchical graph for all individuals in the pedigree. The graph can be shown on the defaulted graphic device and be saved in a pdf file. The graph in the pdf file is a vector drawing, is legible and isn't overlapped. It is especially useful when the number of individuals is big and the width of individual label is long in one generation. This function can draw the graph of a very large pedigree (> 10,000 individuals per generation) by compacting the full-sib individuals. It is very effective for drawing the pedigree of aquatic animal, which usually including many full-sib families per generation in the nucleus breeding population. The outline of a pedigree without individuals' label is still shown if the width of a pedigree graph is longer than the maximum width (200 inches) of the pdf file. It is useful to help breeders quickly browse the process of constructing nucleus breeding population to see if there is the introduction of blood.

Important hints：It is strongly recommended to set the cand parameters when tidying a pedigree. After the pedigree is pruned by setting the cand parameter to the specific individuals, the generation number the individuals belonged to is more accurately inferred, and the layout of the individuals in the drawing pedigree tree will be more reasonable.

A small pedigree is drawn in the following figure. Legible vector figure is saved in a pdf file.

tidy_small_ped <-
  tidyped(ped = small_ped,
          cand = c("Y", "Z1", "Z2"))
visped(tidy_small_ped, compact = TRUE, file = "smallped.pdf")

In the above graph, two shapes and three colors are used. Circle is for individual, square is for family. Dark sky blue means male, dark golden rod means female, dark olive green means unknown sex. For example, one circle with dark sky blue means a male individual; One square with dark golden rod means all female individuals in a full-sib family when compact = TRUE. The ancestors are drawn at the top and descendants are drawn at the bottom in the pedigree graph. The parents and offspring are connected by a dummy node. The colors of lines from the offspring to the dummy nodes are dark grey, and the colors of lines from the dummy nodes to the sire and dam are same with the colors of parents.

4.1 A simple pedigree graph

The graph of the simple_ped pedigree trimmed is drawn and is displayed on the default graphics device of R or Rstudio. The addgen and addnum parameters need to be set to TRUE when tidying the pedigree using the tidyped function.

tidy_simple_ped <- tidyped(simple_ped)
visped(tidy_simple_ped)

Usually, the figure displayed on the Plots panel of Rstudio has poor definition. The individual IDs will overlap with each other due to the restricted size of the pedigree graph if the number of individuals is large. This problem will be resolved by saving the pedigree graph as vectorgraph in a pdf file. visped function will not output pedigree graph on the default graphics device by setting showgraph = FALSE.

suppressMessages(visped(tidy_simple_ped, showgraph = FALSE, file="simpleped.pdf"))

After opening the simpleped.PDF file and you'll see a high definition pedigree graph.

4.2 A reduced pedigree graph

Warning messages will be shown when you try to draw the pedigree graph of the deep_ped dataset.

cand_J11_labels <- deep_ped[(substr(Ind, 1, 3) == "K11"), Ind]
visped(tidyped(deep_ped, cand = cand_J11_labels, tracegen = 3))

  Too many individuals (>=3362) in one generation!!! Two choices:
1. Removing full-sib individuals using the parameter compact = TRUE; or, 
2. Visualizing all nodes without labels using the parameter outline = TRUE.
Rerun visped() function!

The function indicates that too many individuals in one generation to draw a pedigree graph. It is recommended to use the compact or outline parameters to simplify the pedigree.

First, let's try the compact parameter and output it in the deepped1.pdf file. The figure on the default graphic device has serious overlapping problems due to the large number of individuals and the limited plot size.

cand_J11_labels <- deep_ped[(substr(Ind,1,3) == "K11"),Ind]
visped(
  tidyped(
    deep_ped,
    cand = cand_J11_labels,
    trace = "up",
    tracegen = 3
  ),
  compact = TRUE,
  showgraph = TRUE,
  file = "deepped1.pdf"
)

Let's open the deepped1.pdf file and view the high-definition pedigree vectorgraph. Most of shapes are square at bottom, and the internal numbers are the total number of male or female individuals for each family. The individual label is shorter than square or circle, and it is not matched. The individual label can be magnified by increasing the cex parameter. Cex is used to control the size of the individual label (ID) in the graph. The bigger the cex is, the longer the individual label is, and vice versa. The range of cex is generally 0 to 1, can be greater than 1, with 0.1 as a break for each adjustment. The visped function will output warning messages including the cex value which was used for drawing the pedigreed graph.

visped(
  tidyped(
    deep_ped,
    cand = cand_J11_labels,
    trace = "up",
    tracegen = 3
  ),
  compact = TRUE,
  cex = 0.83,
  showgraph = FALSE,
  file = "deepped2.pdf"
)

Let's open the deepped2.pdf file to view the high-definition pedigree vectorgraph. There is higher matching degree between individual labels and shapes compared to deepped1.pdf. If it doesn't feel right, you can continue to modify the cex.

4.3 An outlined pedigree graph

An outlined pedigree graph will be drawn by setting outline=TRUE. Individual labels will not be shown in the graph. It is very effective for the large pedigree including many individuals.

In this graph, you can directly observe that there are external individuals introduced in some generations. Please click here to view the pdf file.

suppressMessages(visped(
  tidyped(
    deep_ped,
    cand = cand_J11_labels, 
    tracegen = 3),
  compact = TRUE,
  outline = TRUE,
  showgraph = TRUE,
  file = "deepped3.pdf"
))

4.4 How to use this package in a selective breeding program

4.4.1 An analysis of founders for an individual

Selective breeding is actually a process of enrichment of the desirable minor genes dispersed among multiple founders through successive mating for multiple generations. The support theory behind it is the well-known minor polygene hypothesis.

We select the individual "J110550G" in the deep_ped dataset to visualize its pedigree. The pdf pedigree is here.

suppressWarnings(J110550G_ped <-
                   tidyped(deep_ped, cand = "K110550H"))
suppressMessages(visped(J110550G_ped, showgraph = TRUE, file = "K110550HGped.pdf"))

As you can see from the figure above, the number of founder individuals (without parents) of the J110550G individual is r nrow(J110550G_ped[is.na(Sire) & is.na(Dam)]).This means that this individual has accumulated a number of favorable genes from the founders, so that the breeding object trait will be improved with great genetic gain.

4.4.2 The contribution of different families in a selective breeding program

When using the optimum contribution theory to optimize mating design, the number of individuals contributed by each family is not same, and the family with a high integrated selection index contributes more individuals. By visualizing pedigree, we can directly see the contribution ratio of different families.

The below codes will show the composition of the parents of 106 families born in the nucleus breeding population in 2007. Only two generations including parents and grandparents are drawn in the graph by setting the tracegen=2.

cand_2007_G8_labels <-
  big_family_size_ped[(Year == 2007) & (substr(Ind, 1, 2) == "G8"), Ind]
suppressWarnings(
  cand_2007_G8_tidy_ped_ancestor_2 <-
    tidyped(
      big_family_size_ped,
      cand = cand_2007_G8_labels,
      trace = "up",
      tracegen = 2
    )
)
sire_label <-
  unique(cand_2007_G8_tidy_ped_ancestor_2[Ind %in% cand_2007_G8_labels,
                                          Sire])
dam_label <-
  unique(cand_2007_G8_tidy_ped_ancestor_2[Ind %in% cand_2007_G8_labels,
                                          Dam])
sire_dam_label <- unique(c(sire_label, dam_label))
sire_dam_label <- sire_dam_label[!is.na(sire_dam_label)]
sire_dam_ped <-
  cand_2007_G8_tidy_ped_ancestor_2[Ind %in% sire_dam_label]
sire_dam_ped <-
  sire_dam_ped[, FamilyID := paste(Sire, Dam, sep = "")]
family_size <- sire_dam_ped[, .N, by = c("FamilyID")]
fullsib_family_label <- unique(sire_dam_ped$FamilyID)
suppressMessages(
  visped(
    cand_2007_G8_tidy_ped_ancestor_2,
    compact = TRUE,
    outline = TRUE,
    showgraph = TRUE
  )
)

In the above figure, 106 families are shown at bottom, the parents are shown in middle, and the grandparents are shown at top. It can be seen that the parents are composed of r length(sire_label[!is.na(sire_label)]) sires and r length(dam_label[!is.na(dam_label)]) dams. The parents are from r length(fullsib_family_label) full-sib families in the generation of grandparent. About r family_size$N[1]+family_size$N[2] parents are from two full-sib families because the optimum contribution theory was used, and account for r round((family_size$N[1]+family_size$N[2])/sum(family_size$N),4)*100% of the total number of parents.

luansheng/visPedigree documentation built on Feb. 4, 2023, 11:10 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com