This document describes tests of a few tools used for re-numbering of pedigrees.
From a theoretical point of view a pedigree is a special instance of a graph called a Directed Acyclic Graph (DAG). In the context of this document the term graph is not a synonym for a special type of diagram, but it stands for a class of objects that are used in discrete mathematics. A graph $G$ is defined by two sets $V$ and $E$ where the set $V$ contains the so called nodes and the set $E$ contains edges. Each edge relates two nodes. A more formal definition of a graph is given by
$$G = (V, E)$$
Graphs can be divided into two classes.
The difference between undirected and directed graphs is that the order of the two nodes that define an edge matters only in directed graphs. When looking at an edge of a directed graph, the first node is called the parent node and the second node is called the offspring node.
As already mentioned, a pedigree can be understood as an instance of a special class of directed graphs. For such a graph the set of nodes $V$ contains all animals in the pedigree and the set $E$ of all edges contains the relationships between parents and offspring. Based on the representation of a pedigree as a graph, it is clear that pedigrees fall into the class of directed graphs, because the relation between parents and offspring has a clear direction. Furthermore, any directed graph that represents a pedigree must not have any cyles, i.e., the directed graph must be acyclic. The condition of being acyclic means that starting at any node $v$ in the graph, there is no sequence of nodes defined by the directed edges given in the graph, where the node $v$ is visited more than once.
Many software programs that use pedigree information have the requirement that individuals in a pedigree are ordered such that parents are always before offspring. In graph theory this type of order of the nodes is called topological ordering.
There are several tools to produce a topological ordering from a given pedigree. Two such tools are described here.
tsort
is a Unix tool that can be used from the command line. tsort
is described on https://en.wikipedia.org/wiki/Tsort. The example given on Wikipedia is shown below
knitr::include_graphics(path = "../png/305px-Directed_acyclic_graph_2.svg.png")
The usage for the above given DAG is
$ tsort <<EOF > 3 8 > 3 10 > 5 11 > 7 8 > 7 11 > 8 9 > 11 2 > 11 9 > 11 10 > EOF 7 5 3 11 8 10 2 9
Based on the above example, we can see that tsort
requires the DAG as a parent-offspring list. Such a list consists of two columns and the parent is in the first column and the offspring in the second column. The output is the nodes in topological ordering.
In case we have to re-code the nodes in the pedigree, we can assign the line-number of the ouput to each individual.
An alternative to tsort
is the R-package gRbase
. This package offers more functionality than just topological orderings.
### # gRbase requirement if (!require(gRbase)){ ### # RBGL was moved from CRAN to bioC source("https://bioconductor.org/biocLite.R") biocLite("RBGL") install.packages("gRbase", repos = "https://cran.rstudio.com/", dependencies = TRUE) require(gRbase) } ### # for plotting, we need if (!require(Rgraphviz)){ source("https://bioconductor.org/biocLite.R") biocLite("Rgraphviz") require(Rgraphviz) }
Start by defining a simple DAG taken from the website of gRbase
.
daG11 <- dag( ~a:b + b:c:d) #plot( daG11 ) ### # alternative specification daG13 <- dag( list( c("a", "b"), c("b", "c", "d") ) ) plot(daG13)
The above definitions mean that node a
has parent b
and node b
has parents c
and d
.
We now return to our example that was used with tsort
and we use gRbase
to come up with the topological ordering.
dagTsort <- dag( list( c("11", "5", "7"), c("8", "7", "3"), c("2", "11"), c("9", "11", "8"), c("10", "11", "3"))) plot(dagTsort)
The topological sort of the above defined DAG can be computed as
topoSort(dagTsort)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.