Disclaimer

This vignette describes the concepts and strategies developed for running consistency checks on pedigree data in a systematic way.

Introduction

From a theoretical point of view, pedigrees are Directed Acyclic Graphs (DAGs). A graph $G$ is defined by a set of vertices (or nodes) $V$ and a set of edges $E$. The vertices correspond to the individuals in the pedigree, and the edges correspond to directed relationships that always point from parents to their offspring. A pedigree must not contain any directed cycle, i.e. there cannot be a path of consecutive directed edges on which a parent appears after one of its offspring. These two properties

  1. directed edges between parents and offspring
  2. no cycles

together have led to the name DAG.
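The no-cycle property can be tested directly on a small example. The following is a minimal sketch in base R that uses Kahn's algorithm (repeatedly removing individuals without parents); the toy pedigree, the column names `id`, `sire` and `dam`, and the function name `is_acyclic` are assumptions made for illustration and are not taken from this package.

```r
# Hypothetical toy pedigree; the column names id, sire and dam are assumptions.
ped <- data.frame(
  id   = c("A", "B", "C", "D"),
  sire = c(NA,  NA,  "A", "C"),
  dam  = c(NA,  NA,  "B", "B"),
  stringsAsFactors = FALSE
)

# Returns TRUE if the parent -> offspring graph contains no directed cycle.
is_acyclic <- function(ped) {
  # Edge list: one row per known parent -> offspring relationship.
  edges <- rbind(
    data.frame(from = ped$sire, to = ped$id, stringsAsFactors = FALSE),
    data.frame(from = ped$dam,  to = ped$id, stringsAsFactors = FALSE)
  )
  edges <- edges[!is.na(edges$from), , drop = FALSE]
  nodes <- unique(c(ped$id, edges$from))
  # Kahn's algorithm: repeatedly remove nodes that have no incoming edge.
  while (length(nodes) > 0) {
    has_parent <- nodes %in% edges$to
    roots <- nodes[!has_parent]
    # If every remaining node still has a parent, the rest forms a cycle.
    if (length(roots) == 0) return(FALSE)
    nodes <- nodes[has_parent]
    edges <- edges[!(edges$from %in% roots), , drop = FALSE]
  }
  TRUE
}

is_acyclic(ped)
```

Declaring `D` as the sire of `A` in this toy pedigree closes the path A, C, D, A, and the function then returns `FALSE`.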

In a directed graph, we can distinguish between the number of edges coming into a node and the number of edges going out of it. The former is called the in-degree and the latter the out-degree. For a pedigree, the maximum in-degree of any node is $2$, because every individual has at most two parents.
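As an illustration, in-degree and out-degree can be counted from a table of pedigree records. This is only a sketch under assumptions: the toy data and the column names `id`, `sire` and `dam` are hypothetical.

```r
# Hypothetical toy pedigree; the column names id, sire and dam are assumptions.
ped <- data.frame(
  id   = c("A", "B", "C", "D"),
  sire = c(NA,  NA,  "A", "C"),
  dam  = c(NA,  NA,  "B", "B"),
  stringsAsFactors = FALSE
)

# In-degree: number of known parents of each individual (0, 1 or 2).
in_degree <- rowSums(!is.na(ped[, c("sire", "dam")]))
names(in_degree) <- ped$id

# Out-degree: number of records that name the individual as sire or dam.
parents <- c(ped$sire, ped$dam)
out_degree <- vapply(
  ped$id,
  function(i) sum(parents == i, na.rm = TRUE),
  integer(1)
)

in_degree
out_degree
```

With exactly one sire column and one dam column, the in-degree cannot exceed 2 by construction; the count becomes informative once the same individual occurs in more than one record.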

Based on the described properties, we can define a set of consistency requirements that must be fulfilled when a pedigree is constructed.

Consistency Requirements

This section describes consistency requirements that are derived from the properties of a pedigree described in the previous section.

Properties of a DAG

The following list of requirements is derived from the properties of a DAG

Other requirements

There are additional requirements that are more closely related to data-processing issues. Those issues mostly involve the correctness of certain data formats.
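A format requirement of this kind could be verified as sketched below. The example assumes, purely for illustration, that birthdates are stored as character strings in the form `YYYYMMDD`; the actual formats required by this package are not specified here.

```r
# Hypothetical birthdate strings; the YYYYMMDD format is an assumption.
birthdates <- c("20100301", "20140230", "2011-05-17", NA)

# Step 1: the string must consist of exactly eight digits.
well_formed <- grepl("^[0-9]{8}$", birthdates)

# Step 2: the digits must also denote an existing calendar date.
parsed <- as.Date(birthdates, format = "%Y%m%d")

valid <- well_formed & !is.na(parsed)
valid
```

The second entry has the right shape but denotes 30 February, which is why the pattern match alone would not be sufficient.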

Implementation

Implementations depend, among other things, on how a given pedigree is represented as data.

Data representation

One of the most commonly used data representations of a pedigree is the so-called node-list or adjacency-list. This is a tabular list with columns containing the IDs of animals, the IDs of their parents and additional information such as sex and birthdate. One row of the list holds the available information for a given individual and hence must be unique. Such a row is also called a pedigree record.
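A node-list of this kind might look as follows in R. The sketch also shows how the uniqueness of pedigree records could be verified; all column names and values are hypothetical and chosen for illustration only.

```r
# Hypothetical node-list: one row per individual (pedigree record).
ped <- data.frame(
  id        = c("A", "B", "C", "C"),
  sire      = c(NA,  NA,  "A", "A"),
  dam       = c(NA,  NA,  "B", "B"),
  sex       = c("M", "F", "F", "F"),
  birthdate = as.Date(c("2010-03-01", "2011-05-17",
                        "2014-02-09", "2014-02-09")),
  stringsAsFactors = FALSE
)

# IDs that occur in more than one record violate the uniqueness requirement.
dup_ids <- unique(ped$id[duplicated(ped$id)])
dup_ids
```

Here individual `C` appears twice, so `dup_ids` contains `"C"`.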

Types of implementation routines

When it comes to verifying the consistency requirements, two types of implementation routines can be imagined: checks and transformations.

Each check is described in the companion vignette Pedigree Checks - Implementations, which is also available in this package.

Each transformation is described in the companion vignette Pedigree Transformation - Implementations, which is also available in this package.
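The difference between a check and a transformation can be illustrated with a small sketch. It assumes that a check only reports the offending records, whereas a transformation returns a modified copy of the pedigree. The function names, the column names and the chosen correction (setting an unknown sire to missing) are hypothetical; the routines in the companion vignettes may work differently.

```r
# Hypothetical toy pedigree: sire "X" of individual "C" has no own record.
ped <- data.frame(
  id   = c("A", "B", "C"),
  sire = c(NA,  NA,  "X"),
  dam  = c(NA,  NA,  "B"),
  stringsAsFactors = FALSE
)

# Check: returns the IDs of records whose sire has no own record.
# The pedigree itself is left unchanged.
check_unknown_sire <- function(ped) {
  ped$id[!is.na(ped$sire) & !(ped$sire %in% ped$id)]
}

# Transformation: returns a copy in which such sires are set to missing.
set_unknown_sire_missing <- function(ped) {
  unknown <- !is.na(ped$sire) & !(ped$sire %in% ped$id)
  ped$sire[unknown] <- NA
  ped
}

check_unknown_sire(ped)
ped_clean <- set_unknown_sire_missing(ped)
check_unknown_sire(ped_clean)
```

Running the check again on the transformed pedigree returns no IDs, which is one way to confirm that a transformation has resolved the issue it targets.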

Vignettes Overview

Session Info

sessionInfo()

Latest Update

r paste(Sys.time(),paste0("(", Sys.info()[["user"]],")" ))



pvrqualitasag/PedigreeFromTvdData documentation built on May 29, 2019, 7:50 a.m.