The goal of pedbuildr is to reconstruct small/medium-sized pedigrees from genotype data. The most important functions of the package are
buildPeds(): generates all pedigrees containing a given set of members
reconstruct(): finds the most likely pedigree given the available genotype data
The development version of pedbuildr is available from GitHub:
Load the package into R as follows:
library(pedbuildr) #> Loading required package: pedtools
Suppose we want to find all pedigrees linking 3 individuals: two males and one female, labeled
3 respectively. Without any restrictions
buildPeds() identifies 142 such pedigrees:
plist = buildPeds(ids = 1:3, sex = c(1, 1, 2)) length(plist) #>  142
Here are some of them:
plotPeds(plist[c(1, 20, 100, 141)])
plotPeds() is a thin wrapper around
pedtools::plot.ped(), and highlights the original individuals in each pedigree. It does not handle disconnected pedigrees, so to inspect these we may use
plotPedList(plist[], frames = F)
With more than a handful individuals, the number of pedigrees quickly becomes uncontrollably large. Hence it becomes vital to impose restrictions on the pedigree space.
connected = TRUE to the call,
buildPeds() will return only connected pedigrees. In our example, this takes the total number down to 120:
plist2 = buildPeds(ids = 1:3, sex = c(1, 1, 2), connected = TRUE) length(plist2) #>  120
As a further restriction, one may disallow certain types of inbreeding. Many of the pedigrees found above contain matings between parent-offspring, or even grandparent-grandchild. In many practical cases these may be irrelevant. To skip pedigrees with such features, we add
maxLinearInbreeding = 0.
plist3 = buildPeds(ids = 1:3, sex = c(1, 1, 2), connected = TRUE, maxLinearInbreeding = 0) length(plist3) #>  65
If we set
maxLinearInbreeding = 1 instead, then parent-child mating is allowed, but not grandparent-grandchild or higher separations.
Known parent-child pairs are conveyed to
buildPeds() using the
knownPO parameter. For instance, in our running example suppose we know that 2 and 3 form a parent-child pair (in some order).
plist4 = buildPeds(ids = 1:3, sex = c(1, 1, 2), connected = TRUE, maxLinearInbreeding = 0, knownPO = list(2:3)) length(plist4) #>  22
Here is a selection of these pedigrees:
plotPeds(plist4[c(2, 3, 11, 15, 22)])
We may also know that certain pairs are not parent-child; this can be imposed by using the
notPO parameter. Another option is to set
allKnown = TRUE, meaning that
knownPO should be taken as the complete list of parent-child pairs among the input individuals. We add this to our running example:
plist5 = buildPeds(ids = 1:3, sex = c(1, 1, 2), connected = TRUE, maxLinearInbreeding = 0, knownPO = list(2:3), allKnown = TRUE) length(plist5) #>  10
Here are the 10 remaining pedigrees:
In the previous plot there are two pairs of pedigrees (3 and 5, and 7 and 10) which are identical except for the distribution of genders among the added parents. To avoid such redundancy, we add
genderSym = TRUE to the call:
plist6 = buildPeds(ids = 1:3, sex = c(1, 1, 2), connected = TRUE, maxLinearInbreeding = 0, knownPO = list(2:3), allKnown = TRUE, genderSym = TRUE) length(plist6) #>  8
Here are the final solutions:
The aim of this section is to show how to perform pedigree reconstruction. We start with some "true" pedigree and simulate some marker data for it. We then forget the pedigree structure, and then see if we are able to reconstruct it from the marker data.
Suppose the true relationship between individuals
3 is as follows:
To generate some data, we create the true pedigree and simulate 10 markers (each with 4 alleles). The simulation is done with
markerSim() from the
x = nuclearPed(fa = "fa", mother = "mo", children = 1:2) x = addDaughter(x, parent = 2, id = 3) #> Mother: Creating new individual with ID = NN_1 # plot(x) # Simulate x = forrel::markerSim(x, N = 10, ids = 1:3, alleles = 1:4, verbose = F, seed = 123) x #> id fid mid sex <1> <2> <3> <4> <5> #> fa * * 1 -/- -/- -/- -/- -/- #> mo * * 2 -/- -/- -/- -/- -/- #> 1 fa mo 1 1/1 4/1 4/2 3/1 4/4 #> 2 fa mo 1 3/1 3/1 4/4 4/4 4/4 #> NN_1 * * 2 -/- -/- -/- -/- -/- #> 3 2 NN_1 2 1/2 3/4 4/2 4/1 4/2 #> Only 5 (out of 10) markers are shown. See `?print.ped` for options.
We extract the allele matrix and the locus attributes of the 10 markers.
genodata = getAlleles(x, ids = 1:3) loci = lapply(x$markerdata, attributes)
Now let us try to reconstruct the pedigree from the data. The main input to the
reconstruct() function is the
alleleMatrix and the
loci parameters. By default the function will then call our friend
buildPeds() from the previous section to generate a list of candidate pedigrees. (The number of target individuals is determined by the number of rows in
alleleMatrix.) In this example we won't impose any restrictions on the pedigree space, so we simply indicate the genders.
result = reconstruct(alleleMatrix = genodata, loci = loci, sex = c(1, 1, 2)) #> Building pedigree list #> Undirected adjacency matrices: 8 #> Directed adjacency matrices: 22 #> After adding parents: 142 #> Pedigrees: 142 #> #> Computing likelihoods of 142 pedigrees...done!
plotBestPeds() show the pedigrees with the highest likelihood:
plotBestPeds(result, top = 6)
Lo and behold - the correct pedigree was the most likely one!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.