knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.align = "center",
  dpi = 300
)

pedbuildr

CRAN status

The goal of pedbuildr is to reconstruct pedigrees from genotype data. This is done by optimising the likelihood over all possible pedigrees subject to given restrictions. As part of the pedsuite ecosystem of R packages for pedigree analysis, it uses pedtools for handling pedigrees, and imports pedprobr for calculating pedigree likelihoods.

See also the ibdEstimate() function of forrel, which does pairwise relatedness estimation.

Installation

The pedbuildr package can be installed from CRAN as follows:

install.packages("pedbuildr")

Alternatively, the latest development version is available from GitHub.

# install.packages("devtools")
devtools::install_github("magnusdv/pedbuildr")

A reconstruction example

To get started, load pedbuildr.

library(pedbuildr)

The built-in dataset trioData contains simulated genotypes for three individuals at 100 equifrequent SNP markers. Here are the first 10 columns:

trioData[, 1:10]

As a simple demonstration we will try to reconstruct the pedigree connecting these individuals assuming they are all males. To initialise the process we create them as singletons and attach the marker data. The locusAttributes argument tells R that all the markers are SNPs with alleles 1 and 2.

x = singletons(1:3) |> 
  setMarkers(alleleMatrix = trioData, locusAttributes = "snp12")

# Plot the individuals including genotypes for the first two markers
plot(x, marker = 1:2)

To reconstruct the pedigree, simply run reconstruct():

res = reconstruct(x)

A tailor-made plot function makes it easy to visualise the most likely pedigrees:

plot(res, top = 6)

The most likely pedigree is plotted top left. The titles for the remaining pedigrees give the likelihood ratio (LR) comparing the first pedigree to the one shown. For example, the first solution is r round(exp(res$loglik[1] - res$loglik[2]), 1) more likely than the second.

Further options

A priori there are infinitely many possible pedigrees connecting a set of individuals. (For example, two individuals may be k'th cousins for any k = 1,2,... .) In order to obtain a manageable search space, reconstruct() offers a range of restriction parameters:

Let us re-run the reconstruction of trioData adding a few of these restrictions. We allow 3 extra individuals and indicate that individual 1 is older than the others. Furthermore, we ask the program to infer parent-child relationships automatically. Finally we allow any amount of inbreeding.

res2 = reconstruct(x, extra = 3, age = "1 > 2,3", inferPO = TRUE, maxInbreeding = 1)

The most likely results this time are shown below:

plot(res2, top = 6)

We see that the same pedigree "wins", but some inbred/esoteric alternatives have appeared among the runners-up.

More about extra

Arguably the most important parameter to reconstruct() is extra, which controls the size of the pedigrees to consider. It can be either a nonnegative integer, or the word "parents". If an integer, it sets the maximum number of extra members used to connect the original individuals. (The final pedigrees may contain further extras still, since missing parents are added at the end.)

If extra = "parents", a special algorithm is invoked. First all directed acyclic graphs between the original individuals are generated, and then parents are added in all possible ways. This is (currently) the default behaviour, since it avoids setting an ad hoc number of "extras". However, it only works well in relatively small cases.



magnusdv/pedbuildr documentation built on Sept. 8, 2024, 12:49 p.m.