RZooRoH-package    R Documentation

RZooRoH: Partitioning of Individual Autozygosity into Multiple Homozygous-by-Descent Classes

Description

Functions to identify Homozygous-by-Descent (HBD) segments associated with runs of homozygosity (ROH) and to estimate individual autozygosity (or inbreeding coefficient). HBD segments and autozygosity are assigned to multiple HBD classes with a model-based approach relying on a mixture of exponential distributions. The rate of the exponential distribution is distinct for each HBD class and defines the expected length of the HBD segments. These HBD classes are therefore related to the age of the segments (longer segments and smaller rates for recent autozygosity / recent common ancestor). The functions estimate the parameters of the model (rates of the exponential distributions, mixing proportions), estimate global and local autozygosity probabilities, and identify HBD segments with Viterbi decoding. Further functions compute kinship between individuals and predict inbreeding in the future progeny of a genotyped couple. The method is fully described in Druet and Gautier (2017) <doi:10.1111/mec.14324> and Druet and Gautier (2022) <doi:10.1016/j.tpb.2022.03.001>.
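The link between the rate of an HBD class and the expected segment length can be illustrated with a few lines of base R. This is only a sketch under stated assumptions: the rates and mixing proportions are made-up values, segment lengths are assumed to be measured in Morgans (so the expected length is 1/rate), and the rule of thumb that the rate is roughly twice the number of generations to the common ancestor is taken from the cited papers rather than from this page. It does not use or reproduce any RZooRoH function.

```r
# Illustrative rates for three HBD classes (hypothetical values).
rates <- c(10, 100, 1000)

# Mean of an exponential distribution is 1 / rate; lengths are assumed
# to be in Morgans, so multiply by 100 to express them in centimorgans.
expected_cM <- 100 / rates

# Assumption (rule of thumb): rate ~ 2 x generations to the common ancestor.
approx_generations <- rates / 2

data.frame(rate = rates, expected_cM, approx_generations)

# Simulate segment lengths from a mixture of the three exponentials.
set.seed(1)
mix <- c(0.2, 0.3, 0.5)                       # hypothetical mixing proportions
hbd_class <- sample(seq_along(rates), 10000, replace = TRUE, prob = mix)
len_cM <- 100 * rexp(10000, rate = rates[hbd_class])

# Empirical mean length per class: smaller rates give longer segments.
class_means <- tapply(len_cM, hbd_class, mean)
class_means
```

Classes with small rates therefore capture long segments inherited from recent ancestors, while classes with large rates capture short segments tracing back to more distant ancestors.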

Data pre-processing

Note that the model is designed for autosomes. Removal of other chromosomes and additional filtering (e.g. keeping bi-allelic markers, filtering markers and individuals on call rate, coding of missing genotypes, HWE, etc.) should be performed prior to running RZooRoH, with tools such as plink or bcftools. The model works on an ordered map and ignores SNPs with a null position.
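As a rough illustration of the map requirements above, the following base R sketch keeps autosomal markers, drops markers without a usable position and sorts the map. Everything in it is an assumption made for the example: the data frame, its column names (chr, pos, snp), the set of autosome labels, and the reading of "null position" as position 0. In practice this filtering would normally be done upstream with plink or bcftools.

```r
# Hypothetical marker map; the column names are illustrative only.
map <- data.frame(
  chr = c("1", "1", "X", "2", "1"),
  pos = c(3500, 1200, 900, 0, 2100),
  snp = paste0("snp", 1:5),
  stringsAsFactors = FALSE
)

# Assumption: autosomes are labelled "1" to "29" (adapt to your species).
autosomes <- as.character(1:29)

# Keep autosomal markers with a strictly positive position.
keep <- map$chr %in% autosomes & map$pos > 0
map_clean <- map[keep, ]

# Order by chromosome, then by position within chromosome.
map_clean <- map_clean[order(as.integer(map_clean$chr), map_clean$pos), ]
map_clean
```

The genotype columns would have to be filtered and reordered with the same index so that markers and genotypes stay aligned.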

RZooRoH functions

The main functions included in the package are zoodata(), zoomodel() and zoorun(). The zoorun() function can also be applied to two (phased) haplotypes to obtain IBD probabilities with the same model. This is possible for haploid data (haploid organisms or possibly specific cases with sex chromosomes) or for diploid individuals, although the latter requires a prior phasing step. The zookin() function estimates kinship between pairs of individuals with the ZooRoH model. To that end, it computes the IBD relationship for each of the four possible pairs of haplotypes between the two individuals (this is only possible with phased data). There are also four functions to plot the results: zooplot_partitioning(), zooplot_hbdseg(), zooplot_prophbd() and zooplot_individuals(). Eight accessor functions help to extract the results: realized(), cumhbd(), rohbd(), probhbd(), merge_zres() and update_zres() after HBD estimation, and cumkin() and predhbd() after using zookin().
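The idea of combining the four haplotype pairs can be sketched in base R. The IBD probabilities below are invented numbers, the haplotype labels are hypothetical, and the averaging step follows the textbook definition of kinship (the probability that two alleles drawn at random, one from each individual, are IBD). Whether zookin() combines the four pairs in exactly this way is an assumption, not something stated on this page.

```r
# The four possible haplotype pairs between individuals A and B
# (labels are hypothetical).
hap_pairs <- expand.grid(
  hap_ind1 = c("A1", "A2"),
  hap_ind2 = c("B1", "B2"),
  stringsAsFactors = FALSE
)

# Made-up genome-wide IBD probabilities, one per haplotype pair.
hap_pairs$ibd <- c(0.10, 0.02, 0.05, 0.03)

# Textbook kinship: average IBD probability over the four pairs.
kinship <- mean(hap_pairs$ibd)

hap_pairs
kinship
```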

You can obtain individual help for each of the functions by typing, for instance, help(zoomodel) or ? zoomodel.

To run RZooRoH, you must first load your data with the zoodata() function. It will create a zooin object required for further analysis. Next, you need to define the model you want to run. You can define a default model by typing, for instance, my.mod <- zoomodel(). Finally, you can run the model with the zoorun() function. You can choose to estimate parameters with different procedures, estimate global and local homozygous-by-descent (HBD) probabilities with the Forward-Backward procedure or identify HBD segments with the Viterbi algorithm. The results are saved in a zres object.
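To give an intuition for the Forward-Backward step mentioned above, here is a toy two-state hidden Markov model (one HBD class, one non-HBD class) written in base R. It is a simplified sketch and not the package's implementation: the function name forward_backward_hbd(), its arguments and all default values are hypothetical, genotypes are reduced to heterozygous or homozygous, and the transition structure (stay in the current state with probability exp(-rate * d), otherwise restart in HBD with the mixing proportion) is an assumption based on the cited papers.

```r
# Toy forward-backward for a two-state HMM: column 1 = HBD, column 2 = non-HBD.
# het:         logical vector, TRUE if the marker is heterozygous
# dist_morgan: genetic distance (Morgans) between consecutive markers
# freq:        allele frequency at each marker
forward_backward_hbd <- function(het, dist_morgan, freq,
                                 rate = 10, mix = 0.1, err = 0.001) {
  n <- length(het)
  # Emission probabilities of the observed genotype class in each state.
  p_het <- 2 * freq * (1 - freq)
  emit <- cbind(
    ifelse(het, err, 1 - err),       # HBD: heterozygotes only through errors
    ifelse(het, p_het, 1 - p_het)    # non-HBD: Hardy-Weinberg proportions
  )
  init <- c(mix, 1 - mix)
  # Transition matrix for two markers separated by d Morgans.
  trans <- function(d) {
    stay <- exp(-rate * d)
    stay * diag(2) + (1 - stay) * matrix(init, 2, 2, byrow = TRUE)
  }
  fwd <- matrix(0, n, 2)
  bwd <- matrix(1, n, 2)
  # Forward pass, rescaled at each marker to avoid numerical underflow.
  fwd[1, ] <- init * emit[1, ] / sum(init * emit[1, ])
  for (i in 2:n) {
    f <- drop(fwd[i - 1, ] %*% trans(dist_morgan[i - 1])) * emit[i, ]
    fwd[i, ] <- f / sum(f)
  }
  # Backward pass, with the same kind of rescaling.
  for (i in (n - 1):1) {
    b <- drop(trans(dist_morgan[i]) %*% (emit[i + 1, ] * bwd[i + 1, ]))
    bwd[i, ] <- b / sum(b)
  }
  # Posterior state probabilities at each marker.
  post <- fwd * bwd
  post <- post / rowSums(post)
  colnames(post) <- c("hbd", "nonhbd")
  post
}

# Toy data: a run of 50 homozygous markers flanked by alternating genotypes.
het <- c(rep(c(TRUE, FALSE), 5), rep(FALSE, 50), rep(c(TRUE, FALSE), 5))
post <- forward_backward_hbd(het,
                             dist_morgan = rep(0.001, length(het) - 1),
                             freq = rep(0.5, length(het)))
round(post[c(1, 35), ], 3)
```

The first column holds the local HBD probability at each marker. The package extends this idea to several HBD classes with distinct rates, while the Viterbi algorithm returns the single most likely state sequence instead of per-marker probabilities.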

The four plot functions zooplot_partitioning(), zooplot_hbdseg(), zooplot_prophbd() and zooplot_individuals() use zres objects to make different graphics. Similarly, the accessor functions help to extract information from the zres objects (see vignette for more details).

To get the list of example data sets:

data(package="RZooRoH")

And to get the description of one data set, type ? name_data (with name_data being the name of the data set). For instance:

? genosim

Author(s)

Maintainer: Tom Druet tom.druet@uliege.be

Authors:

  • Naveen Kumar Kadri

  • Mathieu Gautier

Other contributors:

  • Amandine Bertrand [contributor]

Examples


# Start with a small data set with six individuals and external frequencies.
freqfile <- (system.file("exdata","typsfrq.txt",package="RZooRoH"))
typfile <- (system.file("exdata","typs.txt",package="RZooRoH"))
frq <- read.table(freqfile,header=FALSE)
typdata <- zoodata(typfile,supcol=4,chrcol=1,poscol=2,allelefreq=frq$V1)
# Define a model with two HBD classes with rates equal to 10 and 100.
Mod2L <- zoomodel(K=2,base_rate=10)
# Run the model on all individuals.
typ.res <- zoorun(Mod2L,typdata)
# Observe some results: likelihood, realized autozygosity in different
# HBD classes and identified HBD segments.
typ.res@modlik
typ.res@realized
typ.res@hbdseg
# Define a model with one HBD and one non-HBD class and run it.
Mod1R <- zoomodel(K=1,predefined=FALSE)
typ2.res <- zoorun(Mod1R,typdata)
# Print the estimated rates and mixing coefficients.
typ2.res@krates
typ2.res@mixc

# Get the name and location of a second example file.
myfile <- (system.file("exdata","genoex.txt",package="RZooRoH"))
# Load your data with default format:
example2 <- zoodata(myfile)
# Define the default model:
my.model <- zoomodel()
# Run RZooRoH on your data with the model (parameter estimation with optim). This can
# take a few minutes because it is a large model for 20 individuals:
my.res <- zoorun(my.model,example2)
# To run the model on a subset of individuals with 1 thread:
my.res3 <- zoorun(my.model, example2, ids=c(7,12,16,18), nT = 1)
# Define a smaller model and run it on two individuals.
my.mod2 <- zoomodel(K=3,base_rate=10)
my.res4 <- zoorun(my.mod2, example2, ids=c(9,18))


RZooRoH documentation built on June 8, 2025, 9:32 p.m.