zookin: Use the ZooRoH model to estimate kinship between pairs of...

View source: R/zookin.R

zookinR Documentation

Use the ZooRoH model to estimate kinship between pairs of individuals

Description

Apply the defined model to estimate kinship for pairs of individuals by running the ZooRoH model to estimate IBD probabilities between the four possible pairs of chromosome of the two individuals (one chromosome from each individual). As for zoorun this includes parameter estimation, computation of realized IBD, IBD probabilities, and identification of IBD segments. Options are similar to the zoorun function.

Usage

zookin(
  zoomodel,
  zooin,
  parameters = TRUE,
  fb = TRUE,
  vit = TRUE,
  localhbd = FALSE,
  nT = 1,
  optim_method = "L-BFGS-B",
  maxiter = 1000,
  minmix = 1,
  maxr = 1e+08,
  kinpairs = NULL,
  RecTable = FALSE,
  trim_ad = FALSE,
  hemiprob = 0
)

Arguments

zoomodel

A valid zmodel object as defined by the zoomodel function. The model indicates whether rates of exponential distributions are estimated or predefined, the number of classes, the starting values for mixing coefficients and rates, the error probabilities. See "zoomodel" for more details. zookin can not be run with a KL model because four different rate parameters would be estimated for each pair of haplotypes, making interpretation difficult.

zooin

A valid zdata object as obtained by the zoodata function. See "zoodata" for more details.

parameters

Specifies whether the parameters are estimated by optimization with the L-BFGS-B method from the optim function (optional argument - true by default). If the user doesn't want to estimate the parameters he must set parameters=FALSE. In that case, the forward-backaward and Viterbi algorithms are run with the provided parameters.

fb

A logical indicating whether the forward-backward algorithm is run (optional argument - true by default). The Forward-Backward algorithm estimates the local probabilities to belong to each IBD or non-IBD class. By default, the function returns only the IBD probabilities for each class, averaged genome-wide, and corresponding to the realized autozygosity associated with each class. To obtain HBD probabilities at every marker position, the option localhbd must be set to true (this generates larger outputs).

vit

A logical indicating whether the Viterbi algorithm is run (optional argument - false by default). The Viterbi algorithm performs the decoding (determining the underlying class at every marker position). Whereas the Forward-Backward algorithms provide IBD probabilities (and how confident a region can be declared IBD), the Viterbi algorithm assigns every marker position to one of the defined classes (IBD or non-IBD). When informativity is high (many SNPs per IBD segments), results from the Forward-Backward and the Viterbi algorithm are very similar. The Viterbi algorithm is best suited to identify IBD segments. To estimate realized kinship and determine IBD status of a position, we recommend to use the Forward-Backward algorithm that better reflects uncertainty.

localhbd

A logical indicating whether the IBD probabilities for each individual at each marker are returned when using the Forward-Backward algorithm (fb option). This is an optional argument that is false by default.

nT

Indicates the number of threads used when running RZooRoH in parallel (optional argument - one thread by default).

optim_method

Indicates which method the optim R function will use to estimate the parameters of the model ("L-BFGS-B" by default). The possible methods are "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN" and "Brent". Type "? optim" to have more information. In our experience, the "L-BFGS-B" method works well but the method achieving the best likelihood is variable (according to the data sets, the model, the priors, the constraints). The same goes for the efficiency (speed). When the zoorun does not converge, you can test with another method. Note that the only method allowing to put constraints on parameters is "L-BFGS-B" (other methods are unconstrained).

maxiter

Indicates the maximum number of iterations when estimating the parameters with the R optim function (optional argument - 100 by default). Iterations are not defined identically across methods. For instance, in one iteration of the "L-BFGS-B" method, the likelihood of the model, estimated with the forward algorithm, is evaluated multiple times. So, a value of 100 iterations is good for the "L-BFGS-B" method but large values are required for some other algorithms.

minmix

This indicates the minimal value for the mixing coefficients. By default it is set to 0 with the classical mixkl and kl models (unconstrained). However, when using the step option or the "Interval" HBDclass, the values is set to 1e-16 to avoid numerical problems. Note that constraints are only allowed with the "L-BFGS-B" method from optim.

maxr

This indicates the maximum difference between rates of successive classes. It is an optional argument set to an arbitrarily large value (100000000). Adding such constraints might slow down the speed of convergence and we recommend to run first without this constraint (constraints are only allowed with the "L-BFGS-B" method from optim).

kinpairs

A matrix with two columns, indicating the pairs of individuals being analyzed. This information is required for a zookin analysis.

RecTable

This is an optimal parameter indicating whether a finite number of genetic distances are used (false by default). This function can be used only with the "Interval" HBDclass. The "Interval" option can be slow, in particular if large "intervals" of generations are defined. To speed up computations, some variables are precomputed for a finite set of genetic distances, select to cover a broad range of possible values. The real genetic distance between two genetic markers is then replaced by the closest value in the table (the difference between the true and used genetic distances being also lower than 10"%").

trim_ad

This is an option still under evaluation (for testing only)

hemiprob

This is an option still under evaluation (for testing only)

Value

The function return a kinres object with several slots accesses by the "@" symbol. It is very similar to the zoores object. The three main results are kinres@realized (the matrix with partitioning of the genome in different IBD classes for each pair of individual), These values are scale like coancestry coefficient and represent the probability that two alleles sampled in the two individuals are IBD (and in that length class). The sum of the realized values in all IBD classes correspond to the kinship. kinres@hbdseg (a data frame with identified IBD segments) and zoores@hbdp (a list of matrices with IBD probabilities per SNP and per class). This gives at one marker position, the probability that two haplotypes, one sampled in each individual, are IBD (and in that length class) at that position. It corresponds also to the predicted HBD values in their future progeny.

Some slots present in zoores objects are not reported because for each pair of individuals we would then have four values. If you are interested in these values for each pair of haplotypes, you can run zoorun with the ibd option. These slots are @mixc, @modlik, @modbic, @niter, @optimerr.

Here is a list with all the slots and their description:

  1. kinres@npairs the number of pairs of individuals in the analysis,

  2. kinres@ids a matrix containing the numbers of the analyzed pairs of individuals (their position in the data file),

  3. kinres@krates the rates for the exponential distributions associated with each IBD or non-IBD class for all individuals,

  4. kinres@realized a matrix with estimated realized IBD per layer (columns) for each pair of individuals (rows). These values are obtained with the Forward-Backward algorithm - fb option - and averaging over the four possible haplotype pairs (between the two individuals),

  5. kinres@ibdp a list of matrices with the local probabilities of IBD in different layers (computed for every class and every pair of individuals). The IBD probability is the probability that two alleles sampled at that locus in the two individuals (one in each) are IBD with respect to that layer (e.g., the IBD segment must correspond to the layer). Each matrix has one row per layer / class and one column per snp. To access the matrix for pair (of individuals) i, use the brackets "[[]]", for instance kinres@hbdp[[i]],

  6. zookin@ibdseg a data frame with the list of identified IBD segments with the Viterbi algorithm (the columns are the haplotype pair number, the chromosome number, the first and last SNP of the segment, the positions of the first and last SNP of the segment, the number of SNPs in the segment, the length of the segment, the HBD state of the segment),

  7. kinres@sampleids is a vector with the names of the pair of samples (when provided in the zooin object through the zoodata function). The ids from the two individuals are separated by a "_".

  8. kinres@haplotype_ids is a vector with the information on haplotype pairs in the analysis and used in the ibdseg table.


RZooRoH documentation built on June 8, 2025, 9:32 p.m.