imputeFounders: Impute underlying genotypes


View source: R/imputeFounders.R

Description

Impute the most likely sequence of underlying genotypes, using the Viterbi algorithm.

Usage

imputeFounders(
  mpcrossMapped,
  homozygoteMissingProb = 1,
  heterozygoteMissingProb = 1,
  errorProb = 0,
  extraPositions = list(),
  showProgress = FALSE
)

Arguments

mpcrossMapped

An object containing genetic data and a genetic map

homozygoteMissingProb

The probability with which homozygous genotypes are observed as missing.

heterozygoteMissingProb

The probability with which heterozygous genotypes are observed as missing.

errorProb

The probability of a genotyping error.

extraPositions

Extra genetic positions at which to perform imputation.

showProgress

If this parameter is TRUE, a progress bar is produced.

Value

An object of class mpcrossMapped, containing all the information in the input object, and also including imputed IBD genotypes.

This function uses the Viterbi algorithm to calculate the most likely sequence of underlying genotypes, given observed genetic data. The parameters for the algorithm are a homozygous missing rate, a heterozygous missing rate, and an error probability.

The two missing rates are intended to allow long strings of missing values to be imputed as heterozygotes, in the case that heterozygous genotypes are observed as missing much more often than homozygotes. Only the ratio of these two parameters is relevant, which is why the default values of 1 are acceptable. These default values really mean that the missing rates are equal.
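The role of the ratio can be illustrated with a toy two-state Viterbi decoder. This is a minimal sketch and not the HMM used by mpMap2: the function viterbiToy, its two states, the transition probability stay, and the assumption that the missing rates enter only as emission weights for missing observations are all hypothetical choices made for illustration.

```r
# Toy Viterbi decoder with states "hom" and "het" (hypothetical, not mpMap2 code).
# Observations are "hom", "het" or NA (missing).
viterbiToy <- function(obs, missingWeight, stay = 0.9, errorProb = 0.05) {
  states <- c("hom", "het")
  # Log transition matrix: remain in the same state with probability `stay`
  logTrans <- log(matrix(c(stay, 1 - stay, 1 - stay, stay), 2, 2))
  # Log emission weight of a single observation under each state
  logEmit <- function(o) {
    if (is.na(o)) return(unname(log(missingWeight[states])))
    log(ifelse(states == o, 1 - errorProb, errorProb))
  }
  n <- length(obs)
  score <- matrix(-Inf, 2, n)
  back <- matrix(NA_integer_, 2, n)
  score[, 1] <- log(0.5) + logEmit(obs[1])
  for (i in seq_len(n)[-1]) {
    e <- logEmit(obs[i])
    for (j in 1:2) {
      cand <- score[, i - 1] + logTrans[, j]
      back[j, i] <- which.max(cand)
      score[j, i] <- max(cand) + e[j]
    }
  }
  # Trace back the most likely state sequence
  path <- integer(n)
  path[n] <- which.max(score[, n])
  for (i in n:2) path[i - 1] <- back[path[i], i]
  states[path]
}

# Two homozygous calls separated by a long run of missing values
obs <- c("hom", NA, NA, NA, NA, NA, NA, "hom")

# Equal weights: only the ratio matters, so (1, 1) and (0.1, 0.1) agree
equalPath  <- viterbiToy(obs, c(hom = 1, het = 1))
scaledPath <- viterbiToy(obs, c(hom = 0.1, het = 0.1))

# Heterozygotes observed as missing 100 times more often than homozygotes
skewedPath <- viterbiToy(obs, c(hom = 0.01, het = 1))
```

In this toy model, multiplying both weights by the same constant rescales every candidate path equally, so equalPath and scaledPath coincide, whereas the skewed weights cause the run of missing values to be decoded as heterozygous.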

The parameter extraPositions specifies the additional genetic positions at which imputation should be performed. This can be either a list, or a function such as generateGridPositions or generateIntervalMidPoints. If a function is input, it is applied to the input genetic map to determine the extra genetic locations. If a list is input, the names of the list entries should be chromosome names, and the entry for each chromosome should be a named vector. We give an example of the list format in the examples section at the bottom of this page.
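As a sketch of the list format described above, the following builds such a list in base R. The chromosome names (chr1, chr2), the position names, and the numeric values are hypothetical; we assume here that each vector's values are genetic positions on the same scale as the input map and that the names label the new positions. The call to imputeFounders is commented out because it requires an existing mapped object, here given the hypothetical name mapped.

```r
# One entry per chromosome; each entry is a named numeric vector of positions.
# All names and values below are made up for illustration.
extraPositions <- list(
  chr1 = c(extra1a = 10, extra1b = 25.5),
  chr2 = c(extra2a = 40)
)

# Hypothetical usage, assuming `mapped` is an existing mpcrossMapped object:
# result <- imputeFounders(mapped, extraPositions = extraPositions)
```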

One subtlety when using extra genetic positions is that specifying such positions can change the results of the imputation process. This is undesirable, but does not represent a bug in the implementation. The Hidden Markov Model (HMM) used to model the genotypes is not exact, although it is a highly accurate approximation. As it is an approximation, it fails to satisfy the condition

P^{s+t} = P^t P^s

This property (a stochastic semigroup property) fails to hold because the HMM is only an approximation. As a result, adding extra genetic positions can change the results of the imputation. We emphasise that this is possible only when there are a number of underlying sequences which are almost equally likely, and even then this problem occurs rarely. However, this problem becomes obvious when large simulation studies are performed.
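The semigroup condition can be checked numerically on a small example. The sketch below is illustrative only and does not use the transition probabilities of mpMap2: exactP is the two-state transition matrix implied by the Haldane map function, which satisfies the condition, and approxP is a hypothetical approximation (recombination fraction taken equal to the distance) which does not.

```r
# Exact two-state transition matrix at distance d (in Morgans), Haldane map function
exactP <- function(d) {
  r <- (1 - exp(-2 * d)) / 2
  matrix(c(1 - r, r, r, 1 - r), 2, 2)
}

# Hypothetical approximation: recombination fraction taken as d (small d only)
approxP <- function(d) {
  matrix(c(1 - d, d, d, 1 - d), 2, 2)
}

distS <- 0.1
distT <- 0.2

# Largest absolute discrepancy between P^{s+t} and P^t P^s
exactGap  <- max(abs(exactP(distS + distT) - exactP(distT) %*% exactP(distS)))
approxGap <- max(abs(approxP(distS + distT) - approxP(distT) %*% approxP(distS)))
```

Here exactGap is zero up to floating-point rounding, while approxGap equals 2 * distS * distT. Under an approximate model of this kind, splitting an interval by inserting an intermediate position changes the transition probabilities between the original endpoints, which is how the most likely path can change.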


mpMap2 documentation built on Sept. 13, 2020, 5:17 p.m.