viterbi: Calculate most probable sequence of genotypes

View source: R/viterbi.R

viterbiR Documentation

Calculate most probable sequence of genotypes

Description

Uses a hidden Markov model to calculate arg max Pr(g | O) where g is the underlying sequence of true genotypes and O is the observed multipoint marker data, with possible allowance for genotyping errors.

Usage

viterbi(
  cross,
  map = NULL,
  error_prob = 0.0001,
  map_function = c("haldane", "kosambi", "c-f", "morgan"),
  lowmem = FALSE,
  quiet = TRUE,
  cores = 1
)

Arguments

cross

Object of class "cross2". For details, see the R/qtl2 developer guide.

map

Genetic map of markers. May include pseudomarker locations (that is, locations that are not within the marker genotype data). If NULL, the genetic map in cross is used.

error_prob

Assumed genotyping error probability

map_function

Character string indicating the map function to use to convert genetic distances to recombination fractions.

lowmem

If FALSE, split individuals into groups with common sex and crossinfo and then precalculate the transition matrices for a chromosome; potentially a lot faster but using more memory.

quiet

If FALSE, print progress messages.

cores

Number of CPU cores to use, for parallel calculations. (If 0, use parallel::detectCores().) Alternatively, this can be links to a set of cluster sockets, as produced by parallel::makeCluster().

Details

We use a hidden Markov model to find, for each individual on each chromosome, the most probable sequence of underlying genotypes given the observed marker data.

Note that we break ties at random, and our method for doing this may introduce some bias.

Consider the results with caution; the most probable sequence can have very low probability, and can have features that are quite unusual (for example, the number of recombination events can be too small). In most cases, the results of a single imputation with sim_geno() will be more realistic.

Value

An object of class "viterbi": a list of two-dimensional arrays of imputed genotypes, individuals x positions. Also contains three attributes:

  • crosstype - The cross type of the input cross.

  • is_x_chr - Logical vector indicating whether chromosomes are to be treated as the X chromosome or not, from input cross.

  • alleles - Vector of allele codes, from input cross.

See Also

sim_geno(), maxmarg(), cbind.viterbi(), rbind.viterbi()

Examples

grav2 <- read_cross2(system.file("extdata", "grav2.zip", package="qtl2"))
map_w_pmar <- insert_pseudomarkers(grav2$gmap, step=1)
g <- viterbi(grav2, map_w_pmar, error_prob=0.002)

qtl2 documentation built on April 22, 2023, 1:10 a.m.