roh: Detection of runs of homozygosity (ROH)

ghap.rohR Documentation

Detection of runs of homozygosity (ROH)

Description

Map haplotype segments that are likely identical-by-descent.

Usage

 ghap.roh(object, minroh = 1e+6, method = "hmm", freq = NULL,
          genpos = NULL, inbcoef = NULL, error = 0.25/100,
          only.active.samples = TRUE, only.active.markers = TRUE,
          ncores = 1, verbose = TRUE)

Arguments

The following arguments are used by both the 'hmm' and 'naive' methods:

object

A valid GHap object (phase or plink).

minroh

Minimum ROH length to output.

method

Character value indicating which method to use: 'naive' or 'hmm' (default).

only.active.samples

A logical value specifying whether only active samples should be included in the search (default = TRUE).

only.active.markers

A logical value specifying whether only active markers should be used in the search (default = TRUE).

ncores

A numeric value specifying the number of cores to be used in parallel computing (default = 1).

verbose

A logical value specfying whether log messages should be printed (default = TRUE).

The following arguments are only used by the 'hmm' method:

freq

Named numeric vector of allele frequencies, such as provided by function ghap.freq.

genpos

Named numeric vector of genetic positions. If not supplied, 1 cM = 1 Mb is assumed and genetic distances between consecutive markers are set to d/1e+6, where d is the distance in base pairs.

inbcoef

Named numeric vector of starting values for genomic inbreeding (i.e., guess for the proportion of the genome covered by ROH).

error

Numeric value representing the expected genotyping error rate (default = 0.25/100).

Details

This function searchs for runs of homozygosity (ROH) via two different methods:

The 'naive' method simply finds streches of homozygous genotypes in the observed haplotypes that are larger then a user-definied minimum size (default is 1 Mbp). The 'hmm' method uses a Hidden Markov Model that takes genotyping error and recombination into account while detecting ROHs. The 'hmm' model in GHap is similar to the ones described by Narasimhan et al (2016) and Druet & Gautier (2017), differing slightly in model fitting and definition of transition and emission probabilities (details are covered in our vignette).

The 'hmm' method requires allele frequencies for each marker, as well as starting values for the expected proportion of the genome covered by ROH (genomic inbreeding) for each individual. Estimates of allele frequencies can be either based on a reference or estimated from the data with the ghap.freq function. Starting values for genomic inbreeding can be obtained by running the function with the 'naive' method first and then computing starting values with ghap.froh (see the examples). A genetic map with positions in cM can be provided by the user via the genpos argument. If genetic positions are not provided, 1 cM = 1 Mb is assumed.

Value

The function returns a dataframe with the following columns:

POP

Original population label.

ID

Individual name.

CHR

Chromosome name.

BP1

Segment start position.

BP2

Segment end position.

LENGTH

Length of run of homozygosity.

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

References

V. Narasimhan et al. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016. 32:1749-1751.

T. Druet & M. Gautier. A model-based approach to characterize individual inbreeding at both global and local genomic scales. Molecular Ecology. 2017. 26:5820-5841.

See Also

ghap.freq, ghap.froh

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy plink data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "plink",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Load plink data
# plink <- ghap.loadplink("example")
# 
# ### RUN ###
# 
# # Subset pure1 population
# pure1 <- plink$id[which(plink$pop == "Pure1")]
# plink <- ghap.subset(object = plink, ids = pure1, variants = plink$marker)
# 
# # ROH via the 'naive' method
# roh1 <- ghap.roh(plink, method = "naive")
# froh1 <- ghap.froh(plink, roh1)
# 
# # ROH via the 'hmm' method
# freq <- ghap.freq(plink, type = 'A1')
# inbcoef <- froh1$FROH1; names(inbcoef) <- froh1$ID
# roh2 <- ghap.roh(plink, method = "hmm", freq = freq,
#                 inbcoef = inbcoef)
# froh2 <- ghap.froh(plink, roh2)
#
# # Method 'hmm' using Fhat3 as starting values
# inbcoef <- ibc$Fhat3; names(inbcoef) <- ibc$ID
# inbcoef[which(inbcoef < 0)] <- 0.01
# roh3 <- ghap.roh(plink, method = "hmm", freq = freq,
#                  inbcoef = inbcoef)
# froh3 <- ghap.froh(plink, roh3)


GHap documentation built on July 2, 2022, 1:07 a.m.