stitch_haplotypes: A function to stitch overlapping windows together into the...

View source: R/stitch_haplotypes.R

stitch_haplotypesR Documentation

A function to stitch overlapping windows together into the two donor haplotypes

Description

This function stitches together the haplotypes of overlapping windows by considering the amount of shared concordance between each window Specific behavior depends on the parameters. There are two main modes by which this function operates. The first is stringent stitching, or the confident phasing mode, which uses two preset threshold values to determine whether or not the windows originate from the same donor The second is non-stringent stitching which uses a single user-defined threshold value to determine whether or not the windows originate from the same donor In both cases, consider there being a different_max threshold, where if the concordance between two windows is less than this value, we would say that the windows originate from different donors and then there being a same_min threshold, where if the concordance between two windows is greater than this value, we would say that the windows originate from the same donor In the first stringent stitching case, these are different values, 0.1 and 0.9 respectively. By default if mean concordance is between these two values, rhapsodi will exit However, the user may set the mcstop argument to FALSE and rhapsodi will continue phasing, asking which threshold the mean concordance is closer to in the 1dimensional sense (the distance between two points on a line segment) and stitching continues, acting accordingly. In the second non-stringent stitching case, these are the same value, recommended and default value of 0.5, but user may change. Theoretically, stringent stitching with mcstop of FALSE is equivalent to non-stringent stitching with a stitch_new_min of 0.5 Mean concordance is found by finding the mean number of locations that have equal values, removing NAs from consideration since majority voting may return NA for some positions After going through all the windows, the function finishes stitching together the whole first haplotype and finds the second by inverting the first haplotype.

Usage

stitch_haplotypes(
  inferred_haplotypes,
  windows,
  mcstop = TRUE,
  stringent_stitch = TRUE,
  stitch_new_min = 0.5
)

Arguments

inferred_haplotypes

a list of tibbles from calling reconstruct_hap within impute_donor_haplotypes where each tibble is the inferred haplotype sorted by position for that window

windows

a list of lists from split_with_overlap within impute_donor_haplotypes where the number of lists equal to the number of windows and each inner list contains the SNP positions in that overlapping segment

mcstop

a bool, only considered if stringent_stitch is TRUE; default is TRUE; this parameter is used to determine whether phasing continues or exits if the mean concordance between two windows is between 0.1 and 0.9. If TRUE, rhapsodi exits. If FALSE, rhapsodi and phasing continues, asking which threshold is closer and acting accordingly

stringent_stitch

a bool, this parameter is used to determine the threshold values used in determining whether two windows originate from the same donor. If TRUE, the preset thresholds of 0.1 and 0.9 are used.

stitch_new_min

a numeric >0, but <1; default is 0.5; this parameter is only evaluated if stringent_stitch is FALSE and is dually assigned as the different_max and same_min threshold values when considering the concordance between two windows and therefore which donors they originate from (same or different).

Value

complete_haplotypes a data frame with two columns, h1 and h2, containing the inferred haplotypes from each window stitched together in a single non-overlapping segment


mccoy-lab/rhapsodi documentation built on July 27, 2022, 3:56 a.m.