est_rf_hmm_sequential: Multipoint analysis using Hidden Markov Models: Sequential...

View source: R/est_map_hmm.R

est_rf_hmm_sequentialR Documentation

Multipoint analysis using Hidden Markov Models: Sequential phase elimination

Description

Performs the multipoint analysis proposed by Mollinari and Garcia (2019) in a sequence of markers removing unlikely phases using sequential multipoint information.

Usage

est_rf_hmm_sequential(
  input.seq,
  twopt,
  start.set = 4,
  thres.twopt = 5,
  thres.hmm = 50,
  extend.tail = NULL,
  phase.number.limit = 20,
  sub.map.size.diff.limit = Inf,
  info.tail = TRUE,
  reestimate.single.ph.configuration = FALSE,
  tol = 0.1,
  tol.final = 0.001,
  verbose = TRUE,
  detailed.verbose = FALSE,
  high.prec = FALSE
)

Arguments

input.seq

an object of class mappoly.sequence

twopt

an object of class mappoly.twopt containing the two-point information

start.set

number of markers to start the phasing procedure (default = 4)

thres.twopt

the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (A.K.A. η in Mollinari and Garcia (2019), default = 5)

thres.hmm

the LOD threshold used to determine if the linkage phases compared via hmm analysis should be evaluated in the next round of marker inclusion (default = 50)

extend.tail

the length of the chain's tail that should be used to calculate the likelihood of the map. If NULL (default), the function uses all markers positioned. Even if info.tail = TRUE, it uses at least extend.tail as the tail length

phase.number.limit

the maximum number of linkage phases of the sub-maps defined by arguments info.tail and extend.tail. Default is 20. If the size exceeds this limit, the marker will not be inserted. If Inf, then it will insert all markers.

sub.map.size.diff.limit

the maximum accepted length difference between the current and the previous sub-map defined by arguments info.tail and extend.tail. If the size exceeds this limit, the marker will not be inserted. If NULL(default), then it will insert all markers.

info.tail

if TRUE (default), it uses the complete informative tail of the chain (i.e. number of markers where all homologous (ploidy x 2) can be distinguished) to calculate the map likelihood

reestimate.single.ph.configuration

logical. If FALSE (default) returns a map without re-estimating the map parameters in cases where there are only one possible linkage phase configuration

tol

the desired accuracy during the sequential phase (default = 10e-02)

tol.final

the desired accuracy for the final map (default = 10e-04)

verbose

If TRUE (default), current progress is shown; if FALSE, no output is produced

detailed.verbose

If TRUE, the expansion of the current submap is shown;

high.prec

logical. If TRUE uses high precision (long double) numbers in the HMM procedure implemented in C++, which can take a long time to perform (default = FALSE)

Details

This function sequentially includes markers into a map given an ordered sequence. It uses two-point information to eliminate unlikely linkage phase configurations given thres.twopt. The search is made within a window of size extend.tail. For the remaining configurations, the HMM-based likelihood is computed and the ones that pass the HMM threshold (thres.hmm) are eliminated.

Value

A list of class mappoly.map with two elements:

i) info: a list containing information about the map, regardless of the linkage phase configuration:

ploidy

the ploidy level

n.mrk

number of markers

seq.num

a vector containing the (ordered) indices of markers in the map, according to the input file

mrk.names

the names of markers in the map

seq.dose.p1

a vector containing the dosage in parent 1 for all markers in the map

seq.dose.p2

a vector containing the dosage in parent 2 for all markers in the map

chrom

a vector indicating the sequence (usually chromosome) each marker belongs as informed in the input file. If not available, chrom = NULL

genome.pos

physical position (usually in megabase) of the markers into the sequence

seq.ref

reference base used for each marker (i.e. A, T, C, G). If not available, seq.ref = NULL

seq.alt

alternative base used for each marker (i.e. A, T, C, G). If not available, seq.ref = NULL

chisq.pval

a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map

data.name

name of the dataset of class mappoly.data

ph.thres

the LOD threshold used to define the linkage phase configurations to test

ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing

seq.num

a vector containing the (ordered) indices of markers in the map, according to the input file

seq.rf

a vector of size (n.mrk - 1) containing a sequence of recombination fraction between the adjacent markers in the map

seq.ph

linkage phase configuration for all markers in both parents

loglike

the hmm-based multipoint likelihood

Author(s)

Marcelo Mollinari, mmollin@ncsu.edu

References

Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi: 10.1534/g3.119.400378

Examples

 
    mrk.subset <- make_seq_mappoly(hexafake, 1:20)
    red.mrk <- elim_redundant(mrk.subset)
    unique.mrks <- make_seq_mappoly(red.mrk)
    subset.pairs <- est_pairwise_rf(input.seq = unique.mrks,
                                  ncpus = 1,
                                  verbose = TRUE)
    subset.map <- est_rf_hmm_sequential(input.seq = unique.mrks,
                                        thres.twopt = 5,
                                        thres.hmm = 10,
                                        extend.tail = 10,
                                        tol = 0.1,
                                        tol.final = 10e-3,
                                        phase.number.limit = 5,
                                        twopt = subset.pairs,
                                        verbose = TRUE)
     print(subset.map, detailed = TRUE)
     plot(subset.map)
     plot(subset.map, left.lim = 0, right.lim = 1, mrk.names = TRUE)
     plot(subset.map, phase = FALSE)
     
     ## Retrieving simulated linkage phase
     ph.P <- maps.hexafake[[1]]$maps[[1]]$seq.ph$P
     ph.Q <- maps.hexafake[[1]]$maps[[1]]$seq.ph$Q
     ## Estimated linkage phase
     ph.P.est <- subset.map$maps[[1]]$seq.ph$P
     ph.Q.est <- subset.map$maps[[1]]$seq.ph$Q
     compare_haplotypes(ploidy = 6, h1 = ph.P[names(ph.P.est)], h2 = ph.P.est)
     compare_haplotypes(ploidy = 6, h1 = ph.Q[names(ph.Q.est)], h2 = ph.Q.est)
   


mappoly documentation built on Jan. 6, 2023, 1:16 a.m.