filter_and_optimize.RegressHaplo: Generate consistent haplotypes for a read table and, if...

View source: R/RegressHaplo_util.R

filter_and_optimize.RegressHaplo    R Documentation

Generate consistent haplotypes for a read table and, if desired, apply RegressHaplo optimization.

Description

Generates consistent haplotypes by filtering local haplotypes with the RegressHaplo algorithm until the dimension requirements are satisfied and, if desired, then applies the RegressHaplo algorithm globally.

Usage

filter_and_optimize.RegressHaplo(
  df,
  global_rho = NULL,
  max_global_dim = 10000,
  max_local_dim = 1200,
  min_cover = 500,
  run_optimization = F
)

Arguments

df

read table

global_rho

If a global fit should be computed, the rho that should be used.

max_local_dim

The maximum number of consistent haplotypes allowed in a locus before filtering; a locus that exceeds this number is split (see Details).

min_cover

The minimum read coverage needed to link across a read table position

run_optimization

Should an optimization be run, or should only the consistent global haplotypes be returned?

max_global_dim

The maximum number of consistent haplotypes that should be generated

Details

Haplotypes are generated by splitting the read table positions into loci and then iteratively filtering the local haplotypes with the RegressHaplo algorithm until the number of combinations of local haplotypes is less than max_global_dim. At the outset, loci are defined as positions linked by reads. If a locus has too many consistent haplotypes (> max_local_dim), it is split in half, repeatedly if necessary, until the dimension is reduced. This allows the RegressHaplo algorithm to be applied locally.
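
The halving step described above can be illustrated with a toy sketch. This is not the package's implementation: count_haplotypes and split_locus are hypothetical names, and the assumption that every position contributes two variants (so a locus of k positions has 2^k haplotypes) is made only to keep the example small.

```r
# Toy illustration only; not code from RegressHaplo.
# Hypothetical stand-in: assume each position has 2 variants, so a locus
# with k positions has 2^k candidate haplotypes.
count_haplotypes <- function(positions) 2^length(positions)

# Recursively split a locus in half until its haplotype count is
# at most max_local_dim (a single position is never split).
split_locus <- function(positions, max_local_dim) {
  if (length(positions) <= 1 ||
      count_haplotypes(positions) <= max_local_dim) {
    return(list(positions))
  }
  mid <- length(positions) %/% 2
  left <- positions[seq_len(mid)]
  right <- positions[(mid + 1):length(positions)]
  c(split_locus(left, max_local_dim), split_locus(right, max_local_dim))
}

loci <- split_locus(1:24, max_local_dim = 1200)

# Number of global combinations if no local filtering were applied.
global_dim <- prod(sapply(loci, count_haplotypes))
```

In this toy setting the 24 positions end up in four loci of six positions each. The unfiltered number of combinations (global_dim) is far above the default max_global_dim of 10000, which is why local haplotypes are filtered before being combined.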

To run an optimization, run_optimization must be TRUE and a global_rho must be provided.
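
A minimal sketch of this requirement as a pre-flight check is shown below. check_optimization_args is a hypothetical helper, not part of RegressHaplo, and whether the package itself raises an error for a missing global_rho is not stated here. The commented call assumes df is an existing read table, and the rho value is arbitrary.

```r
# Hypothetical helper; not part of RegressHaplo.
# Stops early if an optimization is requested without a rho value.
check_optimization_args <- function(run_optimization = FALSE,
                                    global_rho = NULL) {
  if (isTRUE(run_optimization) && is.null(global_rho)) {
    stop("run_optimization = TRUE requires a global_rho value")
  }
  invisible(TRUE)
}

# Passes silently: both pieces needed for an optimization are supplied.
check_optimization_args(run_optimization = TRUE, global_rho = 1)

# The corresponding call to the documented function would look like this
# (df is assumed to exist; global_rho = 1 is an arbitrary illustration):
# out <- filter_and_optimize.RegressHaplo(df, global_rho = 1,
#                                         run_optimization = TRUE)
```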

Value

A list containing the elements df, pi, fit, and h. df is simply the read table returned. h holds the global consistent haplotypes generated after filtering; it is a character matrix whose colnames give positions. pi and fit are NA if the optimization is not run; otherwise pi is a vector of frequencies with length equal to the number of haplotypes (nrow(h)) and fit is a scalar describing the fit of the solution.
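
As a rough sketch of how the returned list might be inspected, the example below builds a mock object with the documented structure and tabulates haplotypes with their frequencies. The values in the mock object are invented for illustration, and summarize_haplotypes is a hypothetical helper, not a RegressHaplo function.

```r
# Mock result with the documented elements; all values are invented.
h <- matrix(c("A", "C", "G",
              "A", "T", "G"),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("101", "105", "110")))
out <- list(df = data.frame(), pi = c(0.7, 0.3), fit = 0.12, h = h)

# Hypothetical helper: one row per haplotype, with its frequency if the
# optimization was run and NA otherwise.
summarize_haplotypes <- function(out) {
  seqs <- apply(out$h, 1, paste, collapse = "")
  optimized <- !(length(out$pi) == 1 && is.na(out$pi))
  data.frame(
    haplotype = seqs,
    frequency = if (optimized) out$pi else NA_real_,
    stringsAsFactors = FALSE
  )
}

res <- summarize_haplotypes(out)

# Same haplotypes, but as returned when run_optimization is FALSE.
res_unfit <- summarize_haplotypes(list(df = data.frame(), pi = NA,
                                       fit = NA, h = h))
```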


SLeviyang/RegressHaplo documentation built on June 1, 2022, 10:48 p.m.