impute_gamete_genotypes: A function to drive the assignment and reporting of the...

View source: R/impute_gamete_genotypes.R

impute_gamete_genotypes    R Documentation

A function to drive the assignment and reporting of the genotypes of each allele on every gamete

Description

This function builds and applies a hidden Markov model (HMM) to categorize each allele on each gamete. It then fills positions with missing data using the nearest haplotype assignment. If the user requests unsmoothed genotypes by setting smooth_imputed_genotypes to FALSE, the unsmooth function is also called. It replaces imputed genotypes with the original sequencing reads wherever the two disagree, and both smoothed and unsmoothed genotypes are reported.
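As a rough illustration of the HMM step, the sketch below runs a two-state Viterbi decoder over one gamete. The two hidden states are the two inferred parental haplotypes, and each observed read is either 0 or 1. It is written in Python as a language-neutral sketch and is not rhapsodi's R implementation.

The function name viterbi_gamete is hypothetical, as is the way genotyping_error and avg_recomb are turned into emission and switch probabilities. In particular, spreading avg_recomb crossovers evenly over the SNP intervals is an assumption. The sketch decodes observed SNPs only, because missing positions are filled in a separate step.

```python
import math
from typing import List, Sequence


def viterbi_gamete(
    reads: Sequence[int],   # observed allele (0/1) at each non-missing SNP
    hap1: Sequence[int],    # parental haplotype 1 alleles at the same SNPs
    hap2: Sequence[int],    # parental haplotype 2 alleles at the same SNPs
    genotyping_error: float = 0.005,
    avg_recomb: float = 1.0,
) -> List[int]:
    """Hypothetical sketch: most likely haplotype label (0 = hap1, 1 = hap2) per SNP."""
    n = len(reads)
    if n == 0:
        return []
    haps = (hap1, hap2)
    # Assumption: avg_recomb crossovers spread evenly over the n - 1 SNP intervals.
    p_switch = min(max(avg_recomb / max(n - 1, 1), 1e-12), 0.5)
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)
    # Assumption: a read matches its haplotype allele with probability 1 - genotyping_error.
    log_match, log_mismatch = math.log(1 - genotyping_error), math.log(genotyping_error)

    def log_emit(state: int, i: int) -> float:
        return log_match if reads[i] == haps[state][i] else log_mismatch

    # score[s]: best log-probability of any state path ending in state s at the current SNP
    score = [math.log(0.5) + log_emit(s, 0) for s in (0, 1)]
    back: List[List[int]] = []  # back[i - 1][s]: best previous state for state s at SNP i
    for i in range(1, n):
        new_score, ptr = [], []
        for s in (0, 1):
            from_same = score[s] + log_stay
            from_other = score[1 - s] + log_switch
            ptr.append(s if from_same >= from_other else 1 - s)
            new_score.append(max(from_same, from_other) + log_emit(s, i))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state to recover the full path.
    path = [0 if score[0] >= score[1] else 1]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With many SNPs the switch probability is small. A single read that disagrees with its neighbours is then explained as a genotyping error rather than a double crossover. This is the "smoothing" that the unsmoothed output later undoes.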

Usage

impute_gamete_genotypes(
  original_gamete_data,
  complete_haplotypes,
  positions,
  genotyping_error = 0.005,
  avg_recomb = 1,
  smooth_imputed_genotypes = FALSE,
  fill_ends = TRUE,
  threads = 2
)

Arguments

original_gamete_data

Original matrix of gamete data

complete_haplotypes

Inferred parental haplotypes

positions

Vector of SNP positions

genotyping_error

Expected genotyping error rate (default = 0.005)

avg_recomb

Expected average recombination rate for a chromosome (default = 1)

smooth_imputed_genotypes

A logical (default = FALSE) indicating whether to use smoothed data for the final genotypes. If TRUE, imputed genotypes are not replaced with original reads, and only smoothed data are returned. If FALSE, both smoothed and unsmoothed data are returned.
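The effect of smooth_imputed_genotypes can be sketched as follows, in Python for illustration only. Wherever an original read exists and disagrees with the imputed genotype, the unsmoothed output takes the read. When the flag is TRUE, the unsmoothed output is None, standing in for R's NULL.

The function names unsmooth_sketch and finalize_sketch are hypothetical. Plain lists stand in for data frame columns, and None marks a missing read.

```python
from typing import Dict, List, Optional, Sequence


def unsmooth_sketch(
    imputed: Sequence[int],             # imputed 0/1 genotypes for one gamete
    original: Sequence[Optional[int]],  # original reads (0/1), None where missing
) -> List[int]:
    """Hypothetical sketch: keep the original read wherever it exists and disagrees."""
    if len(imputed) != len(original):
        raise ValueError("imputed and original must have the same length")
    return [obs if obs is not None and obs != imp else imp
            for imp, obs in zip(imputed, original)]


def finalize_sketch(
    imputed: Sequence[int],
    original: Sequence[Optional[int]],
    smooth_imputed_genotypes: bool = False,
) -> Dict[str, Optional[List[int]]]:
    """Return smoothed output always; unsmoothed only when smoothing is not requested."""
    unsmoothed = None if smooth_imputed_genotypes else unsmooth_sketch(imputed, original)
    return {"filled_gametes": list(imputed), "unsmoothed_gametes": unsmoothed}
```

Here the imputed value at a missing read (None) is kept. Only observed reads can override the model.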

fill_ends

A logical; if TRUE, fills NAs at the terminal edges of chromosomes with the last known or imputed SNP (at the end of the chromosome) and the first known or imputed SNP (at the beginning of the chromosome); if FALSE, leaves these genotypes as NA (default = TRUE)
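The NA-filling behaviour described above can be sketched in Python as follows. Interior gaps take the nearest known haplotype assignment. Leading and trailing gaps take the first or last known assignment only when fill_ends is TRUE.

The name fill_nas_sketch is hypothetical. Measuring "nearest" by distance in the positions vector, and breaking ties toward the left, are assumptions rather than documented rhapsodi behaviour.

```python
from typing import List, Optional, Sequence


def fill_nas_sketch(
    haps: Sequence[Optional[int]],  # haplotype label per SNP, None where missing
    positions: Sequence[int],       # SNP positions, same length, sorted ascending
    fill_ends: bool = True,
) -> List[Optional[int]]:
    """Hypothetical sketch: fill missing haplotype labels from the nearest known SNP."""
    out = list(haps)
    known = [i for i, h in enumerate(haps) if h is not None]
    if not known:
        return out
    first, last = known[0], known[-1]
    k = 0  # index into `known` of the closest known SNP to the left of i
    for i, h in enumerate(haps):
        if h is not None:
            continue
        if i < first or i > last:
            # Terminal NA: fill only if requested, from the first/last known SNP.
            if fill_ends:
                out[i] = haps[first] if i < first else haps[last]
            continue
        # Interior NA: advance to the flanking known SNPs on either side.
        while known[k + 1] < i:
            k += 1
        left, right = known[k], known[k + 1]
        d_left = positions[i] - positions[left]
        d_right = positions[right] - positions[i]
        out[i] = haps[left] if d_left <= d_right else haps[right]
    return out
```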

threads

Number of threads (cores) to use when calling pbmclapply or mclapply (default = 2)

Value

gamete_data: a named list of four data frames (filled_gametes, unsmoothed_gametes, filled_gametes_haps, and unsmoothed_gametes_haps). They are produced by the HMM, fill_NAs, and, if requested, the unsmooth functions. Outputs without _haps in the name hold the imputed donor genotypes encoded as 0/1. Outputs with _haps in the name hold the corresponding haplotype assignments for each gamete. The filled_gametes outputs are the direct (smoothed) output. The unsmoothed_gametes outputs are NULL if smooth_imputed_genotypes is TRUE. If it is FALSE, original sequencing reads replace imputed genotypes wherever the two disagree.
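The relationship between a _haps output and its 0/1 genotype counterpart can be sketched as a lookup into complete_haplotypes. At each SNP, the donor genotype is the allele carried by the assigned parental haplotype.

This Python sketch uses the hypothetical name haps_to_genotypes_sketch and plain lists in place of data frames. It assumes label 0 refers to the first inferred haplotype and label 1 to the second. It shows the idea only and does not reproduce how rhapsodi builds its output.

```python
from typing import List, Optional, Sequence


def haps_to_genotypes_sketch(
    hap_labels: Sequence[Optional[int]],  # 0 = first haplotype, 1 = second, None = NA
    hap1: Sequence[int],                  # alleles of the first inferred parental haplotype
    hap2: Sequence[int],                  # alleles of the second inferred parental haplotype
) -> List[Optional[int]]:
    """Hypothetical sketch: translate per-SNP haplotype labels into 0/1 donor genotypes."""
    parental = (hap1, hap2)
    # Missing labels stay missing; otherwise read the allele from the assigned haplotype.
    return [None if h is None else parental[h][i] for i, h in enumerate(hap_labels)]
```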


mccoy-lab/rhapsodi documentation built on July 27, 2022, 3:56 a.m.