rhapsodi_autorun: This function can be used to run all steps of rhapsodi

View source: R/rhapsodi_autorun.R

rhapsodi_autorun R Documentation

This function can be used to run all steps of rhapsodi

Description

This function runs all steps of rhapsodi: it first reads in the data, then calls phase_donor_haplotypes to run donor phasing, then calls impute_gamete_genotypes to run gamete genotype imputation, and finally calls discover_meiotic_recombination to run meiotic recombination discovery.

Input data should be sparse gamete genotype data encoded either as 0/1/NA or as VCF-style input with A/C/G/T/NA. The data come either from a tab-delimited file with a header or from a pre-loaded data frame/table. For both input types, the first column should contain SNP positions in integer format. For the A/C/G/T/NA input type (acgt = TRUE), the second column should be the REF allele and the third column should be the ALT allele; all following columns should be gamete data encoded as A/C/G/T/NA, with each gamete having its own column. For the 0/1/NA input type (acgt = FALSE), gamete data start in the second column and continue through the remaining columns.

After rhapsodi has completed all three tasks, it returns a named list with the following elements:

donor_haps: the phased haplotypes as a data frame. If acgt = FALSE, column names: index, pos (SNP positions), h1 (haplotype 1), h2 (haplotype 2). Otherwise: index, pos, a0, a1, h1, h2.

gamete_haps: the filled gamete data frame specifying from which donor haplotype each gamete position originates. Column names: index, pos, gamete_names.

gamete_genotypes: the filled gamete data frame specifying the genotype (in 0's and 1's) for each gamete position. If acgt = FALSE, column names: index, pos, gamete_names. Otherwise: index, pos, a0, a1, gamete_names.

unsmoothed_gamete_haps: the filled gamete data frame specifying from which donor haplotype each gamete position originates, after unsmoothing the data by replacing imputed values with original sequencing reads when there's disagreement between observations and imputation. Column names: index, pos, gamete_names.

unsmoothed_gamete_genotypes: the filled gamete data frame specifying the genotype (in 0's and 1's) for each gamete position, after unsmoothing the data by replacing imputed values with original sequencing reads when there's disagreement between observations and imputation. If acgt = FALSE, column names: index, pos, gamete_names. Otherwise: index, pos, a0, a1, gamete_names.

recomb_breaks: a data frame specifying the recombination breakpoints for each gamete. Column names: Ident, Genomic_start, Genomic_end.
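The two input layouts described above can be illustrated with toy tables. This is only a sketch in base R: the column names (pos, ref, alt, g1, g2, g3) and all values are made up for illustration, and the tables are far smaller than real data.

```r
# Toy 0/1/NA input (acgt = FALSE): integer SNP positions in column 1,
# then one column per gamete. All names and values here are hypothetical.
input_01 <- data.frame(
  pos = c(101L, 250L, 407L, 593L),
  g1  = c(0L, NA, 1L, NA),
  g2  = c(NA, 1L, NA, 0L),
  g3  = c(1L, NA, NA, 0L)
)

# Toy A/C/G/T/NA input (acgt = TRUE): positions, REF allele, ALT allele,
# then one column per gamete.
input_acgt <- data.frame(
  pos = c(101L, 250L, 407L, 593L),
  ref = c("A", "C", "G", "T"),
  alt = c("G", "T", "A", "C"),
  g1  = c("A", NA, "A", NA),
  g2  = c(NA, "T", NA, "T"),
  g3  = c("G", NA, NA, "T"),
  stringsAsFactors = FALSE
)

# The file-based route expects a tab-delimited file with a header.
input_path <- tempfile(fileext = ".tsv")
write.table(input_01, input_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Either table could be passed via input_dt (with use_dt = TRUE), or the written file path via input_file.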

Usage

rhapsodi_autorun(
  input_file,
  use_dt = FALSE,
  input_dt = NULL,
  acgt = FALSE,
  threads = 2,
  sampleName = "sampleT",
  chrom = "chrT",
  seqError_model = 0.005,
  avg_recomb_model = 1,
  window_length = 3000,
  overlap_denom = 2,
  calculate_window_size_bool = FALSE,
  estimated_coverage = NULL,
  mcstop = TRUE,
  stringent_stitch = TRUE,
  stitch_new_min = 0.5,
  smooth_imputed_genotypes = FALSE,
  fill_ends = TRUE,
  smooth_crossovers = TRUE,
  verbose = FALSE
)
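As a sketch of how the signature above might be used with a pre-loaded table, the hypothetical wrapper below passes only documented arguments and leaves the rest at their defaults. The wrapper name, the sample name, and the chromosome label are assumptions for illustration; defining the function does not require rhapsodi, but calling it does.

```r
# Hypothetical convenience wrapper; not part of rhapsodi.
# Uses only arguments listed in the Usage block above.
run_rhapsodi_on_table <- function(gamete_dt, acgt = FALSE, threads = 2) {
  rhapsodi::rhapsodi_autorun(
    input_file = NULL,        # NULL because a pre-loaded table is supplied
    use_dt     = TRUE,
    input_dt   = gamete_dt,
    acgt       = acgt,
    threads    = threads,
    sampleName = "donor1",    # hypothetical sample name
    chrom      = "chr1",      # hypothetical chromosome label
    verbose    = TRUE
  )
}

# Example call (requires rhapsodi and a realistically sized table):
# out <- run_rhapsodi_on_table(my_gamete_table)
# names(out)
```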

Arguments

input_file

a string; the path plus filename for the input sparse gamete genotype data in tabular form. Note the form is different depending on the value of acgt. Use NULL if use_dt is TRUE

use_dt

a bool; default is FALSE, whether to input a pre-loaded data frame/table rather than using an input file

input_dt

a data frame/table; only necessary if use_dt is TRUE. User-pre-loaded data frame/table. Note the format is different depending on the value of acgt

acgt

a bool; default is FALSE; if TRUE, assumes that the data are not 0/1/NA encoded; instead, gamete genotypes are A/C/G/T/NA encoded and the data frame has REF and ALT columns.

threads

an integer; default is 2, number of threads to use in mclapply-like functions

sampleName

a string; default is "sampleT", fill in with whatever the sample name is. We assume a single input file is from a single sample/donor

chrom

a string; default is "chrT", fill in with whatever the chromosome is. We assume a single input file is from a single chromosome

seqError_model

a numeric; default is 0.005, used in build_hmm within impute_gamete_genotypes, the expected error rate in genotyping

avg_recomb_model

a numeric; default is 1, used in build_hmm within impute_gamete_genotypes, the expected average number of recombination events per chromosome

window_length

an integer; default is 3000, used in split_with_overlap within phase_donor_haplotypes, the segment length to use in constructing overlapping windows for phasing

overlap_denom

an integer; default is 2, used in split_with_overlap within phase_donor_haplotypes, the user-supplied denominator used to calculate the overlap between segments, which sets the degree of overlap
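To make the interplay of window_length and overlap_denom concrete, here is a minimal sketch of overlapping windows over SNP indices. It is not rhapsodi's split_with_overlap: the helper name is hypothetical, and it assumes the overlap is window_length divided by overlap_denom (integer division), which the text above does not state explicitly.

```r
# Hypothetical helper, not rhapsodi's implementation.
# Assumption: overlap = window_length %/% overlap_denom.
sketch_windows <- function(n_snps, window_length = 3000, overlap_denom = 2) {
  overlap <- window_length %/% overlap_denom
  step <- window_length - overlap
  stopifnot(step > 0)  # overlap_denom = 1 would give fully overlapping windows
  starts <- seq(1, n_snps, by = step)
  ends <- pmin(starts + window_length - 1, n_snps)
  # Drop trailing windows that lie entirely inside an earlier window
  keep <- seq_len(which(ends == n_snps)[1])
  data.frame(start = starts[keep], end = ends[keep])
}

# 10 SNPs, windows of 4 SNPs, each sharing 2 SNPs with its neighbor
sketch_windows(10, window_length = 4, overlap_denom = 2)
```

Under this assumption, a larger overlap_denom gives a smaller overlap between adjacent windows.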

calculate_window_size_bool

a bool; default is FALSE; used in phase_donor_haplotypes, whether or not to calculate the window size based on characteristics of the input dataset

estimated_coverage

a numeric; used in calculate_window_size within phase_donor_haplotypes only if the user wants rhapsodi to calculate the preferred window size for accurate phasing given characteristics of the data; the estimated sequencing depth of coverage of the input data; default = NULL

mcstop

a bool; default is TRUE; used in stitch_haplotypes within phase_donor_haplotypes and only considered if stringent_stitch is TRUE. This parameter determines whether phasing continues or exits if the mean concordance between two windows is between 0.1 and 0.9. If TRUE, rhapsodi exits. If FALSE, phasing continues, asking which threshold the concordance is closer to and acting accordingly

stringent_stitch

a bool; used in stitch_haplotypes within phase_donor_haplotypes, default is TRUE, this parameter is used to determine the threshold values used in determining whether two windows originate from the same donor. If TRUE, the preset thresholds of 0.1 and 0.9 are used.

stitch_new_min

a numeric >0, but <1; default is 0.5; used in stitch_haplotypes within phase_donor_haplotypes, this parameter is only evaluated if stringent_stitch is FALSE and is dually assigned as the different_max and same_min threshold values when considering the concordance between two windows and therefore which donors they originate from (same or different).
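The way mcstop, stringent_stitch, and stitch_new_min interact can be sketched as a small decision function. This is a hypothetical illustration of the rules described above, not the code in stitch_haplotypes; the function name, the return labels, and the handling of values exactly on a threshold are assumptions.

```r
# Hypothetical sketch of the threshold logic; not rhapsodi's implementation.
classify_concordance <- function(mean_conc, stringent_stitch = TRUE,
                                 mcstop = TRUE, stitch_new_min = 0.5) {
  if (stringent_stitch) {
    different_max <- 0.1   # preset thresholds named in the documentation
    same_min <- 0.9
  } else {
    different_max <- stitch_new_min  # one value serves as both thresholds
    same_min <- stitch_new_min
  }
  # Boundary handling (>= and <=) is an assumption
  if (mean_conc >= same_min) return("same")
  if (mean_conc <= different_max) return("different")
  # Only reachable with stringent thresholds: concordance is ambiguous
  if (mcstop) stop("mean concordance is between 0.1 and 0.9; exiting")
  # mcstop = FALSE: pick whichever threshold the concordance is closer to
  if (abs(mean_conc - same_min) <= abs(mean_conc - different_max)) {
    "same"
  } else {
    "different"
  }
}

classify_concordance(0.95)                           # clear-cut case
classify_concordance(0.70, mcstop = FALSE)           # closer to 0.9 than 0.1
classify_concordance(0.30, stringent_stitch = FALSE) # below stitch_new_min
```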

smooth_imputed_genotypes

a bool; default is FALSE; used in impute_gamete_genotypes, whether to use smoothed data from the HMM or original reads for the final filled gamete genotypes whenever the two disagree. If TRUE, doesn't replace smoothed data from the HMM with original reads when there's a mismatch
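The unsmoothing step, replacing imputed values with original reads where the two disagree, can be illustrated for a single gamete. The helper below is a hypothetical sketch with made-up vectors, not the code used inside impute_gamete_genotypes.

```r
# Hypothetical helper: keep the imputed call unless an original
# observation exists at that position and disagrees with it.
unsmooth <- function(imputed, observed) {
  disagree <- !is.na(observed) & !is.na(imputed) & observed != imputed
  imputed[disagree] <- observed[disagree]
  imputed
}

imputed  <- c(0, 0, 0, 1, 1)    # HMM-smoothed genotypes (made up)
observed <- c(NA, 1, NA, 1, 0)  # sparse original reads (made up)
unsmooth(imputed, observed)
```

Positions without an original read keep their imputed value; only observed mismatches are overwritten.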

fill_ends

a boolean; if TRUE, fills the NAs at the terminal edges of chromosomes with the last known or imputed SNP (for end of chromosome) and the first known or imputed SNP (for beginning of chromosome); if FALSE, leaves these genotypes as NA; default = TRUE
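A minimal sketch of the end-filling behavior described for fill_ends = TRUE, applied to one gamete's vector. The helper name and example values are hypothetical, and this is not rhapsodi's internal code.

```r
# Hypothetical helper: fill leading NAs with the first non-NA value and
# trailing NAs with the last non-NA value; interior NAs are left alone.
fill_terminal_nas <- function(x) {
  known <- which(!is.na(x))
  if (length(known) == 0) return(x)  # nothing to fill from
  first <- known[1]
  last <- known[length(known)]
  if (first > 1) x[seq_len(first - 1)] <- x[first]
  if (last < length(x)) x[(last + 1):length(x)] <- x[last]
  x
}

fill_terminal_nas(c(NA, NA, 1, NA, 2, NA))
```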

smooth_crossovers

a bool; default is TRUE; used in discover_meiotic_recombination, whether to use smoothed data from the HMM or original reads for recombination finding. If TRUE, doesn't replace smoothed data from the HMM with original reads when there's a mismatch

verbose

a bool; default is FALSE; if TRUE, prints progress statements after each step is successfully completed

Value

rhapsodi_out a named list with donor_haps, gamete_haps, gamete_genotypes, unsmoothed_gamete_haps, unsmoothed_gamete_genotypes, and recomb_breaks


mccoy-lab/rhapsodi documentation built on July 27, 2022, 3:56 a.m.