View source: R/rhapsodi_autorun.R
rhapsodi_autorun | R Documentation |
This function runs all steps of rhapsodi: it first reads in the input data, then calls phase_donor_haplotypes to run donor phasing, then calls impute_gamete_genotypes to run gamete genotype imputation, and finally calls discover_meiotic_recombination to run meiotic recombination discovery.
Input data should be sparse gamete genotype data encoded either as 0/1/NA or as a VCF style input with A/C/G/T/NA. The data is either from a tab-delimited file with a header or a pre-loaded data frame/table.
For both input types, the first column should contain SNP positions in integer format.
For the A/C/G/T/NA input type (acgt = TRUE), the second column should be the REF allele and the third column should be the ALT allele. All following columns should be gamete data, with each gamete having its own column. Within these columns, data should be A/C/G/T/NA.
For the 0/1/NA input type (acgt = FALSE), gamete data starts in the second column and continues for the rest of the columns.
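To make the two layouts concrete, the following is a minimal sketch in base R that builds a tiny 0/1/NA table and a matching A/C/G/T/NA table, then writes the first one to a tab-delimited file with a header. The column names (positions, ref, alt, gamete1, ...) and all values are invented for illustration, and the mapping of 0 to REF and 1 to ALT is an assumption of this toy example; only the column order follows the description above.

```r
# Toy 0/1/NA input (acgt = FALSE): SNP positions first, then one column per gamete.
# All names and values here are made up for illustration.
sparse_01 <- data.frame(
  positions = c(101L, 250L, 999L, 1500L),
  gamete1   = c(0L, NA, 1L, NA),
  gamete2   = c(NA, 1L, NA, 0L),
  gamete3   = c(1L, NA, NA, 1L)
)

# Matching A/C/G/T/NA input (acgt = TRUE): positions, REF, ALT, then gametes.
# Assumption for this sketch: 0 stands for the REF allele and 1 for the ALT allele.
ref <- c("A", "C", "G", "T")
alt <- c("G", "T", "A", "C")
to_base <- function(g) ifelse(is.na(g), NA_character_, ifelse(g == 0L, ref, alt))
sparse_acgt <- data.frame(
  positions = sparse_01$positions,
  ref       = ref,
  alt       = alt,
  gamete1   = to_base(sparse_01$gamete1),
  gamete2   = to_base(sparse_01$gamete2),
  gamete3   = to_base(sparse_01$gamete3),
  stringsAsFactors = FALSE
)

# Write the 0/1/NA table as a tab-delimited file with a header row.
input_path <- tempfile(fileext = ".tsv")
write.table(sparse_01, input_path, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")
```

Either data frame could be passed directly through input_dt with use_dt = TRUE, while the written file corresponds to the input_file route.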
After rhapsodi has completed all three tasks, it returns a named list which has
donor_haps
which is the phased haplotypes as a data frame with column names index, pos (for SNP positions), h1 (haplotype 1), & h2 (haplotype 2) if acgt = FALSE. Otherwise: index, pos, a0, a1, h1, h2
gamete_haps
which is the filled gamete data frame specifying from which donor haplotype each gamete position originates. Column names: index, pos, gamete_names.
gamete_genotypes
which is the filled gamete data frame specifying the genotype (in 0's and 1's) for each gamete position. If acgt = FALSE, column names: index, pos, gamete_names. Otherwise: index, pos, a0, a1, gamete_names
unsmoothed_gamete_haps
which is the filled gamete data frame specifying from which donor haplotype each gamete position originates, after unsmoothing the data by replacing imputed values with original sequencing reads when there's disagreement between observations and imputation. Column names: index, pos, gamete_names.
unsmoothed_gamete_genotypes
which is the filled gamete data frame specifying the genotype (in 0's and 1's) for each gamete position, after unsmoothing the data by replacing imputed values with original sequencing reads when there's disagreement between observations and imputation. If acgt = FALSE, column names: index, pos, gamete_names. Otherwise: index, pos, a0, a1, gamete_names
recomb_breaks
which is a data frame specifying the recombination breakpoints for each gamete. Column names: Ident, Genomic_start, Genomic_end
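The unsmoothing described for unsmoothed_gamete_haps and unsmoothed_gamete_genotypes can be illustrated with a small base R sketch. This is not rhapsodi's internal code: the helper name unsmooth_column and the toy vectors are hypothetical. It keeps the imputed value everywhere except where an original observation exists and disagrees with it, in which case the observation is restored.

```r
# Hypothetical helper, not part of rhapsodi: restore original observations
# wherever they disagree with the imputed call for a single gamete.
unsmooth_column <- function(imputed, observed) {
  disagree <- !is.na(observed) & !is.na(imputed) & observed != imputed
  out <- imputed
  out[disagree] <- observed[disagree]
  out
}

observed <- c(0L, NA, 1L, NA, 0L)  # sparse sequencing calls
imputed  <- c(0L, 0L, 0L, 0L, 0L)  # fully filled, smoothed calls
unsmoothed <- unsmooth_column(imputed, observed)
```

In this toy case only the third position changes, because it is the only one where an observed call contradicts the imputed one.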
rhapsodi_autorun(
  input_file,
  use_dt = FALSE,
  input_dt = NULL,
  acgt = FALSE,
  threads = 2,
  sampleName = "sampleT",
  chrom = "chrT",
  seqError_model = 0.005,
  avg_recomb_model = 1,
  window_length = 3000,
  overlap_denom = 2,
  calculate_window_size_bool = FALSE,
  estimated_coverage = NULL,
  mcstop = TRUE,
  stringent_stitch = TRUE,
  stitch_new_min = 0.5,
  smooth_imputed_genotypes = FALSE,
  fill_ends = TRUE,
  smooth_crossovers = TRUE,
  verbose = FALSE
)
input_file |
a string; the path plus filename for the input sparse gamete genotype data in tabular form. Note the form is different depending on the value of |
use_dt |
a bool; default is FALSE, whether to input a pre-loaded data frame/table rather than using an input file |
input_dt |
a data frame/table; only necessary if use_dt is TRUE. User-pre-loaded data frame/table. Note the format is different depending on the value of |
acgt |
a bool; default is FALSE; If TRUE, assumes that the data is not 0/1/NA encoded, rather gamete genotypes are A/C/G/T/NA encoded and the dataframe has ref and alt columns. |
threads |
an integer; default is 2, number of threads to utilize when we use |
sampleName |
a string; default is "sampleT", fill in with whatever the sample name is. We assume a single input file is from a single sample/donor |
chrom |
a string; default is "chrT", fill in with whatever the chromosome is. We assume a single input file is from a single chromosome |
seqError_model |
a numeric; default is 0.005, used in |
avg_recomb_model |
a numeric; default is 1, used in |
window_length |
an integer; default is 3000, used in |
overlap_denom |
an integer; default is 2, used in |
calculate_window_size_bool |
a bool; default is FALSE; used in |
estimated_coverage |
a numeric; default is NULL; used in |
mcstop |
a bool; default is TRUE; used in |
stringent_stitch |
a bool; default is TRUE; used in |
stitch_new_min |
a numeric >0, but <1; default is 0.5; used in |
smooth_imputed_genotypes |
a bool; default is FALSE; used in |
fill_ends |
a bool; default is TRUE; if TRUE, fills the NAs at the terminal edges of chromosomes with the last known or imputed SNP (for the end of the chromosome) and the first known or imputed SNP (for the beginning of the chromosome); if FALSE, leaves these genotypes as NA |
smooth_crossovers |
a bool; default is TRUE; used in |
verbose |
a bool; default is FALSE; if TRUE, prints progress statements after each step is successfully completed |
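The behaviour described for fill_ends can be sketched in base R as follows. This is an illustration under stated assumptions rather than rhapsodi's implementation: the helper fill_terminal_na is hypothetical and works on a single gamete's vector, copying the first non-NA value backwards over leading NAs and the last non-NA value forwards over trailing NAs, while leaving interior NAs untouched.

```r
# Hypothetical helper, not part of rhapsodi: fill only the leading and
# trailing NAs of one gamete's vector.
fill_terminal_na <- function(x) {
  known <- which(!is.na(x))
  if (length(known) == 0L) return(x)  # nothing to copy from
  first <- known[1L]
  last  <- known[length(known)]
  if (first > 1L) x[seq_len(first - 1L)] <- x[first]
  if (last < length(x)) x[(last + 1L):length(x)] <- x[last]
  x
}

gamete <- c(NA, NA, "h1", "h1", NA, "h2", NA)
filled <- fill_terminal_na(gamete)
```

With fill_ends = FALSE the corresponding behaviour would be to return the vector unchanged, so the terminal positions stay NA.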
rhapsodi_out, a named list with donor_haps, gamete_haps, gamete_genotypes, unsmoothed_gamete_haps, unsmoothed_gamete_genotypes, and recomb_breaks
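A call might look like the sketch below. It assumes the rhapsodi package is installed and that the path passed in points to a tab-delimited 0/1/NA file laid out as described above. The wrapper name run_rhapsodi_example and the example path are hypothetical. Only arguments listed in the usage are passed, and the list elements accessed are those named in the return value.

```r
# Hypothetical wrapper; it needs the rhapsodi package only when it is called.
run_rhapsodi_example <- function(input_path) {
  rhapsodi_out <- rhapsodi::rhapsodi_autorun(
    input_file = input_path,
    acgt       = FALSE,      # input is 0/1/NA encoded
    threads    = 2,
    sampleName = "sampleT",
    chrom      = "chrT",
    verbose    = TRUE        # print progress after each completed step
  )
  # Return a few of the named elements of the result list.
  list(
    donor_haps    = rhapsodi_out$donor_haps,
    gamete_haps   = rhapsodi_out$gamete_haps,
    recomb_breaks = rhapsodi_out$recomb_breaks
  )
}

# Not run here: requires rhapsodi and a real input file.
# results <- run_rhapsodi_example("path/to/gametes.tsv")
# head(results$recomb_breaks)
```

Wrapping the call in a function keeps the sketch loadable without the package; the remaining arguments keep the defaults shown in the usage.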