parse_indels: Parse indel variant matrix from Ali's pipeline

Description

Parses the input matrices generated by the internal (Ali's) variant calling pipeline. Parsed annotation information is always returned. In addition, you can optionally:

1. Split rows with multiple annotations (variants in overlapping genes, multiallelic variants).
2. Re-reference to the ancestral allele at that position (instead of to the reference genome).
3. Simplify the code matrix, which contains numbers from -4 to 3 encoding different information about the variants, to a binary matrix indicating simple presence/absence of a variant at that site.
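The binary simplification described above can be illustrated with a small base R sketch. This is not the package's implementation: `code_to_binary` is a hypothetical helper, and the only coding assumed is what this page states for keep_conf_only (1 marks a confident variant, 3 a variant that is kept only when keep_conf_only = FALSE). All other codes are treated as absence here, which is an assumption.

```r
# Hypothetical helper, NOT part of snitkitr: collapse a code matrix to 0/1.
code_to_binary <- function(code_mat, keep_conf_only = TRUE) {
  # Assumption: 1 = confident variant, 3 = lower-confidence variant.
  keep_codes <- if (keep_conf_only) 1 else c(1, 3)
  # %in% returns a column-major logical vector; rebuild the matrix shape.
  matrix(as.integer(as.matrix(code_mat) %in% keep_codes),
         nrow = nrow(code_mat),
         dimnames = dimnames(code_mat))
}

# Toy code matrix: rows are variants, columns are samples (made-up values).
code_mat <- matrix(c( 1, 0, 3,
                     -1, 1, 1),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("var1", "var2"), c("s1", "s2", "s3")))

bin_conf <- code_to_binary(code_mat)                         # only 1's count
bin_all  <- code_to_binary(code_mat, keep_conf_only = FALSE) # 1's and 3's count
```

With the toy input, sample s3 carries var1 only in `bin_all`, because its code is 3.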

Usage

parse_indels(
  varmat_code,
  varmat_allele,
  tree = NULL,
  og = NULL,
  remove_multi_annots = FALSE,
  return_binary_matrix = TRUE,
  ref_to_anc = TRUE,
  keep_conf_only = TRUE,
  mat_suffix = "_R1_001.fastq.gz|_R1.fastq.gz|_1.fastq.gz",
  ref_to_maj = FALSE,
  parallelization = "multisession"
)

Arguments

varmat_code

- loaded data.frame or path to the varmat_code file generated by the internal variant calling pipeline

varmat_allele

- loaded data.frame or path to the varmat_allele file generated by the internal variant calling pipeline

tree

- optional: path to a tree file or an already loaded tree object (class = phylo)

og

- optional: character string giving the name of the outgroup (must match its name in the tree)

remove_multi_annots

- logical flag indicating whether to remove rows with multiple annotations; the alternative is to split rows with multiple annotations (default = FALSE)

return_binary_matrix

- logical flag indicating whether to return a binary matrix (default = TRUE)

keep_conf_only

- logical flag indicating whether only confident variants should be kept (coded 1 in Ali's pipeline); if FALSE, variants coded 3 are also kept (default = TRUE)

mat_suffix

Suffix to remove from the names in the code and allele matrices so that they match the tree tip labels (default = "_R1_001.fastq.gz|_R1.fastq.gz|_1.fastq.gz").
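As a rough illustration of what the suffix removal is meant to achieve, the sketch below applies the default mat_suffix pattern with base R's `gsub`. Whether parse_indels uses `gsub` internally, and whether the names in question are the matrix column names, are assumptions; the sample names and tip labels are made up.

```r
# Default pattern from the Usage section: three alternatives separated by "|".
mat_suffix <- "_R1_001.fastq.gz|_R1.fastq.gz|_1.fastq.gz"

# Made-up sample names as they might appear in the matrices (assumption).
sample_names <- c("sampleA_R1_001.fastq.gz",
                  "sampleB_R1.fastq.gz",
                  "sampleC_1.fastq.gz")

# Made-up tip labels of the tree.
tip_labels <- c("sampleA", "sampleB", "sampleC")

# Treat mat_suffix as a regular expression and strip the matching part.
cleaned <- gsub(mat_suffix, "", sample_names)

# After cleaning, every sample name should be found among the tip labels.
all_match <- all(cleaned %in% tip_labels)
```

Because the pattern is a regular expression, the unescaped dots match any character, not only a literal ".". If your file names need stricter matching, a pattern with escaped dots (for example "_R1\\.fastq\\.gz") could be passed instead.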

parallelization

Input to future::plan; either "multisession" (default) or "multicore". The number of workers (cores) is always set to 2.

Value

A list containing the allele matrix, code matrix, binary matrix, and the corresponding parsed annotations. The exact contents depend on the arguments passed to the function.


Snitkin-Lab-Umich/snitkitr documentation built on April 21, 2021, 10:48 a.m.