varsites_pipeline: Varsites pipeline
In surh/HMVAR: Human Microbiome Variant Analysis in R


Analyzes variable sites per sample and produces counts of the different types of variants in each sample. It also analyzes variable sites within each sample to determine whether they are homogeneous (fixed within the sample) or heterogeneous (variable within the sample). When plot = TRUE, it also produces several plots.
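The homogeneous/heterogeneous distinction described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy table, the column layout, and the rule that a site counts as homogeneous when its allele frequency is exactly 0 or 1 are all assumptions made for illustration.

```r
# Hypothetical long-format toy data: one row per site x sample observation.
toy <- data.frame(
  site_id = c("s1", "s2", "s3", "s1", "s2", "s3"),
  sample  = c("A", "A", "A", "B", "B", "B"),
  freq    = c(0, 0.4, 1, 0.1, 0, 0.9),
  depth   = c(5, 8, 3, 0, 6, 7),
  stringsAsFactors = FALSE
)

# Drop observations without coverage (assumed to mirror a depth filter of 1).
toy <- toy[toy$depth >= 1, ]

# Assumption: only one allele observed (freq of 0 or 1) means the site is
# fixed within the sample; anything in between is variable within the sample.
toy$type <- ifelse(toy$freq == 0 | toy$freq == 1,
                   "homogeneous", "heterogeneous")

# Count each type per sample.
counts <- table(toy$sample, toy$type)
print(counts)
```

The real pipeline may use different cutoffs or additional filters, so the output of varsites_pipeline itself remains the reference.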

varsites_pipeline(
  freq,
  depth,
  info,
  map,
  depth_thres = 1,
  freq_thres = 0.5,
  plot = TRUE
)

`freq`	A site x sample allele frequency table as a data frame or tibble. It must have a 'site_id' column.
`depth`	A site x sample sequence coverage table as a data frame or tibble. It must have a 'site_id' column.
`info`	A site x variable table. Based on the MIDAS snp_info.txt files. It must have columns 'site_id', 'ref_id', 'ref_pos', 'amino_acids', 'major_allele', and 'minor_allele'.
`map`	A data frame or tibble with sample metadata. It must have a 'sample' column that matches the sample names in 'freq' and 'depth', as well as a 'Group' column with a categorical variable to group the samples.
`depth_thres`	Minimum sequence coverage required to keep a site in a given sample.
`freq_thres`	Frequency threshold for allele assignment per sample. See determine_snp_dist.
`plot`	If TRUE, several plot objects will be included in the output.
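The column requirements listed above can be checked before calling the function. The following is a minimal sketch in base R; check_varsites_inputs is a hypothetical helper, not part of HMVAR, and it takes a strict reading of the requirements (every sample column in 'freq' and 'depth' must appear in the 'sample' column of 'map').

```r
# Hypothetical helper (not part of HMVAR): validate inputs against the
# documented column requirements.
check_varsites_inputs <- function(freq, depth, info, map) {
  stopifnot(is.data.frame(freq), is.data.frame(depth),
            is.data.frame(info), is.data.frame(map))

  if (!"site_id" %in% colnames(freq)) stop("'freq' needs a 'site_id' column")
  if (!"site_id" %in% colnames(depth)) stop("'depth' needs a 'site_id' column")

  info_cols <- c("site_id", "ref_id", "ref_pos",
                 "amino_acids", "major_allele", "minor_allele")
  missing_info <- setdiff(info_cols, colnames(info))
  if (length(missing_info) > 0) {
    stop("'info' is missing columns: ", paste(missing_info, collapse = ", "))
  }

  if (!all(c("sample", "Group") %in% colnames(map))) {
    stop("'map' needs 'sample' and 'Group' columns")
  }

  # Every non-site_id column is treated as a sample (an assumption).
  freq_samples <- setdiff(colnames(freq), "site_id")
  depth_samples <- setdiff(colnames(depth), "site_id")
  if (!setequal(freq_samples, depth_samples)) {
    stop("'freq' and 'depth' have different sample columns")
  }
  missing_samples <- setdiff(freq_samples, map$sample)
  if (length(missing_samples) > 0) {
    stop("samples not found in 'map': ",
         paste(missing_samples, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy inputs for illustration only.
freq  <- data.frame(site_id = c("s1", "s2"), A = c(0, 0.4), B = c(1, 0.9))
depth <- data.frame(site_id = c("s1", "s2"), A = c(5, 8), B = c(3, 7))
info  <- data.frame(site_id = c("s1", "s2"), ref_id = "contig1",
                    ref_pos = c(10, 25), amino_acids = NA,
                    major_allele = c("A", "C"), minor_allele = c("G", "T"))
map   <- data.frame(sample = c("A", "B"), Group = c("g1", "g2"))

ok <- check_varsites_inputs(freq, depth, info, map)
```

Failing early with a descriptive message is a design choice here; the function itself may report such problems differently.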

A list with elements varsites and varsites.pos. If plot = TRUE, ggplot2 objects are also included.

library(magrittr)  # provides the %>% pipe

# Read the sample metadata and rename 'ID' to 'sample', as required by 'map'
map <- readr::read_tsv(system.file("toy_example/map.txt",
                                   package = "HMVAR"),
                       col_types = readr::cols(ID = readr::col_character(),
                                               Group = readr::col_character())) %>%
  dplyr::select(sample = ID,
                tidyselect::everything())

# Read the MIDAS merged SNP output for the samples in 'map'
Dat <- read_midas_data(midas_dir = system.file("toy_example/merged.snps/",
                                               package = "HMVAR"),
                       map = map,
                       cds_only = FALSE)

# Run the pipeline with the default thresholds
Res <- varsites_pipeline(freq = Dat$freq,
                         depth = Dat$depth,
                         info = Dat$info,
                         map = map)