determine_snp_dist: SNP distribution between sites
In surh/HMVAR: Human Microbiome Variant Analysis in R


Determines how SNPs are distributed between sites. Requires the output of midas_merge.py and a mapping file that assigns samples to sites.

determine_snp_dist(
  info,
  freq,
  depth,
  map,
  depth_thres = 1,
  freq_thres = 0.5,
  clean = TRUE
)

`info`	A data table corresponding to the 'snps_info.txt' file from MIDAS. Must have columns 'site_id' and 'sample'.
`freq`	A data table corresponding to the 'snps_freq.txt' file from MIDAS. Must have a 'site_id' column, and one more column per sample. Each row is the frequency of the minor allele for the corresponding site in the corresponding sample.
`depth`	A data table corresponding to the 'snps_depth.txt' file from MIDAS. Must have a 'site_id' column, and one more column per sample. Each row is the sequencing depth for the corresponding site in the corresponding sample.
`map`	A data table associating samples with groups (sites). Must have columns 'sample' and 'Group'.
`depth_thres`	Minimum number of reads (depth) at a site in a sample for that observation to be considered.
`freq_thres`	Frequency cutoff for minor vs. major allele. The value is the distance from 0 or 1 within which a site is assigned to the major or minor allele, respectively. It must be a value in [0,1].
`clean`	Whether to remove sites that had no valid distribution.
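
As an illustration of how `depth_thres` and `freq_thres` might interact, the sketch below labels individual site/sample observations. It is not the HMVAR implementation: `call_allele` is a hypothetical helper, and the handling of values exactly on a threshold (strict inequalities here) is an assumption.

```r
# Hypothetical helper, not part of HMVAR: label each site/sample observation.
# Assumption: frequencies strictly within freq_thres of 0 count as "major",
# frequencies strictly within freq_thres of 1 count as "minor".
call_allele <- function(freq, depth, depth_thres = 1, freq_thres = 0.5) {
  stopifnot(freq_thres >= 0, freq_thres <= 1)
  allele <- rep(NA_character_, length(freq))
  # Observations below the depth threshold are never assigned an allele
  covered <- !is.na(freq) & !is.na(depth) & depth >= depth_thres
  allele[covered & freq < freq_thres] <- "major"
  allele[covered & freq > 1 - freq_thres] <- "minor"
  allele
}

# Toy values: minor allele frequency and read depth for four observations
freq  <- c(0.02, 0.97, 0.40, 0.10)
depth <- c(10, 8, 5, 0)

loose  <- call_allele(freq, depth, depth_thres = 1, freq_thres = 0.5)
strict <- call_allele(freq, depth, depth_thres = 1, freq_thres = 0.25)
```

Under these assumptions, the last observation is left unassigned in both calls because its depth is 0, and the third one (frequency 0.40) is assigned only with the looser threshold of 0.5.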

Only samples present in the map and in both the depth and freq tables are considered. All other samples are dropped (as in an inner_join).
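
The sample filtering described above can be sketched in base R as follows. This is a minimal illustration with made-up toy tables (sample names, group labels and values are all hypothetical), not the code used inside the function.

```r
# Toy inputs: S3 is only in the map, S4 is only in the freq/depth tables
map <- data.frame(sample = c("S1", "S2", "S3"),
                  Group = c("gut", "gut", "oral"),
                  stringsAsFactors = FALSE)
freq <- data.frame(site_id = c("1", "2"),
                   S1 = c(0, 1), S2 = c(0.1, 0.9), S4 = c(0.5, 0.5),
                   stringsAsFactors = FALSE)
depth <- data.frame(site_id = c("1", "2"),
                    S1 = c(5, 3), S2 = c(2, 0), S4 = c(1, 1),
                    stringsAsFactors = FALSE)

# Keep only samples found in the map AND in both tables
shared <- Reduce(intersect,
                 list(map$sample,
                      setdiff(colnames(freq), "site_id"),
                      setdiff(colnames(depth), "site_id")))

map_used   <- map[map$sample %in% shared, ]
freq_used  <- freq[, c("site_id", shared)]
depth_used <- depth[, c("site_id", shared)]
```

Here only S1 and S2 survive: S3 has no frequency or depth column, and S4 has no entry in the map.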

A data table identical to info but with an added 'distribution' column indicating the allele distribution between sites in the given samples.

library(HMVAR)

# Get file paths
midas_dir <- system.file("toy_example/merged.snps/", package = "HMVAR")
map <- readr::read_tsv(system.file("toy_example/map.txt", package = "HMVAR"),
                       col_types = readr::cols(.default = readr::col_character())) %>%
  dplyr::select(sample = ID, Group)

# Read data
midas_data <- read_midas_data(midas_dir = midas_dir, map = map, cds_only = TRUE)

# Annotate SNP effects, then add the allele distribution between sites
info <- determine_snp_effect(midas_data$info) %>%
  determine_snp_dist(freq = midas_data$freq,
                     depth = midas_data$depth, map = map,
                     depth_thres = 1, freq_thres = 0.5)
info

# Apply mkvalues to the SNPs of each gene and bind the results into one table
mktable <- info %>%
  split(.$gene_id) %>%
  purrr::map_dfr(mkvalues,
                 .id = "gene_id")
mktable