hogwash: Run bacterial GWAS


View source: R/hogwash.R

Description

This function runs a bacterial genome-wide association test. It runs the Continuous Test when given continuous phenotype data. When given binary phenotype data, the user may run the Synchronous Test, PhyC, or both.

Usage

hogwash(
  pheno,
  geno,
  tree,
  tree_type = "phylogram",
  file_name = "hogwash",
  dir = ".",
  perm = 10000,
  fdr = 0.15,
  bootstrap = 0.7,
  group_genotype_key = NULL,
  grouping_method = "post-ar",
  test = "both"
)

Arguments

pheno

Matrix. Dimensions: nrow = number of samples, ncol = 1. Either continuous or binary (0/1). row.names(pheno) must match tree$tip.label. Required input.

geno

Matrix. Dimensions: nrow = number of samples, ncol = number of genotypes. Binary (0/1). row.names(geno) must match tree$tip.label. Required input.

tree

Phylo object. If unrooted, the tree will be rooted using phytools::midpoint.root(). Required input.

tree_type

Character. Default = "phylogram". User can supply either "phylogram" or "fan". Determines how the trees are plotted in the output.

file_name

Character. Suffix for output files. Default value "hogwash".

dir

Character. Path to output directory. Default value is current directory: "."

perm

Integer. Number of permutations to run. Default value is: 10,000.

fdr

Numeric. False discovery rate. Between 0 and 1. Default value is: 0.15.

bootstrap

Numeric. Confidence threshold for tree bootstrap values. Default value is: 0.70.

group_genotype_key

Matrix. Dimensions: nrow = number of unique genotypes, ncol = 2. Optional input.

grouping_method

Character. Either "pre-ar" or "post-ar". Default = "post-ar". Determines which grouping method is used if and only if a group_genotype_key is provided; if no key is provided this argument is ignored.

test

Character. Default = "both". User can supply three options: "both", "phyc", or "synchronous". Determines which test is run for binary data.
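The input requirements above can be checked before calling hogwash. The following is a minimal sketch in base R, not part of the package: check_inputs() is a hypothetical helper, the toy data are made up, and "must match" is read here as "identical, in the same order", which is an assumption.

```r
# Toy inputs; all names and values here are made up for illustration.
samples <- c("s1", "s2", "s3", "s4")

# Stand-in for a phylo object: only the tip.label element is used below.
# A real tree would typically be read with ape::read.tree().
toy_tree <- list(tip.label = samples)

pheno <- matrix(c(0, 1, 1, 0), ncol = 1,
                dimnames = list(samples, "resistance"))
geno <- matrix(c(0, 1, 1, 0,
                 1, 1, 0, 0,
                 0, 0, 1, 1),
               nrow = 4,
               dimnames = list(samples, c("snp1", "snp2", "snp3")))

# Hypothetical helper (not a hogwash function): checks the documented rules.
check_inputs <- function(pheno, geno, tree) {
  # pheno: one-column matrix
  stopifnot(is.matrix(pheno), ncol(pheno) == 1)
  # geno: binary (0/1) matrix
  stopifnot(is.matrix(geno), all(geno %in% c(0, 1)))
  # Row names must match the tree tip labels (assumed: same order).
  stopifnot(identical(row.names(pheno), tree$tip.label))
  stopifnot(identical(row.names(geno), tree$tip.label))
  invisible(TRUE)
}

inputs_ok <- check_inputs(pheno, geno, toy_tree)
```

If any rule is violated, stopifnot() raises an error naming the failed condition, which makes it easier to spot mismatched sample names before a long run.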

Details

Overview: hogwash reads in one phenotype (either continuous or binary), a matrix of binary genotypes, and a phylogenetic tree. Given these inputs it performs an ancestral reconstruction of the phenotype and of each genotype. The ancestral reconstructions are then used to perform one of several tests that associate the genotypes with the phenotype:

  1. Continuous Test

  2. Synchronous Test

  3. PhyC Test (Farhat et al.)

Once a test finishes running it returns (i) p-values for all genotypes tested and (ii) a Manhattan plot of those p-values. If any of the tested genotypes are significantly associated with the phenotype after FDR correction, it also returns (iii) a list of significant hits and (iv) figures of the genotype and phenotype reconstructions on the tree.
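To illustrate how an FDR threshold turns a set of p-values into a list of significant hits, here is a small base-R sketch. It assumes a Benjamini-Hochberg adjustment via stats::p.adjust(); the documentation above does not state which correction hogwash applies, and the p-values and genotype names are invented.

```r
# Made-up p-values for five hypothetical genotypes.
p_values <- c(snp1 = 0.001, snp2 = 0.04, snp3 = 0.20, snp4 = 0.03, snp5 = 0.60)

# Threshold playing the role of the fdr argument (default 0.15).
fdr <- 0.15

# Benjamini-Hochberg adjustment. This is an assumption for illustration;
# the correction used inside hogwash may differ.
adjusted <- stats::p.adjust(p_values, method = "BH")

# Genotypes whose adjusted p-value falls below the threshold.
significant <- names(adjusted)[adjusted < fdr]
print(significant)
```

Lowering fdr makes the list of hits shorter and more conservative; raising it admits more genotypes at the cost of more expected false discoveries.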

Grouping: hogwash can organize genotypes into biologically meaningful groups. Testing for an association between an individual SNP and a phenotype is quite stringent, but patterns may emerge when biologically related genotypes are grouped together. For example, grouping all variants (insertions, deletions, and SNPs) within a gene or promoter region could allow the user to identify a particular gene as associated with a phenotype even when no individual variant within that gene has deep penetrance in the isolates being tested. Grouped genotypes could have increased power to identify convergent evolution because they capture larger trends in functional impact at the group level and reduce the multiple testing correction burden. Possible use cases include grouping SNPs into genes, or k-mers or genes into pathways. Each of the three tests can be run on disaggregated data or, with the inclusion of a grouping key, on aggregated data. There are two grouping options: grouping prior to ancestral reconstruction ("pre-ar") or grouping after ancestral reconstruction ("post-ar").
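The idea of collapsing individual genotypes into groups can be sketched in base R. The example below assumes a two-column key with the genotype name first and the group name second (the column order and names are assumptions, as are the toy data), and it marks a group as present in a sample when any member genotype is present. It collapses at the tips only, so it resembles grouping before ancestral reconstruction in spirit; it is not hogwash's own implementation.

```r
samples <- c("s1", "s2", "s3", "s4")

# Toy binary genotype matrix: rows are samples, columns are SNPs.
geno <- matrix(c(0, 1, 0, 0,
                 0, 0, 1, 0,
                 1, 0, 0, 0),
               nrow = 4,
               dimnames = list(samples, c("snp1", "snp2", "snp3")))

# Two-column key: one row per genotype (assumed order: genotype, group).
key <- matrix(c("snp1", "geneA",
                "snp2", "geneA",
                "snp3", "geneB"),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("genotype", "group")))

# For each group, flag samples carrying at least one member genotype.
groups <- unique(key[, "group"])
grouped <- sapply(groups, function(g) {
  members <- key[key[, "group"] == g, "genotype"]
  as.numeric(rowSums(geno[, members, drop = FALSE]) > 0)
})
row.names(grouped) <- samples
print(grouped)
```

Here three SNP columns become two gene columns, so fewer tests are run and the multiple testing burden shrinks accordingly.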

Author(s)

Katie Saund

References

Farhat MR, Shapiro BJ, Kieser KJ, et al. Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nat Genet. 2013;45(10):1183–1189. doi:10.1038/ng.2747

Examples

# Both Synchronous Test & PhyC for discrete phenotype
phenotype <- hogwash::antibiotic_resistance
genotype <- hogwash::snp_genotype
tree <- hogwash::tree
hogwash(pheno = phenotype,
        geno = genotype,
        tree = tree)

# Continuous Test for continuous phenotype
phenotype <- hogwash::growth
genotype <- hogwash::snp_genotype
tree <- hogwash::tree
hogwash(pheno = phenotype,
        geno = genotype,
        tree = tree)

# Continuous Test while grouping SNPs into genes
phenotype <- hogwash::growth
genotype <- hogwash::snp_genotype
tree <- hogwash::tree
key <- hogwash::snp_gene_key
hogwash(pheno = phenotype,
        geno = genotype,
        tree = tree,
        group_genotype_key = key,
        grouping_method = "post-ar")

# Both Synchronous Test & PhyC while grouping SNPs into genes
phenotype <- hogwash::antibiotic_resistance
genotype <- hogwash::snp_genotype
tree <- hogwash::tree
key <- hogwash::snp_gene_key
hogwash(pheno = phenotype,
        geno = genotype,
        tree = tree,
        group_genotype_key = key,
        grouping_method = "post-ar")
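The examples above rely on the defaults for most arguments. As a further sketch, the call below runs only PhyC with a fan-shaped tree layout, a stricter FDR, fewer permutations, and a temporary output directory. All argument names come from the Usage section; the specific values (such as the "phyc_only" file name and perm = 1000) are illustrative choices, and the call is guarded so it only runs when the hogwash package is installed.

```r
# Non-default settings; the values are illustrative, not recommendations.
run_args <- list(
  tree_type = "fan",        # plot trees in a fan layout
  file_name = "phyc_only",  # hypothetical label for the output files
  dir = tempdir(),          # write output to a temporary directory
  perm = 1000,              # fewer permutations than the default 10,000
  fdr = 0.05,               # stricter false discovery rate
  test = "phyc"             # run PhyC only
)

# Only call hogwash when the package is available.
if (requireNamespace("hogwash", quietly = TRUE)) {
  do.call(hogwash::hogwash,
          c(list(pheno = hogwash::antibiotic_resistance,
                 geno = hogwash::snp_genotype,
                 tree = hogwash::tree),
            run_args))
}
```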

katiesaund/hogwash documentation built on Jan. 18, 2022, 7:41 a.m.