#' Run bacterial GWAS
#'
#' @description This function runs a bacterial genome-wide association test. It
#' runs either the Continuous Test when given continuous phenotype data. When
#' given binary data the user may run either the Synchronous Test or PhyC or
#' both.
#'
#' @details Overview: hogwash reads in one phenotype (either continuous or
#' binary), a matrix of binary genotypes, and a phylogenetic tree. Given these
#' inputs it performs an ancestral reconstruction of that phenotype and each
#' genotype. The ancestral reconstructions are used to perform one of several
#' tests to associate the the genotypes with the phenotype:
#' \enumerate{
#' \item Continuous Test
#' \item Synchronous Test
#' \item PhyC Test (Farhat et al.)
#' }
#' Once a test finishes running it returns (i) p-values for all genotypes
#' tested, (ii) a manhattan plot of those p-values; if any of the genotypes
#' tested were significant associated with the phenotype after FDR correction
#' it also returns (iii) a list of significant hits and (iv) figures of the
#' genotype & phenotype reconstructions on the tree.
#'
#' Grouping: A feature of hogwash is the ability to organize genotypes into
#' biologically meaningful groups. Testing for an association between an
#' individual SNP and a phenotype is quite stringent, but patterns may emerge
#' when grouping together biologically related genotypes. For example,
#' grouping together all variants (insertions, deletions and SNPs) within a
#' gene or promoter region could allow the user to identify a particular gene
#' as being associated with a phenotype while any individual variant within
#' that gene may not have deep penetrance in the isolates being tested.
#' Grouped genotypes could have increased power to identify convergent
#' evolution because they captures larger trends in functional impact at the
#' group level and reduce the multiple testing correction burden. Use cases
#' for this method could be to group SNPs into genes, kmers or genes into
#' pathways, etc... Each of the three tests can be run on disaggregated data
#' or aggregated data with the inclusion of a grouping key.
#' There are two grouping options: grouping prior to ancestral reconstruction
#' or grouping post ancestral reconstruction.
#'
#' @param pheno Matrix. Dimensions: nrow = number of samples, ncol = 1. Either
#' continuous or binary (0/1). Row.names() must match tree$tip.label. Required
#' input.
#' @param geno Matrix. Dimensions: nrow = number of samples, ncol = number of
#' genotypes. Binary (0/1). Row.names() must match tree$tip.label. Required
#' input.
#' @param tree Phylo object. If unrooted, will be rooted using
#' phytools::midpoint.root() method. Required input.
#' @param tree_type Characer. Default = "phylogram". User can supply either:
#' "phylogram" or "fan." Determines how the trees are plotted in the output.
#' @param file_name Character. Suffix for output files. Default value "hogwash".
#' @param dir Character. Path to output directory. Default value is current
#' directory: "."
#' @param perm Integer. Number of permutations to run. Default value is: 10,000.
#' @param fdr Numeric. False discovery rate. Between 0 and 1. Default value is:
#' 0.15.
#' @param bootstrap Numeric. Confidence threshold for tree bootstrap values.
#' Default value is: 0.70.
#' @param group_genotype_key Matrix. Dimenions: nrow = number of unique
#' genotypes, ncol = 2. Optional input.
#' @param grouping_method Character. Either "pre-ar" or "post-ar". Default =
#' "post_ar". Determines which grouping method is used if and only if a
#' group_genotype_key is provided; if no key is provided this argument is
#' ignored.
#' @param test Character. Default = "both". User can supply three options:
#' "both", "phyc", or "synchronous". Determines which test is run for binary
#' data.
#' @param strain_key Matrix. Default = NULL. nrow = number of samples. ncol = 1.
#' Used to plot colored tips in phylogenetic tree that indicate user defined
#' strain types (or other useful group ID). Samples should be ordered to
#' match tree tips. Stain names should be characters. Colors will be
#' generated programatically rather than supplied by the user.
#'
#' @export
#'
#' @examples
#' \donttest{
#' # Both Synchronous Test & PhyC for discrete phenotype
#' phenotype <- hogwash::antibiotic_resistance
#' genotype <- hogwash::snp_genotype
#' tree <- hogwash::tree
#' hogwash(pheno = phenotype,
#' geno = genotype,
#' tree = tree)
#'
#' # Continuous Test for continuous phenotype
#' phenotype <- hogwash::growth
#' genotype <- hogwash::snp_genotype
#' tree <- hogwash::tree
#' hogwash(pheno = phenotype,
#' geno = genotype,
#' tree = tree)
#'
#' # Continuous Test while grouping SNPs into genes
#' phenotype <- hogwash::growth
#' genotype <- hogwash::snp_genotype
#' tree <- hogwash::tree
#' key <- hogwash::snp_gene_key
#' hogwash(pheno = phenotype,
#' geno = genotype,
#' tree = tree,
#' group_genotype_key = key,
#' grouping_method = "post-ar")
#'
#' # Both Synchronous Test & PhyC while grouping SNPs into genes
#' phenotype <- hogwash::antibiotic_resistance
#' genotype <- hogwash::snp_genotype
#' tree <- hogwash::tree
#' key <- hogwash::snp_gene_key
#' hogwash(pheno = phenotype,
#' geno = genotype,
#' tree = tree,
#' group_genotype_key = key,
#' grouping_method = "post-ar")
#' }
#'
#' @author Katie Saund
#'
#' @references Farhat MR, Shapiro BJ, Kieser KJ, et al. Genomic analysis
#' identifies targets of convergent positive selection in drug-resistant
#' \emph{Mycobacterium tuberculosis}. Nat Genet. 2013;45(10):1183–1189.
#' doi:10.1038/ng.2747
#'
hogwash <- function(pheno,
geno,
tree,
tree_type = "phylogram",
file_name = "hogwash",
dir = ".",
perm = 10000,
fdr = 0.15,
bootstrap = 0.70,
group_genotype_key = NULL,
grouping_method = "post-ar",
test = "both",
strain_key = NULL){
args <- NULL
args$phenotype <- pheno
args$genotype <- geno
args$tree <- tree
args$tree_type <- tree_type
args$output_name <- file_name
args$output_dir <- dir
args$perm <- perm
args$fdr <- fdr
args$bootstrap_cutoff <- bootstrap
args$gene_snp_lookup <- group_genotype_key
args$group_genotype <- FALSE
if (!is.null(group_genotype_key)) {
args$group_genotype <- TRUE
}
args$strain_key <- strain_key
args$grouping_method <- grouping_method
args$test <- test
args$discrete_or_continuous <-
check_input_format(args$phenotype,
args$tree,
args$genotype,
args$output_name,
args$output_dir,
args$perm,
args$fdr,
args$bootstrap_cutoff,
args$gene_snp_lookup,
args$grouping_method,
args$test,
args$tree_type,
args$strain_key)
select_test_type(args)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.