R/micro_MiRKAT.R

Defines functions micro_MiRKAT

Documented in micro_MiRKAT

#' @title A function to run MiRKAT within the tidy_micro format
#' @name micro_MiRKAT
#' @description Test for association between microbiome composition and a continuous or dichotomous outcome by incorporating phylogenetic or nonphylogenetic distance between different microbiomes
#' @param micro_set A tidy_micro data set
#' @param outcome The outcome of interset
#' @param ... Covariates you'd like to include. This is optional
#' @param KS Your phylogenetic or nonphylogenetic distance matrices of interest. Can be either a single matrix or a list of matrices
#' @param out_type An indicator of the outcome type. "C" for the continuous outcome and "D" for the dichotomous outcome
#' @param p_method A method to compute the kernel specific p-value (Default= "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation
#' @param nperm The number of permutations if method = "permutation" or when multiple kernels are considered. If method = "davies" or "moment", nperm is ignored
#' @details The original MiRKAT function requires numerical matrices or vectors for both your outcome and covariates. This will be done automatically by micro_MiRKAT for your supplied outcome and covariates.
#'
#' Ks should be a list of n by n matrices or a single matrix. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches. Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis.
#'
#' Parameter "method" only concerns with how kernel specific p-values are generated. When Ks is a list of multiple kernels, omnibus p-value is computed through permutation from each individual p-values, which are calculated through method of choice
#' @return p-value from each candidate kernel and/or omnibus p-value by considering multiple candidate kernels
#' @references \code{\link[MiRKAT]{MiRKAT}}
#' @examples
#' data(phy); data(cla); data(ord); data(fam); data(met)
#'
#' otu_tabs <- list(Phylum = phy, Class = cla, Order = ord, Family = fam)
#' set <- tidy_micro(otu_tabs = otu_tabs, meta = met) \%>\%
#' filter(day == 7) ## Only including first week
#'
#' ## Bray-Curtis beta diversity
#' bray <- set \%>\% beta_div(table = "Family")
#'
#' ## Single dissimilarity matrix
#' set \%>\% micro_MiRKAT(outcome = mom_ethncty_2, bpd1, KS = bray, out_type = "D")
#'
#' ## Morisita-Horn beta diversity
#' horn <- set \%>\% beta_div(table = "Family", method = "horn")
#'
#' ## Omnibus test
#' set \%>\% micro_MiRKAT(outcome = mom_ethncty_2, bpd1, KS = list(bray,horn), out_type = "D")
#' @export
micro_MiRKAT <- function(micro_set, outcome, ..., KS, out_type, p_method = "davies",
                        nperm = 9999, seed = NULL){

  if(out_type %nin% c("C","D")) stop("out_type must be either 'C' or 'D'. See MiRKAT documentation.")

  ## Defining X and outcome for list kernels or single kernels
  if(is.list(KS)){
    out <- micro_set %>%
      dplyr::select(!!enquo(outcome), Lib) %>%
      dplyr::filter(Lib %in% rownames(KS[[1]])) %>%
      dplyr::distinct() %>%
      dplyr::arrange(Lib)

    if(missing(...)){
      X <- NULL
    } else{
      X <- micro_set %>%
        dplyr::select(cov_str(...), Lib) %>%
        dplyr::filter(Lib %in% rownames(KS[[1]])) %>%
        dplyr::distinct() %>%
        dplyr::arrange(Lib)
    }
  } else{
    out <- micro_set %>%
      dplyr::select(!!enquo(outcome), Lib) %>%
      dplyr::filter(Lib %in% rownames(KS)) %>%
      dplyr::distinct() %>%
      dplyr::arrange(Lib)

    if(missing(...)){
      X <- NULL
    } else{
      X <- micro_set %>%
        dplyr::select(cov_str(...), Lib) %>%
        dplyr::filter(Lib %in% rownames(KS)) %>%
        dplyr::distinct() %>%
        dplyr::arrange(Lib)
    }
  }

  ## Formula for covariates
  f <- paste("~", suppressWarnings(adonis_formula(...)))

  ## Getting model matrix out of supplied covariates
  ## MiRKAT requires this kind of input
  if(!is.null(X)){
    X %<>%
      dplyr::select(-Lib) %>%
      model.matrix(as.formula(f), .) %>%
      as.data.frame %>%
      dplyr::select(-1) ## Don't need intercept term
  }

  ## Changing output to 0,1 if out_type is "D" for dichotomous
  if(out_type == "D"){
    out[,1] <- as.factor(out[,1]) ## Ensuring it is a factor
    out <- (out[,1] == levels(out[,1])[1])**2 ## Making 0s and 1s
  } else out %<>% .[,1]

  set.seed(seed)
  ## Applying MiRKAT
  MiRKAT::MiRKAT(y = out, Ks = KS, X = X,
                 out_type = out_type, nperm = nperm, method = p_method)
}
CharlieCarpenter/tidy.micro documentation built on Jan. 19, 2020, 6:28 p.m.