R/04_compute_error.R

Defines functions error_hierarchicell

Documented in error_hierarchicell

#' @title Computing Type 1 Error for Single Cell Expression Case-Control Differential
#'   Expression Analysis
#'
#' @name compute_error
#'
#' @description Computes type 1 error for single cell data that is cell-type specifc,
#'   hierarchical, and compositonal. This function computes type 1 error with the
#'   single-cell differential expression analysis tool 'MAST', using random
#'   effects to account for the correlation structure that exists among measures
#'   from cells within an individual. The type 1 error calculations will borrow
#'   information from the input data (or the package default data) to simulate
#'   data under a variety of pre-determined conditions. These conditions include
#'   foldchange, number of genes, number of samples (i.e., independent
#'   experimental units), and the mean number of cells per individual.
#'
#' @details Prior to running the \code{\link{error_hierarchicell}} function, it
#'   is important to run the \code{\link{filter_counts}} function followed by
#'   the \code{\link{compute_data_summaries}} function to build an R object that
#'   is in the right format for the following simulation function to properly
#'   work.
#'
#' @note Data should be \strong{only for cells of the specific cell-type} you
#'   are interested in simulating or computing type 1 error for. Data should also
#'   contain as many unique sample identifiers as possible. If you are inputing
#'   data that has less than 5 unique values for sample identifier (i.e.,
#'   independent experimental units), then the empirical estimation of the
#'   inter-individual heterogeneity is going to be very unstable. Finding such a
#'   dataset will be difficult at this time, but, over time (as experiments grow
#'   in sample size and the numbers of publically available single-cell RNAseq
#'   datasets increase), this should improve dramatically.
#'

NULL

#'@title Compute Type 1 Error for Single Cell Expression Case-Control Analysis
#'
#'@rdname error_hierarchicell
#'
#'@description Computes type 1 error for single cell data that is cell-type
#'  specifc, hierarchical, and compositonal. This function computes type 1 error
#'  with the single-cell differential expression analysis tool 'MAST', using
#'  random effects to account for the correlation structure that exists among
#'  measures from cells within an individual. The type 1 error calculations will
#'  borrow information from the input data (or the package default data) to
#'  simulate data under a variety of pre-determined conditions. These conditions
#'  include foldchange, number of genes, number of samples (i.e., independent
#'  experimental units), and the mean number of cells per individual.
#'
#'@details Prior to running the \code{\link{error_hierarchicell}} function, it
#'  is important to run the \code{\link{filter_counts}} function followed by the
#'  \code{\link{compute_data_summaries}} function to build an R object that is
#'  in the right format for the following simulation function to properly work.
#'
#'@note Data should be \strong{only for cells of the specific cell-type} you are
#'  interested in simulating or computing type 1 error for. Data should also
#'  contain as many unique sample identifiers as possible. If you are inputing
#'  data that has less than 5 unique values for sample identifier (i.e.,
#'  independent experimental units), then the empirical estimation of the
#'  inter-individual heterogeneity is going to be very unstable. Finding such a
#'  dataset will be difficult at this time, but, over time (as experiments grow
#'  in sample size and the numbers of publically available single-cell RNAseq
#'  datasets increase), this should improve dramatically.
#'
#'@param data_summaries an R object that has been output by the package's
#'  compute_data_summaries function. No default
#'
#'@param method a name. The method for differential expression to be used for
#'  the computation of error. Possible methods include: MAST with random effects
#'  ("MAST_RE"), MAST ("MAST"), MAST with batch effect correction
#'  ("MAST_Combat"), GLM assuming a tweedie distribution ("GLM_tweedie"), GLMM
#'  assuming a tweedie distribution ("GLMM_tweedie"), generalized estimating
#'  equations ("GEE1"), ROTS ("ROTS"), or Monocle ("Monocle"). Defaults to
#'  "MAST_RE" which is the currently recommended analysis pipeline for
#'  single-cell data. See \code{\link{de_methods}} for more details on each of
#'  the methods.
#'
#'@param n_genes an integer. The number of genes you would like to simulate for
#'  your dataset. Too large of a number may cause memory failure and may slow
#'  the simulation down tremendously. We recommend an integer less than 40,000.
#'  Defaults to 10,000.
#'
#'@param n_per_group an integer. The number of independent samples per
#'  case/control group for simulation. Creates a balanced design, for unbalanced
#'  designs, specify n_cases and n_controls separately. If not specifying a
#'  foldchange, the number of cases and controls does not matter. Defaults to 3.
#'
#'@param n_cases an integer. The number of independent control samples for
#'  simulation. Defaults to n_per_group.
#'
#'@param n_controls an integer. The number of independent case samples for
#'  simulation. Defaults to n_per_group.
#'
#'@param cells_per_control an integer. The mean number of cells per
#'  control you would like to simulate. Too large of a number may cause
#'  memory failure and may slow the simulation down tremendously. We recommend
#'  an integer less than 300, but more is possible. We note that anything
#'  greater than 100, brings marginal improvements in type 1 error. Defaults to
#'  100.
#'
#'@param cells_per_case an integer. The mean number of cells per
#'  case you would like to simulate. Too large of a number may cause
#'  memory failure and may slow the simulation down tremendously. We recommend
#'  an integer less than 300, but more is possible. We note that anything
#'  greater than 100, brings marginal improvements in type 1 error. Defaults to
#'  100.
#'
#'
#'@param ncells_variation_type either "Poisson", "NB", or "Fixed". Allows the
#'  number of cells per individual to be fixed at exactly the specified number
#'  of cells per individual, vary slightly with a poisson distribution with a
#'  lambda equal to the specified number of cells per individual, or a negative
#'  binomial with a mean equal to the specified number of cells and dispersion
#'  size equal to one.Defaults to "Poisson".
#'
#'
#'@param pval a number. The significance threshold (alpha) to use for
#'  significance. Defaults to 0.05. Can also be a vector of pvalue - up to a
#'  length of 5.
#'
#'@param alter_dropout_cases a numeric proportion between 0 and 1. The
#'  proportion by which you would like to simulate decreasing the amount of
#'  dropout between case control groups. For example, if you would like to
#'  simulate a decrease in the amount of dropout in your cases by twenty
#'  percent, then 0.2 would be appropriate. This component of the simulation
#'  allows the user to adjust the proportion of dropout if they believe the
#'  stochastic expression of a gene will differ between cases and controls. For
#'  a two-part hurdle model, like MAST implements, this will increase your
#'  ability to detect differences. Defaults to 0.
#'
#'@param decrease_dropout a numeric proportion between 0 and 1. The proportion
#'  by which you would like to simulate decreasing the amount of dropout in your
#'  data. For example, if you would like to simulate a decrease in the amount of
#'  dropout in your data by twenty percent, then 0.2 would be appropriate. This
#'  component of the simulation allows the user to adjust the proportion of
#'  dropout if they believe future experiments or runs will have improved
#'  calling rates (due to improved methods or improved cell viability) and
#'  thereby lower dropout rates. Defaults to 0.
#'
#'@param foldchange a number between 1 and 10. The amount of fold change to
#'  simulate a difference in expression between case and control groups. The
#'  foldchange changes genes in either direction, so a foldchange of 2  would
#'  cause the mean expression in cases to be either twice the amount or half the
#'  amount for any particular gene. Defaults to 1.
#'
#'@return The estimated error under the specified conditions when using 'MAST'
#'  with random effects to account for the correlation structure that exists
#'  among measures from cells within an individual.
#'
#'@examples
#'\donttest{clean_expr_data <- filter_counts()
#'data_summaries <- compute_data_summaries(clean_expr_data)
#'error_hierarchicell(data_summaries,
#'                    n_genes = 100,
#'                    n_per_group = 4,
#'                    cells_per_case = 50,
#'                    cells_per_control = 50)}
#'
#'@export

error_hierarchicell <- function(data_summaries,
                                   method = "MAST_RE",
                                   n_genes = 10000,
                                   n_per_group = 3,
                                   n_cases = n_per_group,
                                   n_controls = n_per_group,
                                   cells_per_case = 100,
                                   cells_per_control = 100,
                                   ncells_variation_type = "Poisson",
                                   pval = 0.05,
                                   foldchange = 1,
                                   decrease_dropout = 0,
                                   alter_dropout_cases = 0){
  if (method == "MAST_RE") {


    if (!requireNamespace(c("MAST","SummarizedExperiment","lme4"),quietly = TRUE)){
      stop("The packages 'MAST', 'lme4', 'fitdistrplus', and \n
           'SummarizedExperiment' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"MAST\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
      genecounts <- genecounts[,rownames(coldata)]
      log2counts <- log2(genecounts + 1)

      fData <- data.frame(primerid=rownames(genecounts))
      sca <- suppressMessages(MAST::FromMatrix(exprsArray=log2counts, cData=coldata, fData=fData))

      cdr2 <- colSums(SummarizedExperiment::assay(sca)>0)
      SummarizedExperiment::colData(sca)$ngeneson <- scale(cdr2)
      SummarizedExperiment::colData(sca)$Status <-
        factor(SummarizedExperiment::colData(sca)$Status)
      SummarizedExperiment::colData(sca)$DonorID <-
        factor(SummarizedExperiment::colData(sca)$DonorID)

      zlmCond <- suppressMessages(MAST::zlm(~ ngeneson + Status + (1 | DonorID),
                                            sca, method='glmer',ebayes = F,
                                            strictConvergence = FALSE))

      summaryCond <- suppressMessages(MAST::summary(zlmCond,
                                                    doLRT='StatusControl'))
      summaryDt <- summaryCond$datatable
      fcHurdle <- merge(summaryDt[summaryDt$contrast=='StatusControl'
                                  & summaryDt$component=='logFC', c(1,7,5,6,8)],
                        summaryDt[summaryDt$contrast=='StatusControl'
                                  & summaryDt$component=='H', c(1,4)],
                        by = 'primerid')

      fcHurdle <- stats::na.omit(as.data.frame(fcHurdle))

      if (length(pval) == 1){

        signif <- ifelse(fcHurdle[,6] < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

    }

  } else if (method == "MAST") {
    if (!requireNamespace(c("MAST","SummarizedExperiment"),quietly = TRUE)){
      stop("The packages 'MAST', 'fitdistrplus', and \n
           'SummarizedExperiment' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"MAST\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
      genecounts <- genecounts[,rownames(coldata)]
      log2counts <- log2(genecounts + 1)

      fData <- data.frame(primerid=rownames(genecounts))
      sca <- suppressMessages(MAST::FromMatrix(exprsArray=log2counts, cData=coldata, fData=fData))

      cdr2 <- colSums(SummarizedExperiment::assay(sca)>0)
      SummarizedExperiment::colData(sca)$ngeneson <- scale(cdr2)
      SummarizedExperiment::colData(sca)$Status <-
        factor(SummarizedExperiment::colData(sca)$Status)
      SummarizedExperiment::colData(sca)$DonorID <-
        factor(SummarizedExperiment::colData(sca)$DonorID)

      zlmCond <- suppressMessages(MAST::zlm(~ ngeneson + Status,
                                            sca, method='glm',ebayes = F))

      summaryCond <- suppressMessages(MAST::summary(zlmCond,
                                                    doLRT='StatusControl'))
      summaryDt <- summaryCond$datatable
      fcHurdle <- merge(summaryDt[summaryDt$contrast=='StatusControl'
                                  & summaryDt$component=='logFC', c(1,7,5,6,8)],
                        summaryDt[summaryDt$contrast=='StatusControl'
                                  & summaryDt$component=='H', c(1,4)],
                        by = 'primerid')

      fcHurdle <- stats::na.omit(as.data.frame(fcHurdle))
      if (length(pval) == 1){

        signif <- ifelse(fcHurdle[,6] < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

    }


  } else if (method == "MAST_Combat") {
    if (!requireNamespace(c("MAST","SummarizedExperiment","sva"),quietly = TRUE)){
      stop("The packages 'MAST', 'fitdistrplus',  \n
           'SummarizedExperiment', and 'sva' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"MAST\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
      genecounts <- genecounts[,rownames(coldata)]
      log2counts <- log2(genecounts + 1)
      log2counts <- sva::ComBat(log2counts,coldata$DonorID)

      fData <- data.frame(primerid=rownames(genecounts))
      sca <- suppressMessages(MAST::FromMatrix(exprsArray=log2counts, cData=coldata, fData=fData))

      cdr2 <- colSums(SummarizedExperiment::assay(sca)>0)
      SummarizedExperiment::colData(sca)$ngeneson <- scale(cdr2)
      SummarizedExperiment::colData(sca)$Status <-
        factor(SummarizedExperiment::colData(sca)$Status)
      SummarizedExperiment::colData(sca)$DonorID <-
        factor(SummarizedExperiment::colData(sca)$DonorID)

      zlmCond <- suppressMessages(MAST::zlm(~ ngeneson + Status,
                                            sca, method='glm',ebayes = F))

      summaryCond <- suppressMessages(MAST::summary(zlmCond,
                                                    doLRT='StatusControl'))
      summaryDt <- summaryCond$datatable
      fcHurdle <- merge(summaryDt[summaryDt$contrast=='StatusControl'
                                  & summaryDt$component=='logFC', c(1,7,5,6,8)],
                        summaryDt[summaryDt$contrast=='StatusControl'
                                  & summaryDt$component=='H', c(1,4)],
                        by = 'primerid')

      fcHurdle <- stats::na.omit(as.data.frame(fcHurdle))

      if (length(pval) == 1){

        signif <- ifelse(fcHurdle[,6] < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(fcHurdle[,6] < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

    }


  } else if (method == "GLM_tweedie") {
    if (!requireNamespace(c("glmmTMB"),quietly = TRUE)){
      stop("The 'glmmTMB package is required. Please install it.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"glmmTMB\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      message("This function is slow and requires a lot of memory.")

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }

      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      coldata$DonorID <- as.factor(coldata$DonorID)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ] + 1
      genecounts <- log(genecounts)
      genecounts <- t(genecounts[,rownames(coldata)])
      allcells <- cbind(coldata,genecounts)

      fitfixed <- lapply(4:ncol(allcells),
                         function(x){glmmTMB::glmmTMB(allcells[,x] ~ Status,
                                                      data=allcells,
                                                      family=glmmTMB::tweedie,
                                                      ziformula= ~0)})
      summaries <- lapply(fitfixed, summary)
      pvalues <- as.numeric(unlist(lapply(summaries, function(x){stats::coef(x)$cond[2,4]})))

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }


    }


  } else if (method == "GLMM_tweedie") {
    if (!requireNamespace(c("glmmTMB"),quietly = TRUE)){
      stop("The 'glmmTMB package is required. Please install it.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"glmmTMB\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      message("This function is slow and requires a lot of memory.")

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      coldata$DonorID <- as.factor(coldata$DonorID)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ] + 1
      genecounts <- log(genecounts)
      genecounts <- t(genecounts[,rownames(coldata)])
      allcells <- cbind(coldata,genecounts)

      fitmixed <- lapply(4:ncol(allcells),
                         function(x){glmmTMB::glmmTMB(allcells[,x] ~ Status + (1 | DonorID),
                                                      data=allcells,
                                                      family=glmmTMB::tweedie,
                                                      ziformula= ~0)})
      summaries <- lapply(fitmixed, summary)
      pvalues <- as.numeric(unlist(lapply(summaries, function(x){stats::coef(x)$cond[2,4]})))

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

    }


  } else if (method == "GEE1") {
    if (!requireNamespace(c("geepack"),quietly = TRUE)){
      stop("The 'glmmTMB package is required. Please install it.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"geepack\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      message("This function is slow and requires a lot of memory.")

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      coldata$DonorID <- as.factor(coldata$DonorID)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ] + 1
      genecounts <- log(genecounts)
      genecounts <- t(genecounts[,rownames(coldata)])
      allcells <- cbind(coldata,genecounts)

      fitgee <- lapply(4:ncol(allcells),
                         function(x){geepack::geeglm(allcells[,x] ~ Status,
                                                     data=allcells,
                                                     family=stats::gaussian(link="identity"),
                                                     id = DonorID,
                                                     corstr="exchangeable")})
      summaries <- lapply(fitgee, summary)
      pvalues <- as.numeric(unlist(lapply(summaries, function(x){stats::coef(x)[2,4]})))

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }


    }
  } else if (method == "ROTS") {

    if (!requireNamespace(c("ROTS","tidyr"),quietly = TRUE)){
      stop("The packages 'ROTS', 'fitdistrplus', and \n
           'tidyr' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"ROTS\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
      genecounts <- genecounts[,rownames(coldata)]
      coldata$Status <- ifelse(coldata$Status == "Control",0,1)
      coldata$DonorID <- as.factor(coldata$DonorID)
      genecounts <- genecounts[,rownames(coldata)]
      results <- suppressMessages(ROTS::ROTS(data = genecounts, groups = coldata$Status, B = 1000, K = 500))
      results <- suppressMessages(ROTS::summary.ROTS(results, num.genes=nrow(genecounts)))
      results <- as.data.frame(results)
      results <- stats::na.omit(results)

      pvalues <- as.numeric(results$pvalue)

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

      }


  } else if (method == "Monocle") {
    if (!requireNamespace(c("monocle","tidyr"),quietly = TRUE)){
      stop("The packages 'monocle', 'fitdistrplus', and \n
           'tidyr' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"monocle\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      if (cells_per_control < 50 | cells_per_case < 50){
        message("----------------------------------------------")
        message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
        message("----------------------------------------------")
      }

      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
      coldata <- all_genes[,1:3]
      coldata$Status <- as.factor(coldata$Status)
      genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
      genecounts <- genecounts[,rownames(coldata)]
      coldata$Status <- ifelse(coldata$Status == "Control",0,1)
      coldata$DonorID <- as.factor(coldata$DonorID)
      genecounts <- genecounts[,rownames(coldata)]

      features <- data.frame(rownames(genecounts),"Function")
      colnames(features) <- c("gene_short_name","Function")
      rownames(features) <- features$gene_short_name
      features <- methods::new("AnnotatedDataFrame", data = features)
      pheno <- methods::new("AnnotatedDataFrame", data = coldata)
      celldat <- monocle::newCellDataSet(genecounts,phenoData = pheno, featureData = features, expressionFamily = VGAM::negbinomial.size())
      celldat <- monocle::detectGenes(celldat, min_expr = 0.1)
      celldat <- BiocGenerics::estimateSizeFactors(celldat)
      celldat <- BiocGenerics::estimateDispersions(celldat)
      results <- monocle::differentialGeneTest(celldat, fullModelFormulaStr = "~Status")
      results <- stats::na.omit(results)
      pvalues <- as.numeric(results$pval)

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

      }

  }  else if (method == "Pseudobulk_mean") {
    if (!requireNamespace(c("DESeq2","tidyr"),quietly = TRUE)){
      stop("The packages 'DESeq2', 'fitdistrplus', and \n
           'tidyr' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"DESeq2\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


      all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(all_genes[,c(-1,-2,-3)])
      genecounts <- genecounts[ ,which(apply(genecounts, 2, mean) > 5)]
      genecounts <- cbind(all_genes[,1:2],genecounts)

      computemeans <- function(x){tapply(x,genecounts[,2],mean)}
      cellmeans <- sapply(genecounts[,c(-1,-2)],computemeans)
      rownames(cellmeans) <- paste0(rownames(cellmeans),"_Mean")
      coldata <- as.data.frame(cbind(rownames(cellmeans),rownames(cellmeans)))
      colnames(coldata) <- c("SampleID","ToSep")
      coldata <- tidyr::separate(coldata,ToSep,c("Status", "Donor_Number", "Mean"), sep="_")
      rownames(coldata) <- coldata$SampleID
      coldata$Status <- as.factor(coldata$Status)
      coldata$Status <- stats::relevel(coldata$Status, "Control")
      cellmeans <- round(t(cellmeans),0)
      cellmeans <- cellmeans[, rownames(coldata)]


      dsd <- suppressMessages(DESeq2::DESeqDataSetFromMatrix(countData = cellmeans, colData = coldata, design = ~ Status))
      normFactors <- matrix(rep(1,(nrow(dsd)*ncol(dsd))),ncol=ncol(dsd),nrow=nrow(dsd),dimnames=list(1:nrow(dsd),1:ncol(dsd)))
      normFactors <- normFactors / exp(rowMeans(log(normFactors)))
      DESeq2::normalizationFactors(dsd) <- normFactors
      dsd <- suppressMessages(DESeq2::DESeq(dsd))
      res <- as.data.frame(DESeq2::results(dsd))
      pvalues <- as.numeric(res$pvalue)

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

    }

  }else if (method == "Pseudobulk_sum") {
    if (!requireNamespace(c("DESeq2","tidyr"),quietly = TRUE)){
      stop("The packages 'DESeq2', 'fitdistrplus', and \n
           'tidyr' are required. Please install them.\n
           It may be a problem with dependencies for these packages,\n
           type '!requireNamespace(\"DESeq2\")' to see if this is the issue.",
           call. = FALSE)
    } else {

      if (foldchange != 1){
        message("----------------------------------------------")
        message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
        message("----------------------------------------------")
      }


       all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
                                                           n_genes = n_genes,
                                                           n_per_group = n_per_group,
                                                           n_cases = n_cases,
                                                           n_controls = n_controls,
                                                           cells_per_case = cells_per_case,
                                                           cells_per_control = cells_per_control,
                                                           ncells_variation_type = ncells_variation_type,
                                                           foldchange = foldchange,
                                                           decrease_dropout = decrease_dropout,
                                                           alter_dropout_cases = alter_dropout_cases))

      genecounts <- as.matrix(all_genes[,c(-1,-2,-3)])
      genecounts <- genecounts[ ,which(apply(genecounts, 2, mean) > 5)]
      genecounts <- cbind(all_genes[,1:2],genecounts)

      computesums <- function(x){tapply(x,genecounts[,2],sum)}
      cellsums <- sapply(genecounts[,c(-1,-2)],computesums)
      rownames(cellsums) <- paste0(rownames(cellsums),"_sum")
      coldata <- as.data.frame(cbind(rownames(cellsums),rownames(cellsums)))
      colnames(coldata) <- c("SampleID","ToSep")
      coldata <- tidyr::separate(coldata,ToSep,c("Status", "Donor_Number", "sum"), sep="_")
      rownames(coldata) <- coldata$SampleID
      coldata$Status <- as.factor(coldata$Status)
      coldata$Status <- stats::relevel(coldata$Status, "Control")
      cellsums <- round(t(cellsums),0)
      cellsums <- cellsums[, rownames(coldata)]


      dsd <- suppressMessages(DESeq2::DESeqDataSetFromMatrix(countData = cellsums, colData = coldata, design = ~ Status))
      normFactors <- matrix(rep(1,(nrow(dsd)*ncol(dsd))),ncol=ncol(dsd),nrow=nrow(dsd),dimnames=list(1:nrow(dsd),1:ncol(dsd)))
      normFactors <- normFactors / exp(rowMeans(log(normFactors)))
      DESeq2::normalizationFactors(dsd) <- normFactors
      dsd <- suppressMessages(DESeq2::DESeq(dsd))
      res <- as.data.frame(DESeq2::results(dsd))
      pvalues <- as.numeric(res$pvalue)

      if (length(pval) == 1){

        signif <- ifelse(pvalues < pval, 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval," is: ", rate))

      } else if (length(pval) == 2) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

      } else if (length(pval) == 3) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))


      } else if (length(pval) == 4) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

      } else if (length(pval) == 5) {

        signif <- ifelse(pvalues < pval[1], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[1]," is: ", rate))

        signif <- ifelse(pvalues < pval[2], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[2]," is: ", rate))

        signif <- ifelse(pvalues < pval[3], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[3]," is: ", rate))

        signif <- ifelse(pvalues < pval[4], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[4]," is: ", rate))

        signif <- ifelse(pvalues < pval[5], 1, 0)
        rate <- mean(signif)
        message(paste0("Type 1 error for ",pval[5]," is: ", rate))

      } else {

        message("Too many pvalues, shorten vector of pvalues to 5 or less")

      }

    }

  }

}
kdzimm/hierarchicell documentation built on Dec. 21, 2021, 5:23 a.m.