#' @title Computing Type 1 Error for Single Cell Expression Case-Control Differential
#' Expression Analysis
#'
#' @name compute_error
#'
#' @description Computes type 1 error for single cell data that is cell-type specifc,
#' hierarchical, and compositonal. This function computes type 1 error with the
#' single-cell differential expression analysis tool 'MAST', using random
#' effects to account for the correlation structure that exists among measures
#' from cells within an individual. The type 1 error calculations will borrow
#' information from the input data (or the package default data) to simulate
#' data under a variety of pre-determined conditions. These conditions include
#' foldchange, number of genes, number of samples (i.e., independent
#' experimental units), and the mean number of cells per individual.
#'
#' @details Prior to running the \code{\link{error_hierarchicell}} function, it
#' is important to run the \code{\link{filter_counts}} function followed by
#' the \code{\link{compute_data_summaries}} function to build an R object that
#' is in the right format for the following simulation function to properly
#' work.
#'
#' @note Data should be \strong{only for cells of the specific cell-type} you
#' are interested in simulating or computing type 1 error for. Data should also
#' contain as many unique sample identifiers as possible. If you are inputing
#' data that has less than 5 unique values for sample identifier (i.e.,
#' independent experimental units), then the empirical estimation of the
#' inter-individual heterogeneity is going to be very unstable. Finding such a
#' dataset will be difficult at this time, but, over time (as experiments grow
#' in sample size and the numbers of publically available single-cell RNAseq
#' datasets increase), this should improve dramatically.
#'
NULL
#'@title Compute Type 1 Error for Single Cell Expression Case-Control Analysis
#'
#'@rdname error_hierarchicell
#'
#'@description Computes type 1 error for single cell data that is cell-type
#' specifc, hierarchical, and compositonal. This function computes type 1 error
#' with the single-cell differential expression analysis tool 'MAST', using
#' random effects to account for the correlation structure that exists among
#' measures from cells within an individual. The type 1 error calculations will
#' borrow information from the input data (or the package default data) to
#' simulate data under a variety of pre-determined conditions. These conditions
#' include foldchange, number of genes, number of samples (i.e., independent
#' experimental units), and the mean number of cells per individual.
#'
#'@details Prior to running the \code{\link{error_hierarchicell}} function, it
#' is important to run the \code{\link{filter_counts}} function followed by the
#' \code{\link{compute_data_summaries}} function to build an R object that is
#' in the right format for the following simulation function to properly work.
#'
#'@note Data should be \strong{only for cells of the specific cell-type} you are
#' interested in simulating or computing type 1 error for. Data should also
#' contain as many unique sample identifiers as possible. If you are inputing
#' data that has less than 5 unique values for sample identifier (i.e.,
#' independent experimental units), then the empirical estimation of the
#' inter-individual heterogeneity is going to be very unstable. Finding such a
#' dataset will be difficult at this time, but, over time (as experiments grow
#' in sample size and the numbers of publically available single-cell RNAseq
#' datasets increase), this should improve dramatically.
#'
#'@param data_summaries an R object that has been output by the package's
#' compute_data_summaries function. No default
#'
#'@param method a name. The method for differential expression to be used for
#' the computation of error. Possible methods include: MAST with random effects
#' ("MAST_RE"), MAST ("MAST"), MAST with batch effect correction
#' ("MAST_Combat"), GLM assuming a tweedie distribution ("GLM_tweedie"), GLMM
#' assuming a tweedie distribution ("GLMM_tweedie"), generalized estimating
#' equations ("GEE1"), ROTS ("ROTS"), or Monocle ("Monocle"). Defaults to
#' "MAST_RE" which is the currently recommended analysis pipeline for
#' single-cell data. See \code{\link{de_methods}} for more details on each of
#' the methods.
#'
#'@param n_genes an integer. The number of genes you would like to simulate for
#' your dataset. Too large of a number may cause memory failure and may slow
#' the simulation down tremendously. We recommend an integer less than 40,000.
#' Defaults to 10,000.
#'
#'@param n_per_group an integer. The number of independent samples per
#' case/control group for simulation. Creates a balanced design, for unbalanced
#' designs, specify n_cases and n_controls separately. If not specifying a
#' foldchange, the number of cases and controls does not matter. Defaults to 3.
#'
#'@param n_cases an integer. The number of independent control samples for
#' simulation. Defaults to n_per_group.
#'
#'@param n_controls an integer. The number of independent case samples for
#' simulation. Defaults to n_per_group.
#'
#'@param cells_per_control an integer. The mean number of cells per
#' control you would like to simulate. Too large of a number may cause
#' memory failure and may slow the simulation down tremendously. We recommend
#' an integer less than 300, but more is possible. We note that anything
#' greater than 100, brings marginal improvements in type 1 error. Defaults to
#' 100.
#'
#'@param cells_per_case an integer. The mean number of cells per
#' case you would like to simulate. Too large of a number may cause
#' memory failure and may slow the simulation down tremendously. We recommend
#' an integer less than 300, but more is possible. We note that anything
#' greater than 100, brings marginal improvements in type 1 error. Defaults to
#' 100.
#'
#'
#'@param ncells_variation_type either "Poisson", "NB", or "Fixed". Allows the
#' number of cells per individual to be fixed at exactly the specified number
#' of cells per individual, vary slightly with a poisson distribution with a
#' lambda equal to the specified number of cells per individual, or a negative
#' binomial with a mean equal to the specified number of cells and dispersion
#' size equal to one.Defaults to "Poisson".
#'
#'
#'@param pval a number. The significance threshold (alpha) to use for
#' significance. Defaults to 0.05. Can also be a vector of pvalue - up to a
#' length of 5.
#'
#'@param alter_dropout_cases a numeric proportion between 0 and 1. The
#' proportion by which you would like to simulate decreasing the amount of
#' dropout between case control groups. For example, if you would like to
#' simulate a decrease in the amount of dropout in your cases by twenty
#' percent, then 0.2 would be appropriate. This component of the simulation
#' allows the user to adjust the proportion of dropout if they believe the
#' stochastic expression of a gene will differ between cases and controls. For
#' a two-part hurdle model, like MAST implements, this will increase your
#' ability to detect differences. Defaults to 0.
#'
#'@param decrease_dropout a numeric proportion between 0 and 1. The proportion
#' by which you would like to simulate decreasing the amount of dropout in your
#' data. For example, if you would like to simulate a decrease in the amount of
#' dropout in your data by twenty percent, then 0.2 would be appropriate. This
#' component of the simulation allows the user to adjust the proportion of
#' dropout if they believe future experiments or runs will have improved
#' calling rates (due to improved methods or improved cell viability) and
#' thereby lower dropout rates. Defaults to 0.
#'
#'@param foldchange a number between 1 and 10. The amount of fold change to
#' simulate a difference in expression between case and control groups. The
#' foldchange changes genes in either direction, so a foldchange of 2 would
#' cause the mean expression in cases to be either twice the amount or half the
#' amount for any particular gene. Defaults to 1.
#'
#'@return The estimated error under the specified conditions when using 'MAST'
#' with random effects to account for the correlation structure that exists
#' among measures from cells within an individual.
#'
#'@examples
#'\donttest{clean_expr_data <- filter_counts()
#'data_summaries <- compute_data_summaries(clean_expr_data)
#'error_hierarchicell(data_summaries,
#' n_genes = 100,
#' n_per_group = 4,
#' cells_per_case = 50,
#' cells_per_control = 50)}
#'
#'@export
error_hierarchicell <- function(data_summaries,
method = "MAST_RE",
n_genes = 10000,
n_per_group = 3,
n_cases = n_per_group,
n_controls = n_per_group,
cells_per_case = 100,
cells_per_control = 100,
ncells_variation_type = "Poisson",
pval = 0.05,
foldchange = 1,
decrease_dropout = 0,
alter_dropout_cases = 0){
if (method == "MAST_RE") {
if (!requireNamespace(c("MAST","SummarizedExperiment","lme4"),quietly = TRUE)){
stop("The packages 'MAST', 'lme4', 'fitdistrplus', and \n
'SummarizedExperiment' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"MAST\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
genecounts <- genecounts[,rownames(coldata)]
log2counts <- log2(genecounts + 1)
fData <- data.frame(primerid=rownames(genecounts))
sca <- suppressMessages(MAST::FromMatrix(exprsArray=log2counts, cData=coldata, fData=fData))
cdr2 <- colSums(SummarizedExperiment::assay(sca)>0)
SummarizedExperiment::colData(sca)$ngeneson <- scale(cdr2)
SummarizedExperiment::colData(sca)$Status <-
factor(SummarizedExperiment::colData(sca)$Status)
SummarizedExperiment::colData(sca)$DonorID <-
factor(SummarizedExperiment::colData(sca)$DonorID)
zlmCond <- suppressMessages(MAST::zlm(~ ngeneson + Status + (1 | DonorID),
sca, method='glmer',ebayes = F,
strictConvergence = FALSE))
summaryCond <- suppressMessages(MAST::summary(zlmCond,
doLRT='StatusControl'))
summaryDt <- summaryCond$datatable
fcHurdle <- merge(summaryDt[summaryDt$contrast=='StatusControl'
& summaryDt$component=='logFC', c(1,7,5,6,8)],
summaryDt[summaryDt$contrast=='StatusControl'
& summaryDt$component=='H', c(1,4)],
by = 'primerid')
fcHurdle <- stats::na.omit(as.data.frame(fcHurdle))
if (length(pval) == 1){
signif <- ifelse(fcHurdle[,6] < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "MAST") {
if (!requireNamespace(c("MAST","SummarizedExperiment"),quietly = TRUE)){
stop("The packages 'MAST', 'fitdistrplus', and \n
'SummarizedExperiment' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"MAST\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
genecounts <- genecounts[,rownames(coldata)]
log2counts <- log2(genecounts + 1)
fData <- data.frame(primerid=rownames(genecounts))
sca <- suppressMessages(MAST::FromMatrix(exprsArray=log2counts, cData=coldata, fData=fData))
cdr2 <- colSums(SummarizedExperiment::assay(sca)>0)
SummarizedExperiment::colData(sca)$ngeneson <- scale(cdr2)
SummarizedExperiment::colData(sca)$Status <-
factor(SummarizedExperiment::colData(sca)$Status)
SummarizedExperiment::colData(sca)$DonorID <-
factor(SummarizedExperiment::colData(sca)$DonorID)
zlmCond <- suppressMessages(MAST::zlm(~ ngeneson + Status,
sca, method='glm',ebayes = F))
summaryCond <- suppressMessages(MAST::summary(zlmCond,
doLRT='StatusControl'))
summaryDt <- summaryCond$datatable
fcHurdle <- merge(summaryDt[summaryDt$contrast=='StatusControl'
& summaryDt$component=='logFC', c(1,7,5,6,8)],
summaryDt[summaryDt$contrast=='StatusControl'
& summaryDt$component=='H', c(1,4)],
by = 'primerid')
fcHurdle <- stats::na.omit(as.data.frame(fcHurdle))
if (length(pval) == 1){
signif <- ifelse(fcHurdle[,6] < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "MAST_Combat") {
if (!requireNamespace(c("MAST","SummarizedExperiment","sva"),quietly = TRUE)){
stop("The packages 'MAST', 'fitdistrplus', \n
'SummarizedExperiment', and 'sva' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"MAST\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
genecounts <- genecounts[,rownames(coldata)]
log2counts <- log2(genecounts + 1)
log2counts <- sva::ComBat(log2counts,coldata$DonorID)
fData <- data.frame(primerid=rownames(genecounts))
sca <- suppressMessages(MAST::FromMatrix(exprsArray=log2counts, cData=coldata, fData=fData))
cdr2 <- colSums(SummarizedExperiment::assay(sca)>0)
SummarizedExperiment::colData(sca)$ngeneson <- scale(cdr2)
SummarizedExperiment::colData(sca)$Status <-
factor(SummarizedExperiment::colData(sca)$Status)
SummarizedExperiment::colData(sca)$DonorID <-
factor(SummarizedExperiment::colData(sca)$DonorID)
zlmCond <- suppressMessages(MAST::zlm(~ ngeneson + Status,
sca, method='glm',ebayes = F))
summaryCond <- suppressMessages(MAST::summary(zlmCond,
doLRT='StatusControl'))
summaryDt <- summaryCond$datatable
fcHurdle <- merge(summaryDt[summaryDt$contrast=='StatusControl'
& summaryDt$component=='logFC', c(1,7,5,6,8)],
summaryDt[summaryDt$contrast=='StatusControl'
& summaryDt$component=='H', c(1,4)],
by = 'primerid')
fcHurdle <- stats::na.omit(as.data.frame(fcHurdle))
if (length(pval) == 1){
signif <- ifelse(fcHurdle[,6] < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(fcHurdle[,6] < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(fcHurdle[,6] < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "GLM_tweedie") {
if (!requireNamespace(c("glmmTMB"),quietly = TRUE)){
stop("The 'glmmTMB package is required. Please install it.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"glmmTMB\")' to see if this is the issue.",
call. = FALSE)
} else {
message("This function is slow and requires a lot of memory.")
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
coldata$DonorID <- as.factor(coldata$DonorID)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ] + 1
genecounts <- log(genecounts)
genecounts <- t(genecounts[,rownames(coldata)])
allcells <- cbind(coldata,genecounts)
fitfixed <- lapply(4:ncol(allcells),
function(x){glmmTMB::glmmTMB(allcells[,x] ~ Status,
data=allcells,
family=glmmTMB::tweedie,
ziformula= ~0)})
summaries <- lapply(fitfixed, summary)
pvalues <- as.numeric(unlist(lapply(summaries, function(x){stats::coef(x)$cond[2,4]})))
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "GLMM_tweedie") {
if (!requireNamespace(c("glmmTMB"),quietly = TRUE)){
stop("The 'glmmTMB package is required. Please install it.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"glmmTMB\")' to see if this is the issue.",
call. = FALSE)
} else {
message("This function is slow and requires a lot of memory.")
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
coldata$DonorID <- as.factor(coldata$DonorID)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ] + 1
genecounts <- log(genecounts)
genecounts <- t(genecounts[,rownames(coldata)])
allcells <- cbind(coldata,genecounts)
fitmixed <- lapply(4:ncol(allcells),
function(x){glmmTMB::glmmTMB(allcells[,x] ~ Status + (1 | DonorID),
data=allcells,
family=glmmTMB::tweedie,
ziformula= ~0)})
summaries <- lapply(fitmixed, summary)
pvalues <- as.numeric(unlist(lapply(summaries, function(x){stats::coef(x)$cond[2,4]})))
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "GEE1") {
if (!requireNamespace(c("geepack"),quietly = TRUE)){
stop("The 'glmmTMB package is required. Please install it.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"geepack\")' to see if this is the issue.",
call. = FALSE)
} else {
message("This function is slow and requires a lot of memory.")
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
coldata$DonorID <- as.factor(coldata$DonorID)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ] + 1
genecounts <- log(genecounts)
genecounts <- t(genecounts[,rownames(coldata)])
allcells <- cbind(coldata,genecounts)
fitgee <- lapply(4:ncol(allcells),
function(x){geepack::geeglm(allcells[,x] ~ Status,
data=allcells,
family=stats::gaussian(link="identity"),
id = DonorID,
corstr="exchangeable")})
summaries <- lapply(fitgee, summary)
pvalues <- as.numeric(unlist(lapply(summaries, function(x){stats::coef(x)[2,4]})))
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "ROTS") {
if (!requireNamespace(c("ROTS","tidyr"),quietly = TRUE)){
stop("The packages 'ROTS', 'fitdistrplus', and \n
'tidyr' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"ROTS\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
genecounts <- genecounts[,rownames(coldata)]
coldata$Status <- ifelse(coldata$Status == "Control",0,1)
coldata$DonorID <- as.factor(coldata$DonorID)
genecounts <- genecounts[,rownames(coldata)]
results <- suppressMessages(ROTS::ROTS(data = genecounts, groups = coldata$Status, B = 1000, K = 500))
results <- suppressMessages(ROTS::summary.ROTS(results, num.genes=nrow(genecounts)))
results <- as.data.frame(results)
results <- stats::na.omit(results)
pvalues <- as.numeric(results$pvalue)
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "Monocle") {
if (!requireNamespace(c("monocle","tidyr"),quietly = TRUE)){
stop("The packages 'monocle', 'fitdistrplus', and \n
'tidyr' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"monocle\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
if (cells_per_control < 50 | cells_per_case < 50){
message("----------------------------------------------")
message("Mean number of cells per individual is less than 50.\n The probability of complete separation will start to increase.")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(t(all_genes[,c(-1,-2,-3)]))
coldata <- all_genes[,1:3]
coldata$Status <- as.factor(coldata$Status)
genecounts <- genecounts[which(apply(genecounts, 1, mean) > 5), ]
genecounts <- genecounts[,rownames(coldata)]
coldata$Status <- ifelse(coldata$Status == "Control",0,1)
coldata$DonorID <- as.factor(coldata$DonorID)
genecounts <- genecounts[,rownames(coldata)]
features <- data.frame(rownames(genecounts),"Function")
colnames(features) <- c("gene_short_name","Function")
rownames(features) <- features$gene_short_name
features <- methods::new("AnnotatedDataFrame", data = features)
pheno <- methods::new("AnnotatedDataFrame", data = coldata)
celldat <- monocle::newCellDataSet(genecounts,phenoData = pheno, featureData = features, expressionFamily = VGAM::negbinomial.size())
celldat <- monocle::detectGenes(celldat, min_expr = 0.1)
celldat <- BiocGenerics::estimateSizeFactors(celldat)
celldat <- BiocGenerics::estimateDispersions(celldat)
results <- monocle::differentialGeneTest(celldat, fullModelFormulaStr = "~Status")
results <- stats::na.omit(results)
pvalues <- as.numeric(results$pval)
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
} else if (method == "Pseudobulk_mean") {
if (!requireNamespace(c("DESeq2","tidyr"),quietly = TRUE)){
stop("The packages 'DESeq2', 'fitdistrplus', and \n
'tidyr' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"DESeq2\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(all_genes[,c(-1,-2,-3)])
genecounts <- genecounts[ ,which(apply(genecounts, 2, mean) > 5)]
genecounts <- cbind(all_genes[,1:2],genecounts)
computemeans <- function(x){tapply(x,genecounts[,2],mean)}
cellmeans <- sapply(genecounts[,c(-1,-2)],computemeans)
rownames(cellmeans) <- paste0(rownames(cellmeans),"_Mean")
coldata <- as.data.frame(cbind(rownames(cellmeans),rownames(cellmeans)))
colnames(coldata) <- c("SampleID","ToSep")
coldata <- tidyr::separate(coldata,ToSep,c("Status", "Donor_Number", "Mean"), sep="_")
rownames(coldata) <- coldata$SampleID
coldata$Status <- as.factor(coldata$Status)
coldata$Status <- stats::relevel(coldata$Status, "Control")
cellmeans <- round(t(cellmeans),0)
cellmeans <- cellmeans[, rownames(coldata)]
dsd <- suppressMessages(DESeq2::DESeqDataSetFromMatrix(countData = cellmeans, colData = coldata, design = ~ Status))
normFactors <- matrix(rep(1,(nrow(dsd)*ncol(dsd))),ncol=ncol(dsd),nrow=nrow(dsd),dimnames=list(1:nrow(dsd),1:ncol(dsd)))
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
DESeq2::normalizationFactors(dsd) <- normFactors
dsd <- suppressMessages(DESeq2::DESeq(dsd))
res <- as.data.frame(DESeq2::results(dsd))
pvalues <- as.numeric(res$pvalue)
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
}else if (method == "Pseudobulk_sum") {
if (!requireNamespace(c("DESeq2","tidyr"),quietly = TRUE)){
stop("The packages 'DESeq2', 'fitdistrplus', and \n
'tidyr' are required. Please install them.\n
It may be a problem with dependencies for these packages,\n
type '!requireNamespace(\"DESeq2\")' to see if this is the issue.",
call. = FALSE)
} else {
if (foldchange != 1){
message("----------------------------------------------")
message("Foldchange is not equal to 1, you are not simulating under the null.\nFor type 1 error rates, please keep foldchange equal to 1")
message("----------------------------------------------")
}
all_genes <- suppressMessages(simulate_hierarchicell(data_summaries,
n_genes = n_genes,
n_per_group = n_per_group,
n_cases = n_cases,
n_controls = n_controls,
cells_per_case = cells_per_case,
cells_per_control = cells_per_control,
ncells_variation_type = ncells_variation_type,
foldchange = foldchange,
decrease_dropout = decrease_dropout,
alter_dropout_cases = alter_dropout_cases))
genecounts <- as.matrix(all_genes[,c(-1,-2,-3)])
genecounts <- genecounts[ ,which(apply(genecounts, 2, mean) > 5)]
genecounts <- cbind(all_genes[,1:2],genecounts)
computesums <- function(x){tapply(x,genecounts[,2],sum)}
cellsums <- sapply(genecounts[,c(-1,-2)],computesums)
rownames(cellsums) <- paste0(rownames(cellsums),"_sum")
coldata <- as.data.frame(cbind(rownames(cellsums),rownames(cellsums)))
colnames(coldata) <- c("SampleID","ToSep")
coldata <- tidyr::separate(coldata,ToSep,c("Status", "Donor_Number", "sum"), sep="_")
rownames(coldata) <- coldata$SampleID
coldata$Status <- as.factor(coldata$Status)
coldata$Status <- stats::relevel(coldata$Status, "Control")
cellsums <- round(t(cellsums),0)
cellsums <- cellsums[, rownames(coldata)]
dsd <- suppressMessages(DESeq2::DESeqDataSetFromMatrix(countData = cellsums, colData = coldata, design = ~ Status))
normFactors <- matrix(rep(1,(nrow(dsd)*ncol(dsd))),ncol=ncol(dsd),nrow=nrow(dsd),dimnames=list(1:nrow(dsd),1:ncol(dsd)))
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
DESeq2::normalizationFactors(dsd) <- normFactors
dsd <- suppressMessages(DESeq2::DESeq(dsd))
res <- as.data.frame(DESeq2::results(dsd))
pvalues <- as.numeric(res$pvalue)
if (length(pval) == 1){
signif <- ifelse(pvalues < pval, 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval," is: ", rate))
} else if (length(pval) == 2) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
} else if (length(pval) == 3) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
} else if (length(pval) == 4) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
} else if (length(pval) == 5) {
signif <- ifelse(pvalues < pval[1], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[1]," is: ", rate))
signif <- ifelse(pvalues < pval[2], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[2]," is: ", rate))
signif <- ifelse(pvalues < pval[3], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[3]," is: ", rate))
signif <- ifelse(pvalues < pval[4], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[4]," is: ", rate))
signif <- ifelse(pvalues < pval[5], 1, 0)
rate <- mean(signif)
message(paste0("Type 1 error for ",pval[5]," is: ", rate))
} else {
message("Too many pvalues, shorten vector of pvalues to 5 or less")
}
}
}
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.