R/DEG_marker.R


#' @title Find DEG between groups
#' @description Two methods are supported: \code{"pseudoBulk"} and
#' \code{"wilcoxon"}. Pseudo-bulk method aggregates cells basing on biological
#' replicates and calls bulk RNAseq DE methods, DESeq2 wald test, while
#' Wilcoxon rank sum test is performed on single-cell level.
#' \code{runPairwiseDEG()} is generally used for flexibly comparing two specific
#' groups of cells, while \code{runMarkerDEG()} is used for a one-vs-rest marker
#' test strategy.
#'
#' When using the pseudo-bulk method, it is generally recommended that you have
#' these variables available in your object:
#'
#' \enumerate{
#'   \item{The cell type or cluster labeling. This can be obtained from prior
#'   study or computed with \code{\link{runCluster}}}
#'   \item{The biological replicate labeling, which is most of the time the
#'   \code{"dataset"} variable automatically generated when the
#'   \linkS4class{liger} object is created. Users may use other variables if
#'   a "dataset" is merged from multiple replicates.}
#'   \item{The condition labeling that reflects the study design, such as the
#'   treatment or disease status for each sample/dataset.}
#' }
#'
#' Please see below for detailed scenarios.
#'
#' @section Using Wilcoxon rank-sum test:
#' The Wilcoxon rank-sum test is applied to each gene and is based on the rank
#' of its expression across cells. LIGER integrates datasets but does not
#' "correct" the expression values. Projects with strong batch effects, or
#' projects that integrate drastically different modalities, should use this
#' method with caution.
#'
#' @section Comparing differences between/across cell types:
#' Most of the time, users want to know what cell type each cluster represents
#' after clustering. This can be addressed with a marker detection strategy that
#' tests each cluster against all the other cells, using a command like
#' \code{runMarkerDEG(object, conditionBy = "cluster_var")}. When using the
#' default pseudo-bulk method, users should additionally determine the
#' pseudo-bulk setup parameters. If a real biological replicate variable is
#' available, it should be supplied to argument \code{useReplicate}; otherwise,
#' pseudo-replicates should be created. See the "Pseudo-Replicate" section for
#' more.
#'
#' @section Compare between conditions:
#' It is frequently of interest to identify differences between conditions.
#' Users can simply set \code{conditionBy = "condition_var"}. However, most of
#' the time such comparisons are ideally done in a per-cluster manner. This can
#' be achieved by setting \code{splitBy = "cluster_var"}, which loops over each
#' cluster and, within that group of cells, compares each condition against all
#' other cells in the cluster.
#'
#' In the scenario when users only need to compare two conditions for each
#' cluster, running \code{runPairwiseDEG(object, groupTest = "condition1",
#' groupCtrl = "condition2", variable1 = "condition_var",
#' splitBy = "cluster_var")} would address the need.
#'
#' For both use cases, if the pseudo-bulk (default) method is used, users should
#' determine the pseudo-bulk setup parameters as mentioned in the previous
#' section.
#'
#' @section Detailed \code{runMarkerDEG} usage:
#' Marker detection is performed in a one-vs-rest manner. The grouping for such
#' comparisons is specified by \code{conditionBy}, which should be a column name
#' in \code{cellMeta}. When \code{splitBy} is specified as another variable
#' name in \code{cellMeta}, the marker detection is performed iteratively
#' within each level of the \code{splitBy} variable.
#'
#' For example, when \code{conditionBy = "celltype"} and \code{splitBy = NULL},
#' marker detection will be performed by comparing all cells of "celltype_i"
#' against all other cells, and so on for each cell type. This is analogous to
#' running \code{runWilcoxon(compare.method = "clusters")} in older versions.
#'
#' When \code{conditionBy = "gender"} and \code{splitBy = "leiden_cluster"},
#' marker detection will be performed by comparing "gender_i" cells from "cluster_j"
#' against other cells from "cluster_j", and etc. This is analogous to the old
#' version when running \code{runWilcoxon(method = "dataset")}.
#'
#' @section Detailed \code{runPairwiseDEG} usage:
#' Users can select classes of cells from a variable in \code{cellMeta}.
#' \code{variable1} and \code{variable2} are used to specify a column in
#' \code{cellMeta}, and \code{groupTest} and \code{groupCtrl} are used to specify
#' existing classes from \code{variable1} and \code{variable2}, respectively.
#' When \code{variable2} is missing, \code{groupCtrl} will be considered from
#' \code{variable1}.
#'
#' For example, when \code{variable1 = "celltype"} and \code{variable2 = NULL},
#' \code{groupTest} and \code{groupCtrl} should be valid cell types in
#' \code{object$celltype}.
#'
#' When \code{variable1} is "celltype" and \code{variable2} is "gender",
#' \code{groupTest} should be a valid cell type from \code{object$celltype} and
#' \code{groupCtrl} should be a valid class from \code{object$gender}.
#'
#' When both \code{variable1} and \code{variable2} are missing, \code{groupTest}
#' and \code{groupCtrl} should be valid indices of cells in \code{object}.
#'
#' @section Pseudo-Replicate:
#' Pseudo-replicate assignment is a technique to compensate for the lack of real
#' biological replicates when using pseudo-bulk DE methods. LIGER's pseudo-bulk
#' method generally requires that each comparison group has at least 3
#' replicates, each composed of at least 3 cells, in order to ensure statistical
#' power. When fewer than 3 real replicates are found for a comparison group,
#' the default setting (\code{nPsdRep = NULL}) splits each real replicate into 3
#' pseudo-replicates; otherwise, no pseudo-replicates are automatically
#' generated. When \code{nPsdRep} is given a number, LIGER always goes through
#' each comparison group and splits each real replicate into the given number of
#' pseudo-replicates.
#'
#' @param object A \linkS4class{liger} object, with normalized data available
#' @param groupTest,groupCtrl,variable1,variable2 Condition specification. See
#' \code{?runPairwiseDEG} section \bold{Detailed runPairwiseDEG usage} for
#' details.
#' @param splitBy Name(s) of the variable(s) in \code{cellMeta} to split the
#' comparison. See Details. Default \code{NULL}.
#' @param method DEG test method to use. Choose from \code{"pseudoBulk"} or
#' \code{"wilcoxon"}. Default \code{"pseudoBulk"}
#' @param usePeak Logical. Whether to use peak count instead of gene count.
#' Only supported when ATAC datasets are involved. Default \code{FALSE}.
#' @param useReplicate \code{cellMeta} variable of biological replicate
#' annotation. Only used with \code{method = "pseudoBulk"}. Default
#' \code{"dataset"}.
#' @param nPsdRep Number of pseudo-replicates to create. Only used when
#' \code{method = "pseudoBulk"}. Default \code{NULL}. See Details.
#' @param minCellPerRep Numeric. Pseudo-bulks will not be created for replicates
#' with fewer than this number of cells. Default \code{3}.
#' @param printDiagnostic Logical. Whether to show more detail when
#' \code{verbose = TRUE}. Default \code{FALSE}.
#' @param chunk Number of features to process at a time during Wilcoxon test.
#' Useful when memory is limited. Default \code{NULL} will process all features
#' at once.
#' @param seed Random seed to use for pseudo-replicate generation. Default
#' \code{1}.
#' @param verbose Logical. Whether to show information of the progress. Default
#' \code{getOption("ligerVerbose")} or \code{TRUE} if users have not set.
#' @return A data.frame of DEG information with all or some of the following
#' fields:
#'  \item{feature}{Gene names}
#'  \item{group}{Test group name. Multiple tests might be present in one
#'    function call, and this is the main variable that distinguishes them. For
#'    a pairwise test, a row with a certain group name shows the result of
#'    testing that group against the control group. When split by a variable,
#'    the group name takes the "split.group" format, meaning the statistics come
#'    from comparing the group within that split level against the control group
#'    in the same split level. When running marker detection without splitting,
#'    a row with group "a" shows the statistics of the gene in group "a" against
#'    all other cells. When running split marker detection, the group name again
#'    takes the "split.group" format, meaning the statistics come from comparing
#'    the group within that split level against all other cells in the same
#'    split level.}
#'  \item{logFC}{Log fold change}
#'  \item{pval}{P-value}
#'  \item{padj}{Adjusted p-value}
#'  \item{avgExpr}{Mean expression in the test group indicated by the "group"
#'    field. Only available for wilcoxon tests.}
#'  \item{statistic}{Wilcoxon rank-sum test statistic. Only available for
#'    wilcoxon tests.}
#'  \item{auc}{Area under the ROC curve. Only available for wilcoxon tests.}
#'  \item{pct_in}{Percentage of cells in the test group, indicated by the
#'    "group" field, that express the feature. Only available for wilcoxon
#'    tests.}
#'  \item{pct_out}{Percentage of cells in the control group or other cells, as
#'    explained for the "group" field, that express the feature. Only available
#'    for wilcoxon tests.}
#' @rdname liger-DEG
#' @export
#' @examples
#' \donttest{
#' pbmc$leiden_cluster <- pbmcPlot$leiden_cluster
#'
#' # Identify cluster markers
#' degStats1 <- runMarkerDEG(pbmc, conditionBy = "leiden_cluster")
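#'
#' # Compare all "stim" cells against all "ctrl" cells (a minimal pairwise call)
#' degStats2 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
#'                             variable1 = "dataset")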
#'
#' # Compare "stim" data against "ctrl" data within each cluster
#' degStats3 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
#'                             variable1 = "dataset",
#'                             splitBy = "leiden_cluster",
#'                             minCellPerRep = 4)
#' }
runPairwiseDEG <- function(
        object,
        groupTest,
        groupCtrl,
        variable1 = NULL,
        variable2 = NULL,
        splitBy = NULL,
        method = c("pseudoBulk", "wilcoxon"),
        usePeak = FALSE,
        useReplicate = "dataset",
        nPsdRep = NULL,
        minCellPerRep = 3,
        printDiagnostic = FALSE,
        chunk = NULL,
        seed = 1,
        verbose = getOption("ligerVerbose", TRUE)
) {
    method <- match.arg(method)
    if (!is.null(splitBy)) {
        splitVar <- .fetchCellMetaVar(object, splitBy,
                                      checkCategorical = TRUE, drop = TRUE,
                                      droplevels = TRUE)
        splitVar <- interaction(splitVar, drop = TRUE)
        splitGroup <- lapply(levels(splitVar), function(x) {
            which(splitVar == x)
        })
        names(splitGroup) <- levels(splitVar)
    } else {
        splitGroup <- list(seq(ncol(object)))
    }

    if (is.null(variable1) && is.null(variable2)) {
        # Directly using cell index
        groups <- list(
            .idxCheck(object, groupTest, "cell"),
            .idxCheck(object, groupCtrl, "cell")
        )
        group1Name <- "test"
        group2Name <- "control"
        names(groups) <- c("test", "control")
    } else if (!is.null(variable1)) {
        var1 <- .fetchCellMetaVar(object, variable1,
                                  checkCategorical = TRUE, drop = TRUE,
                                  droplevels = TRUE)
        group1Idx <- which(var1 %in% groupTest)
        group1Name <- paste(groupTest, collapse = ".")
        if (is.null(variable2)) {
            variable2 <- variable1
            var2 <- var1
        } else {
            var2 <- .fetchCellMetaVar(object, variable2,
                                      checkCategorical = TRUE, drop = TRUE,
                                      droplevels = TRUE)
        }
        group2Idx <- which(var2 %in% groupCtrl)
        group2Name <- paste(groupCtrl, collapse = ".")
        groups <- list(group1Idx, group2Idx)
        names(groups) <- c(group1Name, group2Name)
    } else {
        cli::cli_abort("Please see {.code ?runPairwiseDEG} for usage.")
    }
    resultList <- list()
    groupsMeta <- list()
    for (i in seq_along(splitGroup)) {
        splitName <- names(splitGroup)[i]
        splitIdx <- splitGroup[[i]]

        if (isTRUE(verbose)) {
            if (length(splitGroup) > 1) {
                cli::cli_alert_info(
                    "Running DEG within: {.val {splitName}}"
                )
            }
        }

        groups.sub <- lapply(groups, function(x) {
            intersect(x, splitIdx)
        })
        names(groups.sub) <- sapply(names(groups), function(x) {
            paste(c(splitName, x), collapse = ".")
        })
        if (length(groups.sub[[1]]) == 0) {
            cli::cli_alert_warning("No cell selected for group {.val {names(groups.sub)[1]}} when split by {.val {splitBy}} in level {.val {splitName}}. Skipping.")
            next
        }
        if (length(groups.sub[[2]]) == 0) {
            cli::cli_alert_warning("No cell selected for group {.val {names(groups.sub)[1]}} when split by {.val {splitBy}} in level {.val {splitName}}. Skipping.")
            next
        }
        groupsMeta[[names(groups.sub)[1]]] <- groups.sub
        result <- .runDEG(object, groups = groups.sub, method = method,
                          usePeak = usePeak, useReplicate = useReplicate,
                          nPsdRep = nPsdRep,
                          minCellPerRep = minCellPerRep, seed = seed,
                          printDiagnostic = printDiagnostic,
                          skipTwoGroup = TRUE, chunk = chunk,
                          verbose = verbose)
        if (nrow(result) > 0) {
            result <- result[result$group == names(groups.sub)[1],]
        }

        # attributes(result)$meta <- list(
        #     groupTest = groupTest,
        #     variable1 = variable1,
        #     groupCtrl = groupCtrl,
        #     variable2 = variable2
        # )
        resultList[[i]] <- result
    }
    resultList <- Reduce(rbind, resultList)
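    # Attach the cell grouping used for each test so that downstream helpers
    # (e.g. plotPairwiseDEGHeatmap) can recover which cells were compared.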
    attr(resultList, "groups") <- groupsMeta
    return(resultList)
}

#' @rdname liger-DEG
#' @export
#' @param conditionBy \code{cellMeta} variable(s). Marker detection will be
#' performed for each level of this variable. Multiple variables will be
#' combined. Default \code{NULL} uses default cluster.
#' @param useDatasets Datasets to perform marker detection within. Default
#' \code{NULL} will use all datasets.
runMarkerDEG <- function(
        object,
        conditionBy = NULL,
        splitBy = NULL, # The previous by dataset strategy
        method = c("pseudoBulk", "wilcoxon"),
        useDatasets = NULL,
        usePeak = FALSE,
        useReplicate = "dataset",
        nPsdRep = NULL,
        minCellPerRep = 3,
        printDiagnostic = FALSE,
        chunk = NULL,
        seed = 1,
        verbose = getOption("ligerVerbose", TRUE)
) {
    useDatasets <- .checkUseDatasets(object, useDatasets)
    allCellIdx <- seq(ncol(object))[object$dataset %in% useDatasets]
    conditionBy <- conditionBy %||% object@uns$defaultCluster
    if (is.null(conditionBy)) {
        cli::cli_abort("No {.var conditionBy} given or default cluster not set.")
    }
    conditionBy <- .fetchCellMetaVar(
        object, conditionBy, cellIdx = allCellIdx,
        checkCategorical = TRUE, drop = FALSE, droplevels = TRUE
    )
    conditionBy <- interaction(conditionBy, drop = TRUE)
    splitBy <- .fetchCellMetaVar(
        object, splitBy, cellIdx = allCellIdx,
        checkCategorical = TRUE, drop = FALSE, droplevels = TRUE
    )
    splitBy <- interaction(splitBy, drop = TRUE)
    groups <- split(allCellIdx, conditionBy)
    if (nlevels(splitBy) <= 1) {
        result <- .runDEG(object, groups = groups, method = method,
                          usePeak = usePeak, useReplicate = useReplicate,
                          nPsdRep = nPsdRep,
                          minCellPerRep = minCellPerRep,
                          printDiagnostic = printDiagnostic,
                          skipTwoGroup = FALSE, chunk = chunk,
                          seed = seed, verbose = verbose)
        attr(result, "groups") <- groups
    } else {
        resultList <- list()
        groupsMeta <- list()
        for (i in seq_along(levels(splitBy))) {
            if (isTRUE(verbose)) {
                cli::cli_alert_info(
                    "Running DEG within: {.val {levels(splitBy)[i]}}"
                )
            }
            subIdx <- which(splitBy == levels(splitBy)[i])
            subGroups <- lapply(groups, intersect, subIdx)
            names(subGroups) <- paste0(levels(splitBy)[i], '.', names(groups))
            groupsMeta <- c(groupsMeta, subGroups)
            resultList[[levels(splitBy)[i]]] <- .runDEG(
                object, groups = subGroups, method = method,
                usePeak = usePeak, useReplicate = useReplicate,
                nPsdRep = nPsdRep,
                minCellPerRep = minCellPerRep,
                printDiagnostic = printDiagnostic, skipTwoGroup = FALSE,
                chunk = chunk, seed = seed, verbose = verbose
            )
        }
        result <- Reduce(rbind, resultList)
        attr(result, "groups") <- groupsMeta
    }

    return(result)
}

#' @rdname liger-DEG
#' @export
#' @param data.use Same as \code{useDatasets}.
#' @param compare.method Choose from \code{"clusters"} (default) or
#' \code{"datasets"}. \code{"clusters"} compares each cluster against all other
#' cells, while \code{"datasets"} run within each cluster and compare each
#' dataset against all other datasets.
runWilcoxon <- function(
        object,
        data.use = NULL,
        compare.method = c("clusters", "datasets")
) {
    lifecycle::deprecate_warn(
        "1.99.0", "runWilcoxon()",
        details = "Please use `runMarkerDEG()` with `method = 'wilcoxon'` instead."
    )
    compare.method <- match.arg(compare.method)
    if (compare.method == "clusters") {
        res <- runMarkerDEG(object, conditionBy = object@uns$defaultCluster,
                            splitBy = NULL, method = "wilcoxon")
    } else if (compare.method == "datasets") {
        res <- runMarkerDEG(object, conditionBy = "dataset",
                            splitBy = object@uns$defaultCluster,
                            method = "wilcoxon")
    }
    return(res)
}

# groups - Since this is the underlying shared function, `groups` must already
# be organized as a named list of numeric cell index vectors.
.runDEG <- function(
        object,
        groups,
        method = c("pseudoBulk", "wilcoxon"),
        usePeak = FALSE,
        useReplicate = "dataset",
        nPsdRep = NULL,
        minCellPerRep = 10,
        printDiagnostic = FALSE,
        skipTwoGroup = TRUE,
        chunk = NULL,
        seed = 1,
        verbose = getOption("ligerVerbose", TRUE)
) {
    method <- match.arg(method)
    allCellIdx <- Reduce(c, groups)
    if (length(allCellIdx) == 0)
        cli::cli_abort(c(x = "No cell selected"))
    allCellBC <- colnames(object)[allCellIdx]
    datasetInvolve <- levels(object$dataset[allCellIdx, drop = TRUE])
    var <- factor(rep(names(groups), lengths(groups)), levels = names(groups))
    if (isTRUE(usePeak)) {
        useDatasets <- .checkUseDatasets(object, useDatasets = datasetInvolve,
                                         modal = "atac")
    } else {
        useDatasets <- .checkUseDatasets(object, useDatasets = datasetInvolve)
    }
    slot <- .DE.checkDataAvail(object, datasetInvolve, method, usePeak)
    dataList <- getMatrix(object, slot, datasetInvolve, returnList = TRUE)
    # mat <- mergeSparseAll(dataList, mode = "intersection")
    features <- Reduce(intersect, lapply(dataList, rownames))
    if (length(features) == 0) {
        cli::cli_abort("No shared feature available from datasets involved for the test.
                       Datasets involved: {.val {datasetInvolve}}")
    }
    featureOrder <- stats::setNames(seq_along(features), features)
    # dataList <- lapply(dataList, function(x) x[features, , drop = FALSE])

    # mat <- Reduce(cbind, dataList)

    # mat <- mat[, allCellBC, drop = FALSE]
    if (method == "wilcoxon") {
        chunk <- chunk %||% length(features)
        nchunk <- ceiling(length(features)/chunk)
        resultList <- list()
        cli::cli_progress_bar("Wilcoxon rank-sum test", total = nchunk, clear = FALSE)
        for (i in seq_len(nchunk)) {
            start <- (i - 1) * chunk + 1
            end <- min(i * chunk, length(features))
            mat <- extractMergedNormData(
                object,
                cellIdx = allCellBC,
                featureIdx = features[start:end]
            )
            mat <- log1p(1e10*mat)
            resultList[[i]] <- wilcoxauc(mat, var)
            if (nchunk > 1) gc()
            cli::cli_progress_update(set = i)
        }
        result <- Reduce(rbind, resultList)
        result <- result[order(result$group, featureOrder[result$feature]), ]
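        # wilcoxauc() adjusts p-values within each chunk only; when features
        # were processed in multiple chunks, redo the BH adjustment per group
        # across all features.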
        if (nchunk > 1) {
            result %<>%
                dplyr::group_by(.data[['group']]) %>%
                dplyr::mutate(padj = stats::p.adjust(.data[['pval']], method = "BH")) %>%
                as.data.frame()
        }

        # result$group <- factor(result$group, levels = levels(var))
    } else if (method == "pseudoBulk") {
        resultList <- list()
        if (isTRUE(verbose)) {
            if (nlevels(var) <= 2) {
                cliID <- cli::cli_process_start("Calling pairwise DESeq2 Wald test")
            } else if (nlevels(var) > 2) {
                if (isFALSE(printDiagnostic)) {
                    cliID_pb <- cli::cli_progress_bar(name = "DESeq2 Wald test",
                                                      total = nlevels(var))
                }
            }
        }

        for (i in seq_along(levels(var))) {
            testName <- levels(var)[i]
            if (isTRUE(verbose) &&
                isTRUE(printDiagnostic) &&
                nlevels(var) > 2) {
                cliID_perLevel <- cli::cli_process_start("Working on {.val {testName}}")
            }
            result <- tryCatch({
                subVar <- factor(ifelse(var == testName, testName, "others"),
                                 levels = c(testName, "others"))
                aux <- calcPctInOut(
                    object,
                    cellIdx = allCellIdx,
                    features = features,
                    groups = subVar
                )
                # `useReplicate` can be a vector of multiple variables
                replicateAnn <- .fetchCellMetaVar(
                    object, useReplicate,
                    cellIdx = allCellIdx,
                    drop = FALSE,
                    checkCategorical = TRUE,
                    droplevels = TRUE
                )
                # Collapse multiple variables into one factor
                replicateAnn <- interaction(replicateAnn, drop = TRUE)

                replicateAnn <- setupRepVar(
                    groups = subVar,
                    existing = replicateAnn,
                    nPsdRep = nPsdRep,
                    seed = seed,
                    verbose = printDiagnostic
                )
                pbs <- makePseudoBulk(
                    object,
                    cellIdx = allCellIdx,
                    features = features,
                    groups = subVar,
                    replicateAnn = replicateAnn,
                    minCellPerRep = minCellPerRep,
                    verbose = printDiagnostic
                )
                pb <- pbs[[1]]
                subVar <- pbs[[2]]


                # resultList[[testName]] <- .callDESeq2(pb, subVar, printDiagnostic)
                result <- .callDESeq2(pb, subVar, printDiagnostic)
                if (length(levels(var)) <= 2) {
                    if (isTRUE(verbose)) cli::cli_process_done(id = cliID)
                } else {
                    if (isTRUE(verbose)) {
                        if (isTRUE(printDiagnostic)) {
                            cli::cli_process_done(id = cliID_perLevel)
                        } else {
                            cli::cli_progress_update(set = i)
                        }
                    }
                }
                cbind(result, aux)
            }, error = function(e) {
                cli::cli_alert_danger(
                    "Error when computing on {.val {testName}}: {e$message}"
                )
                cli::cli_alert_warning(
                    "Empty result returned for this test."
                )
                if (length(levels(var)) <= 2) {
                    if (isTRUE(verbose)) cli::cli_process_failed(id = cliID)
                } else {
                    if (isTRUE(verbose)) {
                        if (isTRUE(printDiagnostic)) {
                            cli::cli_process_failed(id = cliID_perLevel)
                        } else {
                            cli::cli_progress_update(set = i, id = cliID_pb)
                        }
                    }
                }
                return(data.frame(
                    feature = character(0),
                    group = character(0),
                    logFC = numeric(0),
                    pval = numeric(0),
                    padj = numeric(0)
                ))
            })
            resultList[[testName]] <- result
            if (length(levels(var)) <= 2 && isTRUE(skipTwoGroup)) break
        }
        result <- Reduce(rbind, resultList)

    }
    rownames(result) <- NULL
    result$group <- factor(result$group, levels = levels(var)[levels(var) %in% result$group])
    return(result)
}

.DE.checkDataAvail <- function(object, useDatasets, method, usePeak) {
    if (isH5Liger(object, useDatasets)) { # nocov start
        cli::cli_abort(
            c("HDF5 based datasets detected but is not supported. ",
              "i" = "Try {.code object.sub <- downsample(object, useSlot = 'normData')} to create another object with in memory data")
        )
    } # nocov end
    if (method == "wilcoxon") {
        slot <- ifelse(usePeak, "normPeak", "normData")
    } else if (method == "pseudoBulk") {
        if (!requireNamespace("DESeq2", quietly = TRUE)) # nocov start
            cli::cli_abort(
                "Package {.pkg DESeq2} is needed for this function to work.
                Please install it by command:
                {.code BiocManager::install('DESeq2')}"
            ) # nocov end
        slot <- ifelse(usePeak, "rawPeak", "rawData")
    }
    allAvail <- all(sapply(useDatasets, function(d) {
        ld <- dataset(object, d)
        !is.null(methods::slot(ld, slot))
    }))
    if (!allAvail)
        cli::cli_abort(
            c("{.field {slot}} not all available for involved datasets: {.val {useDatasets}}",
              "i" = "{.code method = '{method}'}; {.code usePeak = {usePeak}}")
        )
    return(slot)
}

###################### Pseudo-bulk Method helper ###############################

# This function checks the availability of real replicates for each comparison
# group and sets up pseudo-replicate settings when real replicates are not
# available or too few.
# groups - a factor with two levels defining the comparison groups
# existing - a factor defining real replicates, or NULL if real replicate is not
#            available
# nPsdRep - number of pseudo-replicates to generate. If NULL, check the number
# of existing replicates and, when too few are found for a group, warn and
# create 3 pseudo-replicates per real replicate. If a number is given,
# pseudo-replicates are always generated within each group and each real
# replicate.
# seed - random seed to use for generating pseudo-replicates
# verbose - whether to show not-too-important diagnostic messages
# Returns a factor with the same length as `groups`, guiding how pseudo-bulk
# should be finally aggregated. Value string is formatted by
# 1. `group.realRep`, when real replicates are available and enough
# 2. `group.psdRep`, when real replicates are not available and
#    pseudo-replicates are generated accordingly
# 3. `group.realRep.psdRep`, when real replicates are available but not enough,
#    so pseudo-replicates are generated within each group and each real
#    replicate
setupRepVar <- function(
        groups,
        existing,
        nPsdRep = NULL,
        seed = 1,
        verbose = TRUE
) {
    set.seed(seed)
    # Initialize final returned value
    allRepVar <- rep(NA, length(groups))

    for (i in seq_along(levels(groups))) {
        # Need to check whether there are enough real replicates for each group first
        # Otherwise check whether to generate pseudo-replicates
        groupIdx <- groups == levels(groups)[i]
        if (length(existing) == 0) {
            # No real replicates given, must do pseudo-replicates
            nPsdRepGroup <- nPsdRep %||% 3
            repVarSub <- sample(seq_len(sum(groupIdx)) %% nPsdRepGroup) + 1
            repVarSub <- paste0(levels(groups)[i], ".rep", repVarSub)
        } else {
            existingSub <- droplevels(existing[groups == levels(groups)[i]])
            if (nlevels(existingSub) < 3) {
                # Too few real replicates
                if (is.null(nPsdRep)) {
                    if (isTRUE(verbose)) {
                        cli::cli_alert_warning(
                            "Too few real replicates for condition {.val {levels(groups)[i]}}. Creating 3 pseudo-replicates among each real replicate."
                        )
                    }
                    nPsdRepGroup <- 3
                } else {
                    nPsdRepGroup <- nPsdRep
                }
            } else {
                nPsdRepGroup <- nPsdRep %||% 1
            }
            if (nPsdRepGroup > 1) {
                repVarSub <- rep(NA, length(existingSub))
                for (j in seq_along(levels(existingSub))) {
                    existingSubIdx <- existingSub == levels(existingSub)[j]
                    repVarSubSub <- sample(seq_len(sum(existingSubIdx)) %% nPsdRepGroup) + 1
                    repVarSubSub <- sprintf(
                        '%s.%s.rep%s',
                        levels(groups)[i], levels(existingSub)[j], repVarSubSub
                    )
                    repVarSub[existingSubIdx] <- repVarSubSub
                }
            } else {
                # No pseudo-replication; keep real replicate labels as they are
                repVarSub <- paste0(levels(groups)[i], ".", as.character(existingSub))
            }
        }
        allRepVar[groupIdx] <- repVarSub
    }
    return(factor(allRepVar))
}
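
# A small illustrative sketch (not called anywhere in the package; the toy
# inputs and the function name below are made up for demonstration) showing the
# three label formats produced by setupRepVar() as described above.
.setupRepVarFormatSketch <- function() {
    grp <- factor(rep(c("A", "B"), each = 9))
    realRep <- factor(rep(rep(c("r1", "r2", "r3"), each = 3), 2))
    list(
        # 1. No real replicates -> "group.psdRep", e.g. "A.rep1"
        noRealRep = levels(setupRepVar(grp, existing = NULL, verbose = FALSE)),
        # 2. At least 3 real replicates and nPsdRep = NULL -> "group.realRep",
        #    e.g. "A.r1", kept as-is
        enoughRealRep = levels(setupRepVar(grp, existing = realRep,
                                           verbose = FALSE)),
        # 3. nPsdRep given -> "group.realRep.psdRep", e.g. "A.r1.rep1"
        forcedPsdRep = levels(setupRepVar(grp, existing = realRep, nPsdRep = 2,
                                          verbose = FALSE))
    )
}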

# This function aggregates raw count data directly from the un-merged matrices
# in a liger object, guided by `replicateAnn`. It returns the aggregated
# pseudo-bulk matrix together with the comparison group assignment for each
# pseudo-bulk. The key idea behind this rather complex function is to avoid
# merging the involved matrices into one giant matrix; instead, only what is
# needed is taken and reduced (added) into the final pseudo-bulk matrix.
# object - liger object
# cellIdx - cell index being involved for the test
# features - character vector for features to be included, should already be
#            generated in upstream steps.
# groups - factor defining the comparison groups for each cell specified by
#          `cellIdx`
# replicateAnn - factor defining the replicate assignment for each cell
#                specified by `cellIdx`.
# minCellPerRep - minimum number of cells required for each replicate. Replicates
#                 with fewer cells will be ignored.
# verbose - whether to show not-too-important diagnostic messages
# Returns a list of two elements: the first is a matrix with `length(features)`
# rows and up to `nlevels(replicateAnn)` columns (possibly fewer due to
# `minCellPerRep` filtering); the second is a factor giving the comparison group
# each pseudo-bulk belongs to.
makePseudoBulk <- function(
        object,
        cellIdx,
        features,
        groups,
        replicateAnn,
        minCellPerRep,
        verbose
) {
    # First check whether each replicate has a healthy (large enough) size
    repTab <- table(replicateAnn)
    if (isTRUE(verbose)) {
        cli::cli_alert_info("Replicate sizes:")
        print(repTab)
    }

    ignored <- names(repTab)[repTab < minCellPerRep]
    keep <- names(repTab)[repTab >= minCellPerRep]
    if (length(ignored) > 0) {
        cli::cli_alert_warning(
            "Ignoring replicates (size in bracket) with too few cells: {.val {paste0(ignored, ' (', repTab[ignored], ')')}}"
        )
        cli::cli_alert_info(
            "Consider decrease {.field minCellPerRep} to exclude less replicates or/and lower {.field nPsdRep} to generate larger pseudo-replicates."
        )
    }


    # Then check if after removal, each condition group still has enough
    # replicates
    for (gr in levels(groups)) {
        repPerGroup <- levels(droplevels(replicateAnn[groups == gr]))
        repPerGroup <- repPerGroup[repPerGroup %in% keep]
        if (length(repPerGroup) < 3) {
            cli::cli_abort(
                "Too few replicates with more than {.val {minCellPerRep}} cells ({.var minCellPerRep}) for condition {.val {gr}}."
            )
        }
    }
    cellIdx <- cellIdx[replicateAnn %in% keep]
    groups <- groups[replicateAnn %in% keep]
    replicateAnn <- droplevels(replicateAnn[replicateAnn %in% keep])

    datasetInvolved <- as.character(unique(object$dataset[cellIdx]))
    # Initialize the output matrix
    pseudoBulk <- matrix(
        data = 0,
        nrow = length(features),
        ncol = nlevels(replicateAnn),
        dimnames = list(features, levels(replicateAnn))
    )
    # This won't work if `groups` is not a coarsening of `replicateAnn`, i.e.
    # each replicate level must fall entirely within a single group level.
    # This should not happen unless there is a bug upstream.
    groupOut <- data.frame(rep = replicateAnn, groups = groups) %>%
        dplyr::group_by(.data[['rep']], .data[['groups']]) %>%
        dplyr::count() %>%
        `[[`('groups')

    # Go through each dataset, find out the cells involved and aggregate
    # them into the initialized pseudo-bulk following the guidance of
    # `replicateAnn`

    # Meanwhile, calculate auxiliary metrics `pct_in` and `pct_out` during the
    # for loop for each dataset.

    # repAnnExpand - broadcast the replicate annotation of the involved cells
    # onto a vector covering all cells in the object (NA for uninvolved cells)
    repAnnExpand <- rep(NA, ncol(object))
    repAnnExpand[cellIdx] <- replicateAnn
    repAnnExpand <- factor(repAnnExpand)
    for (i in seq_along(datasetInvolved)) {
        # dn - dataset name
        dn <- datasetInvolved[i]
        raw <- rawData(object, dn)
        repAnnExpandSub <- repAnnExpand[object$dataset == dn]
        updatePseudoBulkRcpp(
            psdBulk = pseudoBulk,
            sparseRaw = raw,
            featureIdx = match(rownames(raw), features) - 1,
            repIdx = as.integer(repAnnExpandSub) - 1
        )
    }
    return(list(pseudoBulk, groupOut))
}
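
# A dense reference sketch (not used by the package; the function name and
# arguments are illustrative, and it assumes the merged feature-by-cell raw
# matrix fits in memory) of the intended result of the Rcpp accumulation above:
# per-replicate sums of raw counts for each feature.
.pseudoBulkReferenceSketch <- function(mergedRaw, replicateAnn) {
    # mergedRaw: feature x cell matrix whose columns align with `replicateAnn`
    # replicateAnn: factor assigning each column (cell) to a (pseudo-)replicate
    t(rowsum(t(as.matrix(mergedRaw)), group = replicateAnn))
}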

calcPctInOut <- function(
    object,
    cellIdx,
    features,
    groups
) {
    datasetInvolved <- as.character(unique(object$dataset[cellIdx]))
    # Initialize the output matrix
    # `nCellExpr` is a matrix of n features rows and 2 cols. The first column
    # stores number of cells in test group that express each feature, and the
    # second column stores number of cells in control group that express each
    # feature.
    # After counting, we will calculate `pct_in` and `pct_out` for each feature
    # by dividing the first column by total number of cells in test group and
    # the second column by total number of cells in control group.
    nCellExpr <- matrix(
        data = 0,
        nrow = length(features),
        ncol = 2,
        dimnames = list(features, c("pct_in", "pct_out"))
    )
    groupExpand <- rep(NA, ncol(object))
    # When inserting a factor (`groups`) into the NA vector, the factor is
    # automatically converted to its 1-based integer codes.
    groupExpand[cellIdx] <- groups
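    # Shift to 0-based group codes for the Rcpp call (0 = test group,
    # 1 = control/others); cells not involved in the test remain NA.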
    groupExpand <- groupExpand - 1

    # Go through each dataset, find out the cells involved and calculate
    # `pct_in` and `pct_out` for each feature
    for (i in seq_along(datasetInvolved)) {
        # dn - dataset name
        dn <- datasetInvolved[i]
        raw <- rawData(object, dn)
        # The `updateNCellExprRcpp` Rcpp function in-place updates `nCellExpr`
        # using each raw matrix so no redundant memory allocation is needed.
        # Information needed:
        # - index that tells which cells in `raw` are for the test group and
        #   which are for the control group
        updateNCellExprRcpp(
            out = nCellExpr,
            sparseRaw = raw,
            featureIdx = match(rownames(raw), features) - 1,
            groupVar = groupExpand[object$dataset == dn]
        )
    }
    # Finally, calculate the percentage
    nCellExpr[,1] <- nCellExpr[,1]/sum(groups == levels(groups)[1]) * 100
    nCellExpr[,2] <- nCellExpr[,2]/sum(groups == levels(groups)[2]) * 100
    return(nCellExpr)
}
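
# A dense reference sketch (not used by the package; the function name and
# arguments are illustrative, and it assumes a merged feature-by-cell raw
# matrix in memory): pct_in / pct_out are simply the percentages of cells,
# inside and outside the test group respectively, with a non-zero count for
# each feature.
.pctInOutReferenceSketch <- function(mergedRaw, groups) {
    inGroup <- groups == levels(groups)[1]
    cbind(
        pct_in = Matrix::rowSums(mergedRaw[, inGroup, drop = FALSE] > 0) /
            sum(inGroup) * 100,
        pct_out = Matrix::rowSums(mergedRaw[, !inGroup, drop = FALSE] > 0) /
            sum(!inGroup) * 100
    )
}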

# makePseudoBulkOld <- function(mat, replicateAnn, minCellPerRep, verbose = TRUE) {
#     # mat - Extracted and contatenated matrix. intersection of genes by
#     #       c(groupTest, groupCtrl) cells
#     # groups - list of groups
#     # replicateAnn - data.frame of replicate annotation, with rownames as
#     #                barcodes and columns as variables
#
#     # Check whether enough replicates per condition
#     for (gr in levels(replicateAnn$groups)) {
#         subrep <- replicateAnn[replicateAnn$groups == gr,]
#         splitLabel <- interaction(subrep, drop = TRUE)
#         if (nlevels(splitLabel) < 2) {
#             cli::cli_abort(
#                 c("Too few replicates for condition {.val {gr}}. Cannot create pseudo-bulks.",
#                   "i" = "Please consider creating pseudo-replicates or using {.code method = 'wilcoxon'} instead.")
#             )
#         }
#     }
#     splitLabel <- interaction(replicateAnn, drop = TRUE)
#     repSizeTab <- table(splitLabel)
#     if (verbose) {
#         cli::cli_alert_info("Replicate sizes:")
#         print(repSizeTab)
#     }
#     labelCounts <- table(splitLabel)
#     ignored <- names(labelCounts)[labelCounts < minCellPerRep]
#     if (length(ignored) > 0) {
#         cli::cli_alert_warning(
#             "Ignoring replicates (size in bracket) with too few cells: {.val {paste0(ignored, ' (', repSizeTab[ignored], ')')}}"
#         )
#         cli::cli_alert_info(
#             "Consider decrease {.field minCellPerRep} to exclude less replicates or/and lower {.field nPsdRep} to generate larger pseudo-replicates."
#         )
#     }
#     keep <- names(labelCounts)[labelCounts >= minCellPerRep]
#     idx <- splitLabel %in% keep
#     splitLabel <- splitLabel[idx, drop = TRUE]
#     mat <- mat[, idx, drop = FALSE]
#     replicateAnn <- replicateAnn[idx, , drop = FALSE]
#
#     pseudoBulks <- colAggregateSums_sparse(mat, as.integer(splitLabel) - 1,
#                                            nlevels(splitLabel))
#     dimnames(pseudoBulks) <- list(rownames(mat), levels(splitLabel))
#     pseudoBulks <- pseudoBulks[rowSums(pseudoBulks) > 0,]
#     return(list(pseudoBulks, replicateAnn))
# }

.callDESeq2 <- function(pseudoBulks, groups,
                         verbose = getOption("ligerVerbose", TRUE)) {
    # DESeq2 workflow

    ## NOTE: DESeq2 expects the contrast/control group to be the first level,
    ## whereas upstream we require it to be the second. So we reverse it here.
    groups <- stats::relevel(groups, ref = levels(groups)[2])
    groupsDropped <- droplevels(groups)
    if (nlevels(groupsDropped) < 2) {
        cli::cli_abort("No enough replicates for conditions being compared.")
    }
    ## Now levels(groups)[1] becomes control and levels(groups)[2] becomes
    ## the test group
    suppressMessages({
        des <- DESeq2::DESeqDataSetFromMatrix(
            countData = pseudoBulks,
            colData = data.frame(groups = groups),
            design = stats::formula("~groups")
        )
    })
    des <- DESeq2::DESeq(des, test = "Wald", quiet = !verbose)
    res <- DESeq2::results(des, contrast = c("groups", levels(groups)[2],
                                             levels(groups)[1]))
    res <- .DataFrame.as.data.frame(res)
    res$feature <- rownames(res)
    rownames(res) <- NULL
    res$group <- levels(groups)[2]
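    # DESeq2::results() columns are baseMean, log2FoldChange, lfcSE, stat,
    # pvalue, padj; `feature` and `group` were appended as columns 7 and 8.
    # Keep feature, group, log2FoldChange, pvalue and padj, renamed below.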
    res <- res[, c(7, 8, 2, 5, 6)]
    colnames(res) <- c("feature", "group", "logFC", "pval", "padj")

    return(res)
}


####################### Wilcoxon rank-sum test helper ##########################

extractMergedNormData <- function(
        object,
        cellIdx = NULL,
        featureIdx = NULL
) {
    cellIdx <- .idxCheck(object, cellIdx, "cell")
    datasetInvolved <- unique(object$dataset[cellIdx])
    cellID <- colnames(object)[cellIdx]
    if (is.null(featureIdx)) {
        # Use intersection by default
        featureIdx <- Reduce(
            intersect,
            lapply(
                normData(object, datasetInvolved),
                rownames
            )
        )
    }
    out <- NULL
    for (i in seq_along(datasetInvolved)) {
        dn <- datasetInvolved[i]
        ldFeatureIdx <- .idxCheck(dataset(object, dn), featureIdx, "feature")
        ldCellIdx <- match(cellID, colnames(dataset(object, dn)))
        ldCellIdx <- ldCellIdx[!is.na(ldCellIdx)]
        out <- cbind(out, normData(object, dn)[ldFeatureIdx, ldCellIdx, drop = FALSE])
    }
    out[, cellID]
}

# x: matrix of data to be tested, features by cells
# clusterVar: grouping label for the columns of x
# Rcpp source code located in src/wilcoxon.cpp
wilcoxauc <- function(x, clusterVar) {
    if (methods::is(x, 'dgTMatrix')) x <- methods::as(x, 'CsparseMatrix') # nocov start
    if (methods::is(x, 'TsparseMatrix')) x <- methods::as(x, 'CsparseMatrix')
    if (is.null(row.names(x))) {
        rownames(x) <- paste0('Feature', seq(nrow(x)))
    } # nocov end
    if (!is.factor(clusterVar)) clusterVar <- factor(clusterVar)
    clusterVar <- droplevels(clusterVar)
    groupSize <- as.numeric(table(clusterVar))

    ## Compute primary statistics
    n1n2 <- groupSize * (ncol(x) - groupSize)
    # rankRes - list(X_ranked, ties), where X_ranked is obs x feature
    xRanked <- Matrix::t(x)
    # This computes the ranking of non-zero values and the ties
    ties <- cpp_rank_matrix_dgc(xRanked@x, xRanked@p,
                                nrow(xRanked), ncol(xRanked))
    # ranksRes <- list(X_ranked = xT, ties = ties)

    # rankRes <- colRanking(x)
    ustat <- computeUstat(xRanked, clusterVar, n1n2, groupSize)
    auc <- t(ustat / n1n2)
    pvals <- computePval(ustat, ties, ncol(x), n1n2)
    fdr <- apply(pvals, 2, function(p) stats::p.adjust(p, 'BH'))

    ### Auxiliary Statistics (AvgExpr, PctIn, LFC, etc)
    groupSums <- colAggregateSum_sparse(x, as.integer(clusterVar) - 1, length(unique(clusterVar)))
    # groupSums <- colAggregateSum(x, clusterVar)
    group_nnz <- colNNZAggr_sparse(x, as.integer(clusterVar) - 1, length(unique(clusterVar)))
    # group_nnz <- colNNZAggr(x, clusterVar)
    group_pct <- t(sweep(group_nnz, 1, as.numeric(table(clusterVar)), "/"))

    group_pct_out <- sweep(-group_nnz, 2, colSums(group_nnz), "+")
    group_pct_out <- sweep(group_pct_out, 1,
                           as.numeric(length(clusterVar) - table(clusterVar)),
                           "/")
    group_pct_out <- t(group_pct_out)

    groupMeans <- t(sweep(groupSums, 1, as.numeric(table(clusterVar)), "/"))

    cs <- colSums(groupSums)
    gs <- as.numeric(table(clusterVar))
    lfc <- Reduce(cbind, lapply(seq_along(levels(clusterVar)), function(g) {
        groupMeans[, g] - (cs - groupSums[g, ])/(length(clusterVar) - gs[g])
    }))

    data.frame(
        feature = rep(row.names(x), times = length(levels(clusterVar))),
        group = factor(rep(levels(clusterVar), each = nrow(x)),
                       levels = levels(clusterVar)),
        avgExpr = as.numeric(groupMeans),
        logFC = as.numeric(lfc),
        statistic = as.numeric(t(ustat)),
        auc = as.numeric(auc),
        pval = as.numeric(pvals),
        padj = as.numeric(fdr),
        pct_in = as.numeric(100 * group_pct),
        pct_out = as.numeric(100 * group_pct_out)
    )
}

computeUstat <- function(Xr, cols, n1n2, groupSize) {
    grs <- rowAggregateSum_sparse(Xr, as.integer(cols) - 1, length(unique(cols)))
    # grs <- rowAggregateSum(Xr, cols)

    # if (inherits(Xr, 'dgCMatrix')) {
    # Since only the non-zero values were ranked, the tie-ranks of the zeros
    # need to be added here.
    nnz <- rowNNZAggr_sparse(Xr, as.integer(cols) - 1, length(unique(cols)))
    gnz <- groupSize - nnz
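    # For each feature, the zeros are the smallest values and tie for ranks
    # 1..nZero, so their shared (average) rank is (nZero + 1) / 2, where
    # nZero = nrow(Xr) - diff(Xr@p) (cells minus non-zero entries per column).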
    zero.ranks <- (nrow(Xr) - diff(Xr@p) + 1) / 2
    ustat <- t((t(gnz) * zero.ranks)) + grs - groupSize*(groupSize + 1)/2
    # } else {
    #     ustat <- grs - groupSize * (groupSize + 1) / 2
    # }
    return(ustat)
}

computePval <- function(ustat, ties, N, n1n2) {
    z <- ustat - .5 * n1n2
    z <- z - sign(z) * .5
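    # Normal approximation with continuity correction (the 0.5 above) and the
    # standard tie correction computed below:
    #   Var(U) = n1 * n2 / 12 * ((N + 1) - sum(t^3 - t) / (N * (N - 1)))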
    .x1 <- N ^ 3 - N
    .x2 <- 1 / (12 * (N ^ 2 - N))
    rhs <- unlist(lapply(ties, function(tvals) {
        (.x1 - sum(tvals ^ 3 - tvals)) * .x2
    }))
    usigma <- sqrt(matrix(n1n2, ncol = 1) %*% matrix(rhs, nrow = 1))
    z <- t(z / usigma)
    pvals <- matrix(2 * stats::pnorm(-abs(as.numeric(z))), ncol = ncol(z))
    return(pvals)
}
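
# A reference sketch (not part of the package API; the function name and toy
# data are illustrative) checking the vectorized statistics above against
# stats::wilcox.test() for a single feature. With tied zeros present,
# wilcox.test() falls back to the same tie-corrected normal approximation, so
# the p-values should agree closely.
.wilcoxaucReferenceSketch <- function() {
    set.seed(1)
    x <- Matrix::rsparsematrix(5, 100, density = 0.3)
    x@x <- abs(x@x)
    rownames(x) <- paste0("gene", seq_len(5))
    grp <- factor(rep(c("A", "B"), each = 50))
    fast <- wilcoxauc(x, grp)
    ref <- suppressWarnings(
        stats::wilcox.test(x[1, grp == "A"], x[1, grp == "B"])
    )
    c(vectorized = fast$pval[fast$feature == "gene1" & fast$group == "A"][1],
      reference = ref$p.value)
}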





######################## Visualization #########################################

#' Create heatmap for showing top marker expression in conditions
#' @export
#' @param object A \linkS4class{liger} object, with normalized data and metadata
#' to annotate available.
#' @param result The data.frame returned by \code{\link{runMarkerDEG}}.
#' @param topN Number of top features to be plotted for each group. Default
#' \code{5}.
#' @param lfcThresh Hard threshold on logFC value. Default \code{1}.
#' @param padjThresh Hard threshold on adjusted P-value. Default \code{0.05}.
#' @param pctInThresh,pctOutThresh Thresholds on expression percentage. A
#' feature passes the filter only if it is expressed in more than
#' \code{pctInThresh} percent of cells in the corresponding cluster and in fewer
#' than \code{pctOutThresh} percent of cells outside of it. Only applied when
#' these metrics are available. Default \code{50} percent for both.
#' @param dedupBy When ranking by padj and logFC, a feature may be ranked as a
#' top marker for multiple clusters. Assign the feature only to the cluster
#' where it has the largest \code{"logFC"} or the lowest \code{"padj"}, as
#' specified here. Default \code{"logFC"}.
#' @param groupBy Cell metadata variable names for cell grouping. Downsample
#' balancing will also be aware of this. Default \code{"dataset"} and the
#' default cluster.
#' @param groupSize Maximum number of cells in each group to be downsampled for
#' plotting. Default \code{50}.
#' @param column_title Title on the column. Default \code{NULL}.
#' @inheritDotParams plotGeneHeatmap cellAnnotation
#' @inheritDotParams .plotHeatmap transpose showCellLabel showCellLegend showFeatureLabel cellAnnColList featureAnnColList scale trim baseSize cellTextSize featureTextSize cellTitleSize featureTitleSize legendTextSize legendTitleSize viridisOption viridisDirection RColorBrewerOption
#' @return A \link[ComplexHeatmap]{HeatmapList-class} object.
#' @examples
#' defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
#' pbmc <- normalize(pbmc)
#' plotMarkerHeatmap(pbmc, deg.marker)
plotMarkerHeatmap <- function(
        object,
        result,
        topN = 5,
        lfcThresh = 1,
        padjThresh = 0.05,
        pctInThresh = 50,
        pctOutThresh = 50,
        dedupBy = c("logFC", "padj"),
        groupBy = NULL,
        groupSize = 50,
        column_title = NULL,
        ...
) {
    dedupBy <- match.arg(dedupBy)
    if (dedupBy == "logFC") {
        result <- result[order(result[[dedupBy]], decreasing = TRUE), ]
    } else if (dedupBy == "padj") {
        result <- result[order(result[[dedupBy]], decreasing = FALSE), ]
    }
    pctInThresh <- pctInThresh %||% 0
    pctOutThresh <- pctOutThresh %||% 100
    groupBy <- groupBy %||% c("dataset", object@uns$defaultCluster)
    result <- result[!duplicated(result$feature), ]
    result <- result %>%
        dplyr::filter(
            .data$logFC > lfcThresh,
            .data$padj < padjThresh
        )
    if ("pct_in" %in% colnames(result) &&
        "pct_out" %in% colnames(result)) {
        result <- result %>%
            dplyr::filter(
                .data$pct_in > pctInThresh,
                .data$pct_out < pctOutThresh
            )
    }
    result <- result %>%
        dplyr::group_by(.data[["group"]]) %>%
        dplyr::arrange(
            .data[["padj"]],
            -.data[["logFC"]],
            .by_group = TRUE
        ) %>%
        dplyr::slice_head(n = topN) %>%
        as.data.frame()
    cellIdx <- downsample(object, maxCells = groupSize, balance = groupBy,
                          returnIndex = TRUE)
    featureAnn <- result[, "group", drop = FALSE]

    rownames(featureAnn) <- result$feature
    colnames(featureAnn) <- "marker"
    plotGeneHeatmap(object, features = result$feature,
                    cellIdx = cellIdx,
                    useCellMeta = groupBy,
                    featureAnnotation = featureAnn,
                    cellSplitBy = rev(groupBy),
                    featureSplitBy = "marker",
                    showFeatureLegend = FALSE,
                    cluster_columns = FALSE,
                    cluster_column_slices = FALSE,
                    cluster_rows = FALSE,
                    cluster_row_slices = FALSE,
                    column_title = column_title,
                    ...)
}


#' Create heatmap for pairwise DEG analysis result
#' @export
#' @param object A \linkS4class{liger} object, with normalized data and metadata
#' to annotate available.
#' @param result The data.frame returned by \code{\link{runPairwiseDEG}}.
#' @param group The test group name among the result to be shown. Must specify
#' only one if multiple tests are available (i.e. split test). Default
#' \code{NULL} works with single-test result and raises error with split-test
#' result.
#' @param topN Maximum number of top significant features to be plotted for each
#' of the up- and down-regulated gene sets. Default \code{20}.
#' @param absLFCThresh Hard threshold on absolute logFC value. Default \code{1}.
#' @param padjThresh Hard threshold on adjusted P-value. Default \code{0.05}.
#' @param pctInThresh,pctOutThresh Thresholds on expression percentage. An
#' up-regulated feature passes the filter only if it is expressed in more than
#' \code{pctInThresh} percent of cells in the test group and in fewer than
#' \code{pctOutThresh} percent of cells in the control group; the criteria are
#' swapped for down-regulated features. Only applied when these metrics are
#' available. Default \code{50} percent for both.
#' @param downsampleSize Maximum number of downsampled cells to be shown in the
#' heatmap. The downsampling is balanced on the cells involved in the test
#' specified. Default \code{200}.
#' @param useCellMeta Cell metadata variable names for cell grouping. Default
#' \code{NULL} includes dataset source and the default cluster.
#' @param column_title Title on the column. Default \code{NULL}.
#' @param seed Random seed for reproducibility. Default \code{1}.
#' @inheritDotParams .plotHeatmap transpose showCellLabel showCellLegend showFeatureLabel cellAnnColList featureAnnColList scale trim baseSize cellTextSize featureTextSize cellTitleSize featureTitleSize legendTextSize legendTitleSize viridisOption viridisDirection RColorBrewerOption
#' @return A \link[ComplexHeatmap]{HeatmapList-class} object.
#' @examples
#' defaultCluster(pbmc) <- pbmcPlot$leiden_cluster
#' pbmc <- normalize(pbmc)
#' plotPairwiseDEGHeatmap(pbmc, deg.pw, '4.stim')
plotPairwiseDEGHeatmap <- function(
        object,
        result,
        group = NULL,
        topN = 20,
        absLFCThresh = 1,
        padjThresh = 0.05,
        pctInThresh = 50,
        pctOutThresh = 50,
        downsampleSize = 200,
        useCellMeta = NULL,
        column_title = NULL,
        seed = 1,
        ...
) {
    resultMeta <- attr(result, 'groups')
    if (is.null(resultMeta)) {
        cli::cli_abort(
            c(x = "{.var result} object is corrupted, no meta-information available for selecting cells.",
              i = "Please re-run {.func runPairwiseDEG} with the latest {.pkg rliger} package (>2.0.1).")
        )
    }
    if (is.null(group)) {
        if (length(resultMeta) == 1) {
            group <- names(resultMeta)
        } else {
            cli::cli_abort(
                c("Please specify {.field group} when multiple tests are available",
                  i = "Available one{?s} {?is/are}: {.val {names(resultMeta)}}")
            )
        }
    }
    if (length(group) != 1) cli::cli_abort("Only 1 group allowed at a time.")
    if (!group %in% names(resultMeta)) {
        cli::cli_abort(
            c(x = "Group {.val {group}} not found in the result object.",
              i = "Available one{?s} {?is/are}: {.val {names(resultMeta)}}")
        )
    }
    cellIdx <- resultMeta[[group]]
    groupVar <- factor(rep(names(cellIdx), lengths(cellIdx)), levels = names(cellIdx))
    cellIdx <- Reduce(c, cellIdx)
    pctInThresh <- pctInThresh %||% 0
    pctOutThresh <- pctOutThresh %||% 100
    useCellMeta <- useCellMeta %||% c("dataset", object@uns$defaultCluster)
    # `dplyr::filter(result, .data[['group']] == group)` does not work here
    # because the bare `group` on the right-hand side is masked by the data
    # column of the same name; use base subsetting instead.
    result <- result[result$group == group,]
    result <- result %>%
        dplyr::filter(!is.na(.data[['padj']])) %>%
        dplyr::filter(
            abs(.data[['logFC']]) >= absLFCThresh,
            .data[['padj']] <= padjThresh
        ) %>%
        dplyr::mutate(regulation = factor(
            ifelse(.data[['logFC']] > 0, "up", "down"),
            levels = c("up", "down")
        ))
    if ("pct_in" %in% colnames(result) &&
        "pct_out" %in% colnames(result)) {
        result <- result %>%
            dplyr::filter(
                dplyr::case_when(
                    .data[['logFC']] > 0 ~ .data[['pct_in']] > pctInThresh & .data[['pct_out']] < pctOutThresh,
                    .data[['logFC']] < 0 ~ .data[['pct_out']] > pctInThresh & .data[['pct_in']] < pctOutThresh,
                )
            )
    }
    result <- result %>%
        dplyr::group_by(.data[['regulation']]) %>%
        dplyr::arrange(
            .data[['padj']],
            -.data[['logFC']],
            .by_group = TRUE
        ) %>%
        dplyr::slice_head(n = topN) %>%
        as.data.frame()

    set.seed(seed)
    downsampleSize <- min(length(cellIdx), downsampleSize)
    downsampleIdx <- seq_along(cellIdx) %in% sample(length(cellIdx), downsampleSize)
    cellIdxSub <- cellIdx[downsampleIdx]
    groupVar <- groupVar[downsampleIdx]

    cellAnn <- data.frame(
        group = groupVar,
        row.names = colnames(object)[cellIdxSub]
    )
    featureAnn <- data.frame(
        regulation = result$regulation,
        row.names = result$feature
    )
    plotGeneHeatmap(object, features = result$feature,
                    cellIdx = cellIdxSub,
                    useCellMeta = useCellMeta,
                    cellAnnotation = cellAnn,
                    featureAnnotation = featureAnn,
                    cellSplitBy = "group",
                    featureSplitBy = "regulation",
                    showFeatureLegend = FALSE,
                    cluster_columns = FALSE,
                    cluster_column_slices = FALSE,
                    cluster_rows = FALSE,
                    cluster_row_slices = FALSE,
                    column_title = column_title,
                    ...)
}
