R/deseq_2x3_polar.R

Defines functions polar_p2x3 getCols deseq_2x3_polar

Documented in deseq_2x3_polar

#' Convert 2x3 design DESeq2 objects to a volcano3d object
#'
#' This function is used instead of \code{\link{polar_coords}} if you have raw
#' RNA-Seq count data. It takes results from a 2x3 design DESeq2 analysis using
#' [deseq_2x3()]. Alternatively it will accept a list of 3 `DESeqDataSet` or
#' `DESeqResults` objects each with the same type of binary comparison across 3
#' groups. Statistical results are extracted and converted to a 'volc3d' object,
#' which can be directly plotted. Example usage would include comparing gene
#' expression against a binary outcome e.g. response vs non-response, across 3
#' drugs.
#'
#' @param object A named list of 3 objects of class 'DESeqDataSet', or a list of
#'   3 DESeq2 results dataframes generated by calling `DESeq2::results()`. Each
#'   object should contain the binary outcome comparison for each of the 3
#'   groups, e.g. group A response vs non-response, group B response vs
#'   non-response, group C etc. The names of the groups are taken from the list
#'   names.
#' @param pcutoff Cut-off for p-value significance
#' @param fc_cutoff Cut-off for fold change on radial axis
#' @param padj.method Can be `"qvalue"` or any method available in `p.adjust`.
#'   The option `"none"` is a pass-through.
#' @param process Character value specifying colour process for statistical
#'   significant genes: "positive" specifies genes are coloured if fold change
#'   is >0; "negative" for genes with fold change <0 (note that for clarity the
#'   polar position is altered so that genes along each axis have the most
#'   strongly negative fold change values); or "two.sided" which is a compromise
#'   in which positive genes are labelled as before but genes with negative fold
#'   changes and significant p-values have an inverted colour scheme.
#' @param scheme Vector of colours starting with non-significant genes/variables
#' @param labs Optional character vector for labelling groups. Default `NULL`
#'   leads to abbreviated labels based on levels in `outcome` using
#'   [abbreviate()]. A vector of length 3 with custom abbreviated names for the
#'   outcome levels can be supplied. Otherwise a vector length 8 is expected, of
#'   the form "ns", "A+", "A+B+", "B+", "B+C+", "C+", "A+C+", "A+B+C+", where
#'   "ns" means non-significant and A, B, C refer to levels 1, 2, 3 in
#'   `outcome`, and must be in the correct order.
#' @details
#' This function generates a 'volc3d' class object for visualising a 2x3 way
#' analysis for RNA-Seq data. For usual workflow it is typically preceded by a
#' call to [deseq_2x3()] which runs the 3x DESeq2 analyses required.
#' 
#' Scaled polar coordinates are based on the DESeq2 statistic for each group
#' comparison. Unscaled polar coordinates are generated using the log2 fold
#' change for each group comparison.
#' 
#' The z axis for 3d volcano plots does not have as clear a corollary in 2x3
#' analysis as for the standard 3-way analysis (which uses the likelihood ratio
#' test for the 3 groups). For 2x3 polar analysis the smallest p-value from the
#' 3 group pairwise comparisons for each gene is used to generate a z coordinate
#' as -log10(p-value).
#' 
#' The colour scheme is not as straightforward as for the standard polar plot
#' and volcano3D plot since genes (or attributes) can be significantly up or
#' downregulated in the response comparison for each of the 3 groups.
#' `process = "positive"` means that genes are labelled with colours if a gene
#' is significantly upregulated in the response for that group. This uses the
#' primary colours (RGB) so that if a gene is upregulated in both red and blue
#' groups it becomes purple etc with secondary colours. If the gene is
#' upregulated in all 3 groups it is labelled black. Non-significant genes are
#' in grey.
#' 
#' With `process = "negative"` genes are coloured when they are significantly
#' downregulated. With `process = "two.sided"` the colour scheme means that both
#' significantly up- and down-regulated genes are coloured with downregulated
#' genes labelled with inverted colours (i.e. cyan is the inverse of red etc).
#' However, significant upregulation in a group takes precedence.
#' 
#' @return Returns an S4 'volc3d' object containing:
#' \itemize{
#'   \item{'df'} A list of 2 dataframes. Each dataframe contains both x,y,z
#'   coordinates as well as polar coordinates r, angle. The first dataframe has
#'   coordinates based on the DESeq2 statistic. The 2nd dataframe is unscaled
#'   and represents log2 fold change for gene expression. The `type` argument in
#'   \code{\link{volcano3D}}, \code{\link{radial_plotly}} and
#'   \code{\link{radial_ggplot}} corresponds to these dataframes.
#'   \item{'outcome'} An empty factor whose levels are the three-group contrast
#'   factor for comparisons
#'   \item{'data'} Empty dataframe for compatibility
#'   \item{'pvals'} A dataframe containing p-values. Columns 1-3 are pairwise
#'   comparisons between the outcome factor for groups A, B, C respectively.
#'   \item{'padj'} A dataframe containing p-values adjusted for multiple testing
#'   \item{'pcutoff} Numeric value for cut-off for p-value significance
#'   \item{'scheme'} Character vector with colour scheme for plotting
#'   \item{'labs'} Character vector with labels for colour groups
#' }
#' @seealso [deseq_2x3()]
#' @importFrom Rfast rowMins
#' @export

deseq_2x3_polar <- function(object,
                            pcutoff = 0.05,
                            fc_cutoff = NULL,
                            padj.method = "BH",
                            process = c("positive", "negative", "two.sided"),
                            scheme = c('grey60', 'red', 'gold2', 'green3', 
                                        'cyan', 'blue', 'purple', 'black'),
                            labs = NULL) {
  if (length(object) != 3) stop("object must be a list of 3 DESeq2 results")
  if (!requireNamespace("DESeq2", quietly = TRUE)) {
    stop("Can't find package DESeq2. Try:
           BiocManager::install('DESeq2')", call. = FALSE)
  }
  if (is(object[[1]], "DESeqDataSet")) {
    if (!all.equal(object[[1]]@design, 
                   object[[3]]@design, 
                   object[[3]]@design)){
      message("Design formulae differ")}
  }
  process <- match.arg(process)
  res <- lapply(object, function(i) {
    if (is(i, "DESeqDataSet")) {
      as.data.frame(DESeq2::results(i))
    } else as.data.frame(i)
  })
  rn <- Reduce(intersect, lapply(res, rownames))
  df1 <- getCols(res, rn, 'stat')
  df2 <- getCols(res, rn, 'log2FoldChange')
  if (process == "negative") {
    df1 <- -df1
    df2 <- -df2
  }
  SE <- getCols(res, rn, 'lfcSE')
  colnames(SE) <- paste0("SE_", colnames(SE))
  df1 <- cbind(df1, SE)
  df2 <- cbind(df2, SE)
  df1 <- polar_xy(df1)
  df2 <- polar_xy(df2)
  pvals <- getCols(res, rn, 'pvalue')
  
  if (padj.method != 'BH') {
    res[[1]]$padj <- deseq_qvalue(res[[1]], padj.method)$qvalue
    res[[2]]$padj <- deseq_qvalue(res[[2]], padj.method)$qvalue
    res[[3]]$padj <- deseq_qvalue(res[[3]], padj.method)$qvalue
  }
  padj <- getCols(res, rn, 'padj')
  outcome <- factor(levels = names(res))
  
  ptab <- polar_p2x3(df2, pvals, padj, pcutoff, fc_cutoff, process, scheme,
                     labs)
  df1 <- cbind(df1, ptab)
  df2 <- cbind(df2, ptab)
  
  methods::new("volc3d",
               df = list(scaled = df1, 
                         unscaled = df2, 
                         type = "deseq_2x3_polar"),
               outcome = outcome,
               data = data.frame(), pvals = pvals, padj = padj,
               pcutoff = pcutoff, scheme = scheme,
               labs = levels(ptab$lab))
}


getCols <- function(res, rn, col) {
  out <- cbind(res[[1]][rn, col],
        res[[2]][rn, col],
        res[[3]][rn, col])
  rownames(out) <- rn
  colnames(out) <- names(res)
  out
}


polar_p2x3 <- function(df, pvals, padj = pvals,
                       pcutoff = 0.05,
                       fc_cutoff = NULL,
                       process = "positive",
                       scheme = c('grey60', 'red', 'gold2', 'green3', 
                                  'cyan', 'blue', 'purple', 'black'),
                       labs = NULL) {
  outcome_levels <- colnames(df)[1:3]
  pvalue <- Rfast::rowMins(pvals, value=TRUE)
  z <- -log10(pvalue)
  pcheck <- sapply(1:3, function(i) padj[,i] < pcutoff & df[,i] > 0)
  colnames(pcheck) <- outcome_levels
  pcheck <- pcheck *1  # convert to numeric
  pcheck[is.na(pcheck)] <- 0
  # negatives
  if (process == "two.sided") {
    pneg <- sapply(1:3, function(i) padj[,i] < pcutoff & df[,i] < 0)
    colnames(pneg) <- outcome_levels
    negrs <- rowSums(pneg, na.rm=T)
    pneg[negrs>0] <- 1 - pneg[negrs>0]  # invert
    pneg[is.na(pneg)] <- 0
    posrs <- rowSums(pcheck, na.rm=T)
    # replace only grey points with pneg
    pcheck[posrs == 0, ] <- pneg[posrs == 0, ]
  }
  pset <- paste0(pcheck[,1], pcheck[,2], pcheck[,3])
  if (!is.null(fc_cutoff)) {
    pset[df[, 'r'] < fc_cutoff] <- '000'
  }
  if (is.null(labs) | length(labs) == 3) {
    abbrev <- if (length(labs) == 3) labs else abbreviate(outcome_levels, 1)
    labs <- c("ns",
              paste0(abbrev[1], "+"),
              paste0(abbrev[1], "+", abbrev[2], "+"),
              paste0(abbrev[2], "+"),
              paste0(abbrev[2], "+", abbrev[3], "+"),
              paste0(abbrev[3], "+"),
              paste0(abbrev[1], "+", abbrev[3], "+"),
              paste0(abbrev[1], "+", abbrev[2], "+", abbrev[3], "+"))
  }
  lab <- factor(pset, levels = c('000', '100', '110', '010',
                                 '011', '001', '101', '111'),
                labels = labs)
  col <- scheme[lab]
  data.frame(z, pvalue, col, lab)
}

Try the volcano3D package in your browser

Any scripts or data that you put into this service are public.

volcano3D documentation built on May 31, 2023, 5:31 p.m.