R/polar_coords_2x3.R
In volcano3D: 3D Volcano Plots and Polar Plots for Three-Class Data

Documented in calc_stats_2x3 polar_coords_2x3

#' Coordinates for three way polar plot from 2x3 factor analysis
#'
#' This function creates a 'volc3d' object of S4 class for downstream plots
#' containing the p-values from a 2x3 factor analysis, expression data
#' sample data and polar coordinates. For RNA-Seq count data, two functions
#' \code{\link{deseq_2x3}} followed by [deseq_2x3_polar()] can be used instead.
#'
#' @param data Dataframe or matrix with variables in columns and samples in rows
#' @param metadata Dataframe of sample information with samples in rows
#' @param outcome Either the name of column in `metadata` containing the binary
#'   outcome data. Or a vector with 2 groups, ideally a factor. If it is not a
#'   factor, this will be coerced to a factor. This must have exactly 2 levels.
#' @param group Either the name of column in `metadata` containing the 3-way
#'   grouping data. Or a vector with 3 groups, ideally a factor. If it is not a
#'   factor, this will be coerced to a factor. This must have exactly 3 levels.
#'   NOTE: if `pvals` is given, the order of the levels in `group` must
#'   correspond to the order of columns in `pvals`.
#' @param pvals Optional matrix or dataframe with p-values in 3 columns. If
#'   `pvals` is not given, it is calculated using the function
#'   \code{\link{calc_stats_2x3}}. The p-values in 3 columns represent the
#'   comparison between the binary outcome with each column for the 3 groups as
#'   specified in `group`.
#' @param padj Matrix or dataframe with adjusted p-values. If not supplied,
#'   defaults to use nominal p-values from `pvals`.
#' @param pcutoff Cut-off for p-value significance
#' @param fc_cutoff Cut-off for fold change on radial axis
#' @param padj.method Can be `"qvalue"` or any method available in `p.adjust`.
#'   The option `"none"` is a pass-through.
#' @param process Character value specifying colour process for statistical
#'   significant genes: "positive" specifies genes are coloured if fold change
#'   is >0; "negative" for genes with fold change <0 (note that for clarity the
#'   polar position is altered so that genes along each axis have the most
#'   strongly negative fold change values); or "two.sided" which is a compromise
#'   in which positive genes are labelled as before but genes with negative fold
#'   changes and significant p-values have an inverted colour scheme.
#' @param scheme Vector of colours starting with non-significant variables
#' @param labs Optional character vector for labelling groups. Default `NULL`
#'   leads to abbreviated labels based on levels in `outcome` using
#'   [abbreviate()]. A vector of length 3 with custom abbreviated names for the
#'   outcome levels can be supplied. Otherwise a vector length 8 is expected, of
#'   the form "ns", "A+", "A+B+", "B+", "B+C+", "C+", "A+C+", "A+B+C+", where
#'   "ns" means non-significant and A, B, C refer to levels 1, 2, 3 in
#'   `outcome`, and must be in the correct order.
#' @param ... Optional arguments passed to \code{\link{calc_stats_2x3}}
#' @details
#' This function is designed for manually generating a 'volc3d' class object for
#' visualising a 2x3 way analysis comparing a large number of attributes such as
#' genes. For RNA-Seq data we suggest using [deseq_2x3()] and
#' [deseq_2x3_polar()] functions in sequence instead.
#' 
#' Scaled polar coordinates are generated using the t-score for each group
#' comparison. Unscaled polar coordinates are generated as difference between
#' means for each group comparison. If p-values are not supplied they are
#' calculated by [calc_stats_2x3()] using either t-tests or wilcoxon tests.
#' 
#' The z axis for 3d volcano plots does not have as clear a corollary in 2x3
#' analysis as for the standard 3-way analysis (which uses the likelihood ratio
#' test for the 3 groups). For 2x3 polar analysis the smallest p-value from the
#' 3 group pairwise comparisons for each gene is used to generate a z coordinate
#' as -log10(p-value).
#' 
#' The colour scheme is not as straightforward as for the standard polar plot
#' and volcano3D plot since genes (or attributes) can be significantly up or
#' downregulated in the response comparison for each of the 3 groups.
#' `process = "positive"` means that genes are labelled with colours if a gene
#' is significantly upregulated in the response for that group. This uses the
#' primary colours (RGB) so that if a gene is upregulated in both red and blue
#' group it becomes purple etc with secondary colours. If the gene is
#' upregulated in all 3 groups it is labelled black. Non-significant genes are
#' in grey.
#' 
#' With `process = "negative"` genes are coloured when they are significantly
#' downregulated. With `process = "two.sided"` the colour scheme means that both
#' significantly up- and down-regulated genes are coloured with downregulated
#' genes labelled with inverted colours (i.e. cyan is the inverse of red etc).
#' However, significant upregulation in a group takes precedence.
#' 
#' @return Returns an S4 'volc3d' object containing:
#' \itemize{
#'   \item{'df'} A list of 2 dataframes. Each dataframe contains both x,y,z
#'   coordinates as well as polar coordinates r, angle. The first dataframe has
#'   coordinates on scaled data. The 2nd dataframe has unscaled data (e.g. log2
#'   fold change for gene expression). The `type` argument in
#'   \code{\link{volcano3D}}, \code{\link{radial_plotly}} and
#'   \code{\link{radial_ggplot}} corresponds to these dataframes.
#'   \item{'outcome'} The three-group contrast factor used for comparisons,
#'   linked to the `group` column
#'   \item{'data'} Dataframe or matrix containing the expression data
#'   \item{'pvals'} A dataframe containing p-values in 3 columns representing
#'   the binary comparison for the outcome for each of the 3 groups.
#'   \item{'padj'} A dataframe containing p-values adjusted for multiple testing
#'   \item{'pcutoff} Numeric value for cut-off for p-value significance
#'   \item{'scheme'} Character vector with colour scheme for plotting
#'   \item{'labs'} Character vector with labels for colour groups
#' }
#' 
#' @seealso \code{\link{deseq_2x3}}, \code{\link{deseq_2x3_polar}},
#'   \code{\link{calc_stats_2x3}}
#' @importFrom stats complete.cases
#' @export

polar_coords_2x3 <- function(data,
                             metadata = NULL,
                             outcome,
                             group,
                             pvals = NULL, 
                             padj = pvals,
                             pcutoff = 0.05,
                             fc_cutoff = NULL,
                             padj.method = "BH",
                             process = c("positive", "negative", "two.sided"),
                             scheme = c('grey60', 'red', 'gold2', 'green3', 
                                        'cyan', 'blue', 'purple', 'black'),
                             labs = NULL,
                             ...) {
  process <- match.arg(process)
  if (!is.null(metadata)) {
    if (length(outcome) != 1 | !outcome %in% colnames(metadata)) {
      stop(outcome, " is not a column in metadata")
    }
    outcome <- factor(metadata[, outcome])
    if (length(group) != 1 | !group %in% colnames(metadata)) {
      stop(group, " is not a column in metadata")
    }
    group <- factor(metadata[, group])
  }
  if (!all.equal(nrow(data), length(outcome), length(group))) {
    stop("Mismatch between data and metadata")
  }
  if (nlevels(group) != 3) stop("Number of levels in group is not 3")
  if (nlevels(outcome) != 2) stop("Number of levels in outcome is not 2")
  if (any(is.na(outcome), is.na(group))) {
    ok <- complete.cases(outcome, group)
    data <- data[ok,]
    outcome <- outcome[ok]
    group <- group[ok]
    message("Removing NA from outcome/group")
  }
  
  res <- calc_stats_2x3(data, outcome, group, padj.method, ...)
  rn <- Reduce(intersect, lapply(res, rownames))
  df1 <- getCols(res, rn, 'statistic')
  df2 <- getCols(res, rn, 'mean.diff')
  if (is.null(pvals)) {
    pvals <- getCols(res, rn, 'pvalue')
    padj <- getCols(res, rn, 'padj')
  }
  SE <- getCols(res, rn, 'stderr')
  colnames(SE) <- paste0("SE_", colnames(SE))
  if (process == "negative") {
    df1 <- -df1
    df2 <- -df2
  }
  df1 <- cbind(df1, SE)
  df2 <- cbind(df2, SE)
  df1 <- polar_xy(df1)
  df2 <- polar_xy(df2)
  outcome <- factor(levels = levels(group))
  
  ptab <- polar_p2x3(df2, pvals, padj, pcutoff, fc_cutoff, process, scheme,
                     labs)
  df1 <- cbind(df1, ptab)
  df2 <- cbind(df2, ptab)
  
  methods::new("volc3d",
               df = list(scaled=df1, unscaled=df2, type="polar_coords_2x3"),
               outcome = outcome,
               data = data, pvals = pvals, padj = padj,
               pcutoff = pcutoff, scheme = scheme,
               labs = levels(ptab$lab))
}


#' Calculate p-values for 2x3-way analysis
#' 
#' @param data Dataframe or matrix with variables in columns and samples in rows
#' @param outcome Factor with 2 levels
#' @param group Factor with 3 levels
#' @param padj.method Can be `"qvalue"` or any method available in `p.adjust`.
#'   The option `"none"` is a pass-through.
#' @param test Character value specifying the statistical test between the 2
#'   level response outcome. Current options are "t.test" or "wilcoxon".
#' @param exact Logical for whether to use an exact test (Wilcoxon test only)
#' @importFrom matrixTests col_t_welch col_wilcoxon_twosample
#' @return A list containing a data frame with summary statistics for the 
#' comparisons between the outcome, for each group level. 
#' @export
#' 
calc_stats_2x3 <- function(data, outcome, group, padj.method,
                           test = c("t.test", "wilcoxon"),
                           exact = FALSE) {
  test <- match.arg(test)
  res <- lapply(levels(group), function(i) {
    subdat <- data[group == i, ]
    suboutcome <- as.numeric(outcome[group == i])
    out <- col_t_welch(subdat[suboutcome==1, ], subdat[suboutcome==2, ])
    if (test == "wilcoxon") {
      out$pvalue <- col_wilcoxon_twosample(subdat[suboutcome==1, ],
                                           subdat[suboutcome==2, ],
                                           exact = exact)$pvalue
    }
    out$padj <- qval(out$pvalue, padj.method)
    out
  })
  names(res) <- levels(group)
  res
}