R/plot_mv.R

Defines functions plot_mv.default plot_mv.DESeqTransform plot_mv.DESeqDataSet plot_mv.DGELM plot_mv.DGEList plot_mv.MArrayLM plot_mv

Documented in plot_mv plot_mv.default plot_mv.DESeqDataSet plot_mv.DESeqTransform plot_mv.DGEList plot_mv.DGELM plot_mv.MArrayLM

#' Mean-Variance Plot
#'
#' This function visualizes the mean-variance relationship of omic data before
#' or after modeling.
#'
#' @param dat Omic data matrix or matrix-like object with rows corresponding to
#'   probes and columns to samples. Data are presumed to be filtered and
#'   normalized prior to visualization. Alternatively, any object from the
#'   \code{limma}, \code{edgeR}, or \code{DESeq2} pipelines. See Details.
#' @param design Optional design matrix with rows corresponding to samples and
#'   columns to coefficients to be estimated. Only relevant for \code{
#'   \link[edgeR]{DGEList}} objects. See Details.
#' @param resid Should y-axis plot genewise variance or residual variance? Only
#'   relevant for \code{DESeqDataSet}s that have been fit with a negative
#'   binomial model. See Details.
#' @param trans Data transformation to be applied to probewise means (if \code{
#'   trans = "rank"}) or standard deviations (if \code{trans = "log"} or \code{
#'   "sqrt"}). Not all transformations are appropriate for all data types. See
#'   Details.
#' @param span Width of the LOWESS smoothing window as a proportion.
#' @param size Point size. 
#' @param alpha Point transparency.
#' @param title Optional plot title.
#' @param xlab Optional label for x-axis.
#' @param ylab Optional label for y-axis.
#' @param legend Legend position. Must be one of \code{"bottom"}, \code{"left"},
#'   \code{"top"}, \code{"right"}, \code{"bottomright"}, \code{"bottomleft"},
#'   \code{"topleft"}, or \code{"topright"}.
#' @param hover Show probe name by hovering mouse over data point? If \code{
#'   TRUE}, the plot is rendered in HTML and will either open in your browser's
#'   graphic display or appear in the RStudio viewer. Probe names are extracted
#'   from \code{dat}.
#'
#' @details
#' Mean-variance (MV) plots are a quick and easy way to visualize the
#' relationship between the first two moments of probewise data distributions.
#' When used prior to modeling, they may help better understand the internal
#' structure of the data and inspect for potential outliers. The effects of
#' filtering and transformations can also be readily evaluated. When applied
#' after modeling, MV plots help assess the assumptions of the regression.
#'
#' If \code{dat} is a matrix or object inheriting from the \code{
#' \link[Biobase]{ExpressionSet}} class, then it is presumed to be filtered and
#' normalized prior to plotting. For count data, this means undergoing some sort
#' of variance stabilizing transformation, such as \code{\link[edgeR]{cpm}}
#' (with \code{log = TRUE}), \code{\link[DESeq2]{vst}}, \code{\link[DESeq2]{
#' rlog}}, etc. If \code{dat} is of class \code{\link[limma]{MArrayLM}} or
#' \code{\link[edgeR]{DGEGLM}}, then the y-axis represents residual variance.
#' For \code{\link[DESeq2]{DESeqDataSet}}s, the choice of whether to plot
#' variance or residual variance is determined by the \code{resid} argument. For
#' data that has been fit with a negative binomial model, a residual matrix is
#' generated by subtracting the normalized signal matrix from the normalized
#' counts on the log2-CPM scale.
#'
#' Default setting for the \code{trans} argument vary by S3 class. When \code{
#' trans = 'rank'}, rank-transformed means are plotted against standard
#' deviations for each probe. This is a reasonable choice for most data types.
#' It is also common in the microarray literature to plot probewise means
#' against log-transformed standard deviations, while the \code{limma} authors
#' recommend the square-root transform for count data on the log2-CPM scale.
#'
#' Note that if passing a matrix of log2-CPM transformed counts, the plot will
#' not be exactly the same as the output of \code{\link[limma]{voom}} when
#' \code{plot = TRUE}. This is because the \code{voom} transform involves an
#' initial modeling step in which probewise means are plotted against
#' square-root transformed residual standard deviations on the log2-CPM scale.
#' See \code{\link{plot_voom}} to recreate the voom transformation plot in
#' \code{bioplotr} style.
#'
#' \code{plot_mv} fits a LOWESS curve to the points with a smoothing window
#' given by \code{span}. If \code{dat} is an \code{MArrayLM} object with
#' standard errors moderated by \code{eBayes}, then prior variance will also be
#' plotted as either a horizontal line or a smooth curve, depending on whether a
#' global or intensity-dependent prior was used. If robust empirical Bayes was
#' used to create \code{dat}, outlier variances are highlighted. See \code{
#' \link[limma]{squeezeVar}} for more details.
#'
#' @references
#' Huber, W., Hedebreck, A., Sültmann, H., Poustka, A. & Vingron, M. (2002).
#' \href{https://www.ncbi.nlm.nih.gov/pubmed/12169536}{Variance Stabilization
#' Applied to Microarray Data Calibration and to the Quantification of
#' Differential Expression}. \emph{Bioinformatics}, \emph{18}(1), S96-2104.
#'
#' Sartor, M.A., Tomlinson, C.R., Wesselkamper, S.C., Sivaganesan, S., Leikauf,
#' G.D. & Medvedovic, M. (2006).
#' \href{http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-538}{
#' Intensity-based hierarchical Bayes method improves testing for differentially
#' expressed genes in microarray experiments}. \emph{BMC Bioinformatics},
#' \strong{7}: 538.
#'
#' Law, C.W., Chen, Y., Shi, W., & Smyth, G.K. (2014).
#' \href{https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29}{
#' voom: precision weights unlock linear model analysis tools for RNA-seq read
#' counts}. \emph{Genome Biology}, \strong{15}: R29.
#'
#' @examples
#' mat <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
#' plot_mv(mat, trans = "log")
#'
#' library(limma)
#' grp <- rep(c("ctl", "trt"), each = 5)
#' des <- model.matrix(~ grp)
#' fit <- eBayes(lmFit(mat, des), trend = TRUE)
#' plot_mv(fit)
#'
#' @seealso
#' \code{\link[edgeR]{plotMeanVar}}, \code{\link[edgeR]{dglmStdResid}},
#' \code{\link[vsn]{meanSdPlot}}, \code{\link[limma]{plotSA}},
#' \code{\link[limma]{voom}}
#'
#' @export
#' @importFrom matrixStats rowSds
#' @importFrom ggsci pal_d3
#' @import dplyr
#' @import ggplot2
#'

plot_mv <- function(
  dat,
   trans = 'rank',
   title = 'Mean-Variance Plot',
  legend = 'right', ...
) {

  # Preliminaries
  if (nrow(dat) < 2L) {
    stop('plot_mv requires at least 2 probes to fit a mean-variance trend.')
  }
  if (!trans %in% c('rank', 'log', 'sqrt')) {
    stop('trans must be one of "rank", "log", or "sqrt".')
  }
  locations <- c('bottom', 'left', 'top', 'right',
                 'bottomright', 'bottomleft', 'topleft', 'topright')
  legend <- match.arg(legend, locations)

  # Method
  UseMethod('plot_mv')

}


#' @rdname plot_mv
#' @export

plot_mv.MArrayLM <- function(
  dat,
   trans = 'log',
    span = 0.5,
    size = NULL, 
   alpha = NULL,
   title = 'Mean-Variance Plot',
    xlab = NULL,
    ylab = NULL,
  legend = 'right',
   hover = FALSE
) {

  # Preliminaries
  if (dat %>% is('MArrayLM') && dat$t %>% is.null && dat$F %>% is.null) {
    warning('Standard errors for dat have not been moderated. Consider ',
            're-running plot_mv after shrinking residual variance with ',
            'eBayes. See ?eBayes and ?squeezeVar for more info.')
  }

  # Tidy data
  mu <- dat$Amean                                # Extract mu, sigma, prior
  sigma <- dat$sigma
  prior <- sqrt(dat$s2.prior)
  s2 <- dat$sigma^2L / dat$s2.prior              # Check for outliers
  pdn <- pf(s2, df1 = dat$df.residual, df2 = max(dat$df.prior))
  pup <- pf(s2, df1 = dat$df.residual, df2 = max(dat$df.prior), lower.tail = FALSE)
  q <- p.adjust(2L * pmin(pdn, pup), method = 'BH')
  outliers <- q <= 0.05
  if (trans == 'rank') {                         # Apply transformations
    mu <- rank(mu, ties.method = 'random')
    if (xlab %>% is.null) xlab <- expression('Rank'*(mu))
    if (ylab %>% is.null) ylab <- expression(sigma)
  } else if (trans == 'log') {
    sigma <- log2(sigma)
    prior <- log2(prior)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression('log'[2]*(sigma))
  } else if (trans == 'sqrt') {
    sigma <- sqrt(sigma)
    prior <- sqrt(prior)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression(sqrt(sigma))
  }
  lo <- lowess(mu, sigma, f = span)              # Fit LOWESS curve
  df <- tibble(Probe = rownames(dat),
                  Mu = mu,
               Sigma = sigma,
               Prior = prior,
             Outlier = outliers) %>%
    arrange(Mu) %>%
    mutate(lfit = lo[['y']])

  # Build plot
  if (size %>% is.null) {
    size <- pt_size(df)
  }
  if (alpha %>% is.null) {
    alpha <- pt_alpha(df)
  }
  p <- ggplot(df) +
    geom_path(aes(Mu, lfit, color = 'LOWESS'), size = 0.5) +
    labs(title = title, x = xlab, y = ylab) +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5))
  if (any(df$Outlier)) {                         # Plot outliers
    suppressWarnings(
      p <- p + geom_point(aes(Mu, Sigma, text = Probe, color = Outlier),
                          size = size, alpha = alpha,
                          color = c(pal_d3()(4L)[4L], 'black'))
    )
  } else {
    suppressWarnings(
      p <- p + geom_point(aes(Mu, Sigma, text = Probe),
                          size = size, alpha = alpha)
    )
  }
  if (length(dat$s2.prior) == 1L) {              # Plot prior
    p <- p + geom_hline(aes(color = 'Prior', yintercept = Prior))
  } else {
    p <- p + geom_path(aes(Mu, Prior, color = 'Prior'), size = 0.5)
  }
  if (any(df$Outlier)) {
    p <- p + scale_color_manual(name = 'Curves',
                              breaks = c('Prior', 'LOWESS', TRUE),
                              labels = c('Prior', 'LOWESS', 'Outlier'),
                              values = c('black', pal_d3()(4L)[c(1L, 2L, 4L)]))
  } else {
    p <- p + scale_color_manual(name = 'Curves', values = pal_d3()(2L),
                                guide = guide_legend(reverse = TRUE))
  }

  # Output
  gg_out(p, hover, legend)

}


#' @rdname plot_mv
#' @export
#' @importFrom edgeR calcNormFactors estimateTagwiseDisp
#'   estimateGLMTagwiseDisp cpm aveLogCPM

plot_mv.DGEList <- function(
  dat,
  design = NULL,
   trans = 'sqrt',
    span = 0.5,
    size = NULL, 
   alpha = NULL,
   title = 'Mean-Variance Plot',
    xlab = NULL,
    ylab = NULL,
  legend = 'right',
   hover = FALSE
) {

  # Tidy data
  keep <- rowSums(dat$counts) > 1L               # Minimal count filter
  dat <- dat[keep, ]
  nf <- dat$samples$norm.factors                 # Calculate size factors
  if (nf %>% is.null || all(nf == 1L)) {
    dat <- calcNormFactors(dat)
  }
  if (dat$tagwise.dispersion %>% is.null) {
    if (design %>% is.null && !dat$group %>% is.null) {
      design <- model.matrix(~ dat$group)
    }
    if (design %>% is.null) {
      if (dat$common.dispersion %>% is.null) {
        dat <- estimateCommonDisp(dat)
      }
      dat <- estimateTagwiseDisp(dat)
    } else {
      dat <- estimateDisp(dat, design = design)
    }
  }
  mu <- aveLogCPM(dat, prior.count = 1L, dispersion = dat$tagwise.dispersion)
  lcpm <- cpm(dat, log = TRUE, prior.count = 1L)
  sigma <- rowSds(lcpm)
  if (trans == 'rank') {                         # Apply transformations
    mu <- rank(mu, ties.method = 'random')
    if (xlab %>% is.null) xlab <- expression('Rank'*(mu))
    if (ylab %>% is.null) ylab <- expression(sigma)
  } else if (trans == 'log') {
    sigma <- log2(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression('log'[2]*(sigma))
  } else if (trans == 'sqrt') {
    sigma <- sqrt(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression(sqrt(sigma))
  }
  lo <- lowess(mu, sigma, f = span)              # Fit LOWESS curve
  df <- tibble(Probe = rownames(dat),
                  Mu = mu,
               Sigma = sigma) %>%
    arrange(Mu) %>%
    mutate(lfit = lo[['y']])

  # Build plot
  if (size %>% is.null) {
    size <- pt_size(df)
  }
  if (alpha %>% is.null) {
    alpha <- pt_alpha(df)
  }
  suppressWarnings(
    p <- ggplot(df) +
      geom_point(aes(Mu, Sigma, text = Probe), size = size, alpha = alpha) +
      geom_path(aes(Mu, lfit, color = 'LOWESS'), size = 0.5) +
      scale_color_manual(name = 'Curve', values = pal_d3()(1L)) +
      labs(title = title, x = xlab, y = ylab) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5))
  )

  # Output
  gg_out(p, hover, legend)

}


#' @rdname plot_mv
#' @export

plot_mv.DGELM <- function(
  dat,
   trans = 'sqrt',
    span = 0.5,
    size = NULL, 
   alpha = NULL,
   title = 'Mean-Variance Plot',
    xlab = NULL,
    ylab = NULL,
  legend = 'right',
   hover = FALSE
) {

  # Tidy data
  keep <- rowSums(dat$counts) > 1L & !dat$failed      # Minimal count filter
  dat <- dat[keep, ]
  mu <- aveLogCPM(dat, prior.count = 1L, dispersion = dat$dispersion)
  cnts <- cpm(dat, log = TRUE, prior.count = 1L)
  fits <- cpm(dat$fitted.values, log = TRUE, prior.count = 1L)
  resids <- cnts - fits
  sigma <- rowSds(resids)
  if (trans == 'rank') {                              # Apply transformations
    mu <- rank(mu, ties.method = 'random')
    if (xlab %>% is.null) xlab <- expression('Rank'*(mu))
    if (ylab %>% is.null) ylab <- expression(sigma)
  } else if (trans == 'log') {
    sigma <- log2(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression('log'[2]*(sigma))
  } else if (trans == 'sqrt') {
    sigma <- sqrt(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression(sqrt(sigma))
  }
  lo <- lowess(mu, sigma, f = span)                   # Fit LOWESS curve
  df <- tibble(Probe = rownames(dat),
                  Mu = mu,
               Sigma = sigma) %>%
    arrange(Mu) %>%
    mutate(lfit = lo[['y']])

  # Build plot
  if (size %>% is.null) {
    size <- pt_size(df)
  }
  if (alpha %>% is.null) {
    alpha <- pt_alpha(df)
  }
  suppressWarnings(
    p <- ggplot(df) +
      geom_point(aes(Mu, Sigma, text = Probe), size = size, alpha = alpha) +
      geom_path(aes(Mu, lfit, color = 'LOWESS'), size = 0.5) +
      scale_color_manual(name = 'Curve', values = pal_d3()(1L)) +
      labs(title = title, x = xlab, y = ylab) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5))
  )

  # Output
  gg_out(p, hover, legend)

}


#' @rdname plot_mv
#' @export
#' @importFrom edgeR cpm aveLogCPM

plot_mv.DESeqDataSet <- function(
  dat,
   resid = FALSE,
   trans = 'rank',
    span = 0.5,
    size = NULL, 
   alpha = NULL,
   title = 'Mean-Variance Plot',
    xlab = NULL,
    ylab = NULL,
  legend = 'right',
   hover = FALSE
) {

  # Preliminaries
  require(SummarizedExperiment)
  if (resid && assays(dat)[['mu']] %>% is.null) {
    stop('dat must be fit with a negative binomial GLM in order to extract ',
         'residual variance.')
  }

  # Tidy data
  require(DESeq2)
  if (resid) {                                   # Calculate mu, sigma
    keep <- mcols(dat)$baseMean > 0L             # Minimal count filter
    dat <- dat[keep, , drop = FALSE]
    cnts <- counts(dat, normalized = TRUE)
    cnts_lcpm <- cpm(cnts, log = TRUE, prior.count = 1L)
    if (!sizeFactors(dat) %>% is.null) {
      fits <- t(t(assays(dat)[['mu']]) / sizeFactors(dat))
    } else {
      fits <- assays(dat)[['mu']] / normalizationFactors(dat)
    }
    fits_lcpm <- cpm(fits, log = TRUE, prior.count = 1L)
    resids <- cnts_lcpm - fit_lcpm
    mu <- aveLogCPM(cnts, prior.count = 1L, dispersion = dispersions(dat))
    sigma <- rowSds(resids)
  } else {
    if (sizeFactors(dat) %>% is.null && normalizationFactors(dat) %>% is.null) {
      dat <- estimateSizeFactors(dat)
    }
    if (dispersions(dat) %>% is.null) {
      dat <- estimateDispersions(dat, quiet = TRUE)
    }
    keep <- mcols(dat)$baseMean > 0L             # Minimal count filter
    dat <- dat[keep, , drop = FALSE]
    cnts <- counts(dat, normalized = TRUE)
    cnts_lcpm <- cpm(cnts, log = TRUE, prior.count = 1L)
    mu <- aveLogCPM(cnts, prior.count = 1L, dispersion = dispersions(dat))
    sigma <- rowSds(cnts_lcpm)
  }
  if (trans == 'rank') {                         # Apply transformations
    mu <- rank(mu, ties.method = 'random')
    if (xlab %>% is.null) xlab <- expression('Rank'*(mu))
    if (ylab %>% is.null) ylab <- expression(sigma)
  } else if (trans == 'log') {
    sigma <- log2(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression('log'[2]*(sigma))
  } else if (trans == 'sqrt') {
    sigma <- sqrt(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression(sqrt(sigma))
  }
  lo <- lowess(mu, sigma, f = span)              # Fit LOWESS curve
  df <- tibble(Probe = rownames(dat),
                  Mu = mu,
               Sigma = sigma) %>%
    arrange(Mu) %>%
    mutate(lfit = lo[['y']])

  # Build plot
  if (size %>% is.null) {
    size <- pt_size(df)
  }
  if (alpha %>% is.null) {
    alpha <- pt_alpha(df)
  }
  suppressWarnings(
    p <- ggplot(df) +
      geom_point(aes(Mu, Sigma, text = Probe), size = size, alpha = alpha) +
      geom_path(aes(Mu, lfit, color = 'LOWESS'), size = 0.5) +
      scale_color_manual(name = 'Curve', values = pal_d3()(1L)) +
      labs(title = title, x = xlab, y = ylab) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5))
  )

  # Output
  gg_out(p, hover, legend)

}


#' @rdname plot_mv
#' @export

plot_mv.DESeqTransform <- function(
  dat,
   trans = 'rank',
    span = 0.5,
    size = NULL, 
   alpha = NULL,
   title = 'Mean-Variance Plot',
    xlab = NULL,
    ylab = NULL,
  legend = 'right',
   hover = FALSE
) {

  # Tidy data
  require(SummarizedExperiment)
  dat <- assay(dat)
  mu <- rowMeans(dat)
  sigma <- rowSds(dat)
  if (trans == 'rank') {                         # Apply transformations
    mu <- rank(mu, ties.method = 'random')
    if (xlab %>% is.null) xlab <- expression('Rank'*(mu))
    if (ylab %>% is.null) ylab <- expression(sigma)
  } else if (trans == 'log') {
    sigma <- log2(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression('log'[2]*(sigma))
  } else if (trans == 'sqrt') {
    sigma <- sqrt(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression(sqrt(sigma))
  }
  lo <- lowess(mu, sigma, f = span)              # Fit LOWESS curve
  df <- tibble(Probe = rownames(dat),
                  Mu = mu,
               Sigma = sigma) %>%
    arrange(Mu) %>%
    mutate(lfit = lo[['y']])

  # Build plot
  if (size %>% is.null) {
    size <- pt_size(df)
  }
  if (alpha %>% is.null) {
    alpha <- pt_alpha(df)
  }
  suppressWarnings(
    p <- ggplot(df) +
      geom_point(aes(Mu, Sigma, text = Probe), size = size, alpha = alpha) +
      geom_path(aes(Mu, lfit, color = 'LOWESS'), size = 0.5) +
      scale_color_manual(name = 'Curve', values = pal_d3()(1L)) +
      labs(title = title, x = xlab, y = ylab) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5))
  )

  # Output
  gg_out(p, hover, legend)

}


#' @rdname plot_mv
#' @export

plot_mv.default <- function(
  dat,
   trans = 'rank',
    span = 0.5,
    size = NULL, 
   alpha = NULL,
   title = 'Mean-Variance Plot',
    xlab = NULL,
    ylab = NULL,
  legend = 'right',
   hover = FALSE
) {

  # Tidy data
  dat <- matrixize(dat)
  mu <- rowMeans(dat)
  sigma <- rowSds(dat)
  if (trans == 'rank') {                         # Apply transformations
    mu <- rank(mu, ties.method = 'random')
    if (xlab %>% is.null) xlab <- expression('Rank'*(mu))
    if (ylab %>% is.null) ylab <- expression(sigma)
  } else if (trans == 'log') {
    sigma <- log2(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression('log'[2]*(sigma))
  } else if (trans == 'sqrt') {
    sigma <- sqrt(sigma)
    if (xlab %>% is.null) xlab <- expression(mu)
    if (ylab %>% is.null) ylab <- expression(sqrt(sigma))
  }
  lo <- lowess(mu, sigma, f = span)              # Fit LOWESS curve
  df <- tibble(Probe = rownames(dat),
                  Mu = mu,
               Sigma = sigma) %>%
    arrange(Mu) %>%
    mutate(lfit = lo[['y']])

  # Build plot
  if (size %>% is.null) {
    size <- pt_size(df)
  }
  if (alpha %>% is.null) {
    alpha <- pt_alpha(df)
  }
  suppressWarnings(
    p <- ggplot(df) +
      geom_point(aes(Mu, Sigma, text = Probe), size = size, alpha = alpha) +
      geom_path(aes(Mu, lfit, color = 'LOWESS'), size = 0.5) +
      scale_color_manual(name = 'Curve', values = pal_d3()(1L)) +
      labs(title = title, x = xlab, y = ylab) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5))
  )

  # Output
  gg_out(p, hover, legend)

}
dswatson/bioplotr documentation built on March 3, 2023, 9:43 p.m.