##
## jamma.R
##
## Targeted R functions as required to create MA-plots Jam-style
##
## Functions included:
## jamma
## centerGeneData
## points2polygonHull
##
## Requires: KernSmooth,sp,rgeos
## Suggests: matrixStats
##
#' jamma: MA-plots for omics data
#'
#' The jamma package creates MA-plots for omics data, and provides
#' important options to handle specific experiment designs and
#' strategies for data quality control.
#'
#' ## Overview of features:
#'
#' * MA-plots can be calculated using the mean or median signal.
#' * Data can be centered using a subset of reference samples.
#' * Data can be centered within groups of samples, useful to
#' assess within-group variability, or within-batch variability.
#' * Ranked MA-plots can be generated to show rank-difference,
#' useful to assess consistency of the rank ordered signal across
#' samples.
#' * Putative technical outliers can be defined using a MAD factor
#' threshold derived from the data itself, to highlight individual
#' samples with much higher variability than expected from biological
#' sources, which often highlight technical failures in upstream protocol.
#'
#' ## Data centering
#'
#' For example, it can be useful to generate MA-plots within
#' biological sample replicates, or even among technical replicates.
#' By this approach, MA-plots can effectively highlight technical
#' outliers, where variability in one sample is measurably
#' higher than that from other comparable samples. A MAD outlier
#' approach is available to identify samples whose median
#' variance is more than X times higher than that across other
#' samples.
#'
#' It is useful to center within sample types, for example brain
#' samples can be centered independently of kidney or liver
#' samples. This approach is especially useful when statistical
#' comparisons are not intended to be applied across brain
#' and kidney for example.
#'
#' In general, it is recommended to use
#' `centerGroups` to center data within meaningful experimental
#' subsets where there are no intended statistical
#' comparisons across these subsets.
#' We find it useful to generate MA-plots across all samples
#' even when there are distinct experimental subsets, because it provides
#' context to the signal profiles obtained overall.
#' For example it may be informative to recognize that signal from one
#' experimental subset is lower and/or more variable than signal
#' from another subset. It could be of biological or technical
#' importance.
#'
#' ## Data Normalization
#'
#' Lastly, the MA-plot approach is often effective at visualizing
#' the need for data normalization, which is equivalent to methods
#' such as log-ratio normalization. The underlying assumption is that
#' the median or mean log ratio (y-axis difference shown on MA-plots)
#' is zero.
#'
#' A normalization method `jammanorm()` provides this normalization.
#' Note that it also abides by the `centerGroups` and `controlSamples`
#' arguments. Additional argument `controlGenes` optionally defines
#' a specific subset of genes as normalizers, equivalent to using
#' housekeeper genes for normalization. Note that housekeeper normalization
#' in this case is defined by housekeeper genes having log ratio of zero,
#' and does not directly use the geometric mean expression of housekeepers,
#' although the result is very often nearly identical.
#'
#' ## Volcano plots
#'
#' Volcano plots are similar to MA-plots, with some useful distinctions:
#' * Volcano plots display **group** log fold change results versus P-value,
#' based upon a statistical test.
#' * MA-plots display **per-sample** log differences
#' from control, versus the mean signal. Often the P-value is related to
#' the mean signal, therefore these plots have some resemblance.
#' * It is possible to show **group** MA-plots, notably `DESeq2::plotMA()`,
#' although its purpose is to display grouped summary to indicate
#' the effect of signal on the fold change threshold for statistical
#' significance. It is not intended to assess consistent signal across
#' individual samples.
#'
#' @family jamma package
#'
#' ## Core functions:
#'
#' * `jammaplot()`
#' * `centerGeneData()`
#' * `jammanorm()`
#' * `jammacalc()`
#' * `volcano_plot()`
#'
#' ## Additional plot functions:
#' * `points2polygonHull()`
#' * `outer_legend()`
#'
#' @docType package
#' @name jamma
NULL
#' Produce MA-plot of omics data.
#'
#' Produce MA-plot of omics data, where `jammaplot()` uses base R graphics,
#' `ggjammaplot()` uses ggplot2 graphics.
#'
#' `jammaplot` takes a numeric matrix, typically of gene expression data,
#' and produces an MA-plot (Bland-Altman plot), also known as a
#' median-difference plot. One panel is created for each column of
#' data. Within each panel, the x-axis represents the mean or median
#' expression of each row; the y-axis represents the difference from
#' mean or median for that column.
#'
#' By default, the plot uses `jamba::plotSmoothScatter()`, with optional
#' highlighted points draw using `points()`.
#'
#' The function will determine an appropriate layout of plot panels,
#' which can be overridden using `ncol` and `nrow` to specify the
#' number of columns and rows of plot panels, respectively. For now,
#' this function uses base R graphics instead of ggplot2, in order
#' to accomodate some custom features.
#'
#' This function uses "useRaster=TRUE" by default, which causes
#' `jamba::plotSmoothScatter()` to render a rasterized image as opposed
#' to a composite of colored rectangles. This process substantially
#' reduces the render time in all cases, and reduces the image size
#' when saving as PDF or SVG.
#'
#' ## Notable features
#'
#' ### Highlighting points
#'
#' Specific points can be highlighted with argument `highlightPoints`
#' which can be a vector or named list of vectors, containing `rownames(x)`.
#' When using a list, point colors are assigned to each element in the
#' list in order, using the argument `highlightColor`.
#'
#' ### Centering by control samples
#'
#' Typical MA-plots are "global-centered", which calculates the
#' mean/median across all columns in `x`, and this value is subtracted
#' from each individual value per row.
#'
#' By specifying `controlSamples`
#' the mean/median is calculated using only the `colnames(x)` which match
#' `controlSamples`, thus representing "difference from control."
#'
#' It may also be useful to center data by known high-quality samples,
#' so the effect of potential outlier samples is avoided.
#'
#' ### Centering within subgroups
#'
#' By specifying `centerGroups` as a vector of group names,
#' the centering is calculated within each group of `colnames(x)`.
#' In this way, subsets of samples can be treated independently in
#' the MA-plots. A good example might be producing MA-plots for
#' "kidney" samples, and "muscle" samples, which may have
#' fundamentally different signal distributions. A good rule
#' of thumb is to apply `centerGroups` to represent separate
#' groups of samples where you do not intend to apply direct
#' statistical comparisons across those samples, without at
#' least applying a two-way contrast, a fold change of fold
#' changes.
#'
#' Another informative technique is to center by sample group,
#' for example `centerGroups=sample_group`.
#' This technique produces MA-plots that depict the
#' "difference from group" for each sample replicate of a sample
#' group, and is very useful for identifying sample replicates
#' with markedly higher variability to its sample group than
#' others. In general, the variability within sample group
#' should be substantially lower than variability across
#' sample groups. Use `displayMAD=TRUE` and `outlierMAD=2`
#' as a recommended starting point for this technique.
#'
#' ### Applying a noise floor
#'
#' The argument `noise_floor` provides a numeric lower threshold,
#' where individual values **at or below** this threshold are
#' set to a defined value, defined by argument `noise_floor_value`.
#' The default was updated in version `0.0.21.900` to
#' `noise_floor=0` and `noise_floor_value=NA`.
#' Values of zero `0` are set to `NA` and therefore are not included
#' in the MA-plot calculations. Only points above zero are included
#' as points in each MA-plot panel.
#'
#' Another useful alternative is to define `noise_floor_value=noise_floor`
#' which sets any measurement **at or below** the `noise_floor` to
#' this value. This option has the effect of reducing random noise from
#' points that are already below the noise threshold and therefore
#' are unreliable for this purpose.
#'
#' ### Customizing the panel layout
#'
#' Panels are drawn using the order of `colnames(x)` by row,
#' from left-to-right, then top-to-bottom.
#' The argument `blankPlotPos` is intended to insert an empty panel
#' at a particular panel position, to help customize the alignment
#' of sample panels.
#' This option is typically used with `ncol` and `nrow` to define
#' a fixed layout of panel columns and rows. `blankPlotPos` refers
#' to panels numbered as drawn per row of panels,
#'
#' ### Identifying potential sample outliers
#'
#' Use argument `displayMAD=TRUE` to display the per-sample MAD factor
#' relative to its `centerGroups` value, if provided. The MAD value
#' for each MA-plot panel is calculated using rows whose mean
#' is at or above `outlierRowMin`. The median MAD value is calculated
#' for each `centerGroups` grouping when `groupedMAD=TRUE`, by default.
#' Finally, each MA-plot panel MAD factor is the ratio of its MAD value
#' to the relevant median MAD value. MA-plot panels with MAD factor
#' above `outlierMAD` are considered outliers, and the color ramp
#' uses `outlierColramp` or `outlierColor` as a visual cue.
#'
#' Putative outlier samples should usually not be determined
#' when:
#'
#' * `controlSamples` are defined to include only a subset
#' of sample groups,
#' * `centerGroups` is not defined, or represents more than one
#' set of sample groups that are not intended to be statistically
#' compared directly to one another.
#'
#' Putative outlier samples may be defined when:
#'
#' * `centerGroups` represents a set of sample groups that are
#' intended to be involved in direct comparisons
#' * `centerGroups` represents each sample group
#'
#' Potential sample outliers may be identified by setting a threshold
#' with `outlierMAD`, by default 5xMAD. For a sample to be considered
#' an outlier, its median difference from mean/median needs to be
#' five times higher than the median across samples.
#'
#' We typically recommend an `outlierMAD=2` when centering
#' by sample groups, or when centering within experiment subsets.
#' For one sample to have 2xMAD factor, its variance needs
#' to be uniquely twice as high as the majority of other samples, which
#' is typically symptomatic of possible technical failure.
#'
#' There are exceptions to this suggested guideline, which includes
#' scenarios where a batch effect may be involved.
#'
#'
#' To do:
#' \itemize{
#' \item{Accept other object types as input, including Bioconductor
#' classes: `ExpressionSet`, `SummarizedExperiment`,
#' `MultiExperimentSet`}
#' \item{Make it efficient to convey group information, for example
#' define `titleBoxColor` with group colors, allow `centerByGroup=TRUE`
#' which would re-use known sample group information.}
#' \item{Adjust the suffix to indicate when \code{centerGroups} are being
#' used. For example indicate `'sampleID vs groupA'` instead of
#' `'sampleID vs median'`.}
#' }
#'
#' @param x `numeric` object usually a `matrix` that contains
#' values with measurement rows, and sample/observation columns.
#' For example, with gene or protein expression data, the genes
#' or proteins (or the assays of genes or proteins) are
#' represented in rows, and obtained samples are represented
#' in columns. Alternatively `x` can be `SummarizedExperiment`
#' object, used alongside argument `assay_name`.
#' @param assay_name `character` used when `x` is a
#' `SummarizedExperiment` object, to determine which assay
#' `matrix` to use for the MA plots. When `assay_name=NULL`
#' the first assay entry is used, for example `assays(x)[[1]]`.
#' @param maintitle `character` string with the title displayed above
#' all individual MA-plot panels. It will appear in the top outer
#' margin.
#' @param titleBoxColor,subtitleBoxColor `character` vector of
#' R colors used as background color for each panel title text,
#' or subtitle text respectively. The subtitle appears in the
#' bottom-left corner, and usually indicates the center groups
#' as defined by `centerGroups`.
#' @param centerGroups `character` vector of groups passed to
#' `jamma::centerGeneData()` which determines how data is centered.
#' Each group is centered independently, to enable visual
#' comparisons within each relevant centering group.
#' It is useful to center within batches or within
#' subsets of samples that are not intended to be compared to
#' one another.
#' Another useful alternative is to center by each sample
#' group in order to view the variability among group
#' replicates, which should be much lower than variability across
#' sample groups. See `centerGeneData()` for more specific examples.
#' @param controlSamples `character` vector of `colnames(x)` passed
#' to `centerGeneData()` which defines the control samples during
#' the data centering step.
#' By default, and the most common practice, MA-plots are
#' calculated across all samples, which effectively uses all
#' `colnames(x)` as `controlSamples`.
#' However, it is quite useful sometimes to provide a subset of
#' samples especially if there are known quality samples, to which
#' new samples of unknown quality are being compared.
#' @param colramp one of several inputs recognized by
#' `jamba::getColorRamp()`. It typically recognizes either the name of
#' a color ramp from RColorBrewer, the name of functions from the
#' `viridis` package such as `viridis::viridis()`, or single R colors, or
#' a vector of R colors. When a single color is supplied, a gradient
#' is created from white to that color, where the default base color
#' can be customized with `defaultBaseColor="black"` for example.
#' @param colrampOutlier one of several inputs recognized by
#' `jamba::getColorRamp()` to define a specific color ramp for
#' MA-plot outlier panels, used when `outlierMAD` is defined.
#' When `colrampOutlier` is `NULL` the `outlierColor` is used.
#' @param outlierColor `character` string representing one R color,
#' used when `colrampOutlier` is `NULL` and when `outlierMAD` is
#' defined. This color is used for MA-plot outlier panels by
#' substituting the first color from the `colramp` color ramp,
#' to act as a visual cue that the panel represents an outlier.
#' @param applyRangeCeiling `logical` passed to
#' `jamba::plotSmoothScatter()` which determines how to handle points
#' outside the plot x-axis and y-axis range: `applyRangeCeiling=TRUE`
#' will place points at the border of the plot, which is helpful
#' to indicate that there are more points outside the viewing range;
#' `applyRangeCeiling=FALSE` will crop and remove points outside
#' the viewing range, which is helpful for example when a large
#' number of points are at zero and overwhelm the point density.
#' When there are a large proportion of values at zero, it
#' can be helpful to apply `xlim=c(0.01, 20)` and
#' `applyRangeCeiling=FALSE`.
#' @param whichSamples `NULL` or `integer` vector, representing
#' an index subset of samples to include in the MA-plots. When
#' `whichSamples` represents a subset of samples in `x`, the
#' MA-plot calculations are performed on all samples, then only
#' samples in `whichSamples` are displayed. This argument keeps
#' the MA-plot calculations consistent even when viewing only
#' one or a subset of samples in more detail.
#' @param maintitleCex `numeric` cex character expansion used to
#' resize the `maintitle`.
#' @param subtitle `NULL` or `character` vector to be drawn at
#' the bottom left corner of each plot panel, the location
#' is defined by `subtitlePreset`.
#' @param subtitlePreset `character` string passed to `jamba::coordPresets()`.
#' The default `subtitlePreset="bottomleft"` defines the bottom-left
#' corner of each panel.
#' @param subtitleAdjPreset `character` string passed to `jamba::coordPresets()`.
#' The default `subtitleAdjPreset="topright"` places labels to the
#' top-right of the subtitle position, which by default is the bottom-left
#' corner of each panel.
#' @param titleColor `character` vector of colors applied to title text
#' in each MA-plot panel. When `doTitleBox=TRUE` and `titleColor`
#' contains only one or no value, the title color is defined by
#' `jamba::setTextContrastColor()` along with `titleBoxColor`.
#' @param doTitleBox `logical` indicating whether to draw plot
#' titles using a colored box. When `doTitleBox=TRUE` the
#' `jamba::drawLabels()` is called to display a label box at
#' the top of each plot panel, with `drawBox=TRUE`. When
#' `doTitleBox=FALSE`, `jamba::drawLabels()` is called with
#' `drawBox=FALSE`.
#' @param titleFont `integer` font compatible with `par("font")`.
#' Values are recycled across panels, so each panel can use a custom
#' value if needed.
#' @param titlePreset `character` string passed to `jamba::coordPresets()`.
#' The default `titlePreset="top"` defines the top edge of each panel,
#' and default `titleAdjPreset="top"` places the title at the top
#' edge of this position.
#' @param titleAdjPreset `character` string passed to `jamba::coordPresets()`.
#' The default `titleAdjPreset="top"` places labels above the `titlePreset`
#' location, by default above the top edge of each panel.
#' To place the title inside (below) the top edge of the plot panel,
#' use `titlePreset="top", titleAdjPreset="bottom"`, however the
#' title box may overlap and hide data points in the plot panel.
#' @param xlab,ylab `character` x- and y-axis labels, respectively.
#' The default values are blank `""` because there are a wide variety
#' of possible labels, and the labels take up more space
#' than is often useful for most MA-plots.
#' @param xlabline,ylabline `numeric` number indicating the text line
#' distance from the edge of plot border to place `xlab` and `ylab`
#' text, as used by `graphics::title()`.
#' @param groupSuffix (deprecated) `character` text appended to each MA-plot
#' panel title. Use argment `subtitle` as the preferred alternative.
#' @param highlightPoints optional set of rows to highlight on each
#' MA-plot panel, drawn as a set of points on top of the in one of the following forms:
#' * `character` vector matching `rownames(x)`, indicating a single
#' set of points to highlight in one color defined in `highlightColor`.
#' Internally, it is converted to a `list` containing one vector.
#' * `list` of `character` vectors, each containing `rownames(x)`.
#' Each vector is highlighted as a set, using colors defined in
#' `highlightColor` which ideally should have length equal to
#' `length(highlightPoints)`.
#' * `NULL` no points are highlighted.
#' @param highlightColor `character` vector used when `highlightPoints`
#' is defined. It is recycled to `length(highlightPoints)` after
#' `highlightPoints` is converted to a `list` if necessary.
#' Colors are therefore applied to each set of points in the `list`.
#' @param highlightCex `numeric` vector used when `highlightPoints`
#' is defined. It is recycled to `length(highlightPoints)` after
#' `highlightPoints` is converted to a `list` if necessary.
#' Values are therefore applied to each set of points in the `list`.
#' It can be supplied as a `list`, which will be recycled to
#' `length(highlightPoints)` and applied in order to each `vector`
#' of points highlighted.
#' @param doHighlightPolygon `logical` indicating whether to draw
#' a shaded polygon encompassing `highlightPoints`, using each
#' `highlightColor`.The polygon is defined by `grDevices::chull()`
#' via the function `points2polygonHull()`.
#' @param highlightPolygonAlpha `numeric` value indicating alpha
#' transparency used for the highlight polygon when
#' `doHighlightPolygon=TRUE`, where `0` is fully transparent,
#' and `1` is completely not transparent (opaque).
#' @param doHighlightLegend `logical` indicating whether to print a
#' color legend when `highlightPoints` is defined. The legend is
#' displayed in the bottom outer margin of the page using
#' `outer_legend()`, and the page is adjusted to add bottom
#' outer margin. When the legend is particularly large, it may
#' be preferable to hide the color legend and display the legend
#' another way, for example `legend()`.
#' @param smoothPtCol `color` used to draw points when `nrpoints` is
#' non-zero, which draws points in the extremities of the
#' smooth scatter plot. See `jamba::plotSmoothScatter()`.
#' The effect can also be achieved by adjusting `transFactor` to
#' a lower value, which increases the visual contrast of individual
#' points in the point density.
#' @param margins `numeric` vector of margins compatible with
#' `graphics::par("mar")`.
#' Default values are provided here for convenience.
#' Since version `0.0.31.900` the `outer_margins` are used to control
#' y-axis label whitespace, and and y-axis labels are not displayed
#' on every plot panel when `doPar=TRUE`.
#' @param outer_margins `numeric` vector of margins compatible with
#' `graphics::par("oma")`.
#' Default values are provided here for convenience.
#' The left outer marging is used to allow whitespace to display
#' y-axis labels.
#' Note when `useRank=TRUE` the y-axis whitespace is increased
#' to accomodate larger integer label values.
#' @param useRaster `logical` indicating whether to draw the
#' smooth scatter plot using raster logic, `useRaster=TRUE` is
#' passed to `jamba::plotSmoothScatter()`. The default `TRUE`
#' creates a much smaller plot object by rendering each plot
#' panel as a single raster image instead of rendering individual
#' colored rectangles. There is no driving reason to use `useRaster=FALSE`
#' except if the rasterization process itself is problematic.
#' @param ncol,nrow `integer` number of MA-plot panel columns and rows
#' passed to `graphics::par("mfrow")` when `doPar=TRUE`. When only one
#' value is supplied, `nrow` or `ncol`, the other value is defined
#' by `ncol(x)` and `blankPlotPos` so all panels can be contained on
#' one page. When `nrow` and `ncol` are defined such that multiple
#' pages are produced, each page will be annotated with `maintitle`
#' and `doHighlightLegend` if relevant.
#' @param doPar `logical` indicating whether to apply
#' `graphics::par("mfrow")` to define MA-plot panel rows and columns.
#' When `doPar=FALSE` each plot panel is rendered without
#' adjusting the `graphics::par("mfrow")` setting, which is appropriate
#' for displaying each plot panel individually.
#' @param las `integer` value `1` or `2` indicating whether axis labels
#' should be parallel or perpendicular to the axes, respectively.
#' @param ylim,xlim `NULL` or `numeric` vector `length=2` indicating
#' the y-axis and x-axis ranges, respectively. The values are useful
#' to define consistent dimensions across all panels. The
#' default `ylim=c(-4, 4)` represents 16-fold up and down range in
#' normal space, assuming the data content has already been log2
#' transformed, and is typically a reasonable starting point for
#' most purposes. Even if numeric values are all between
#' `-1.5` and `1.5`, it is still recommended to keep a range in
#' context of `c(-4, 4)`, or an appropriate and consistent range given
#' the data content, so the data has some visual context.
#' The range `c(-4, 4)` should be adjusted
#' relative to the typical ranges expected for the data.
#' @param xlim `numeric` or `NULL` to define a fixed numeric range for
#' x-axis values. When `xlim=NULL` the ranges are defined by the
#' numeric x-axis values in each `centerGroups` grouping, so that
#' each group can have its own independent x-axis ranges.
#' When there is a large proportion of values with x=0 (or near zero),
#' there are two options to reduce the point density near zero,
#' which can improve visibility of non-zero points:
#' 1. Use `noise_floor=0` and `noise_floor_value=NA`, which replaces
#' values at or below zero with `NA`, thereby hiding these points
#' from each plot panel. Note the `NA` values are also not used during
#' data centering calculations.
#' 2. Define `xlim=c(0.001, 20)` and `applyRangeCeiling=FALSE`, which
#' defines the x-axis minimum slightly above zero, and
#' `applyRangeCeiling=FALSE` does not display points outside the
#' x-axis range at the plot limits, thereby hiding those points.
#' This method does not replace values with `NA`, therefore all
#' non-NA values are used during data centering.
#' @param useMedian `logical` indicates whether to center data
#' using the `median` value, where `useMedian=FALSE` by default.
#' The median is preferred in cases where outliers should not influence
#' centering.
#' The mean is preferred in cases where the data should visualize
#' data in a manner consistent with downstream parametric
#' statistical analysis.
#' When a particular sample represents a technical outlier,
#' one option to visualize data without being skewed by the outlier
#' is to define `controlSamples` to exclude the outlier sample(s).
#' In this way, data centering will be applied using the non-outlier
#' samples as reference.
#' @param useMean (deprecated) `logical`, use `useMedian`. This argument
#' indicates whether to center data using the `mean` value.
#' When `useMean=NULL` the argument `useMedian` is preferred.
#' For backward compatibility, when `useMean` is not `NULL`,
#' then `useMedian` is defined by `useMedian <- !useMean`.
#' @param customFunc optional `function` used instead of the summary
#' function defined by `useMedian` during the data centering step.
#' This function should take a `matrix` of `numeric` values as input,
#' and must return a `numeric` vector length equal to `nrow(x)`.
#' This option is intended to allow custom row statistics,
#' for example geometric mean, or other row summary functions.
#' @param filterNeg (deprecated) `logical` argument, use `noise_floor`.
#' @param filterNA,filterNAreplacement `logical` and `vector` respectively.
#' When `filterNA=TRUE`, all `NA` values are replaced with
#' `filterNAreplacement`.
#' This process is conceptually opposite of `noise_floor` which replaces
#' a `numeric` value with `NA` or another `numeric` value.
#' Instead, `filterNA` is intended to convert `NA` values into
#' a known `numeric` value, typically using an appropriate noise floor
#' value such as zero `0`.
#' Practically speaking, `NA` values should probably
#' be left as `NA` values, so that data centering does not use these values,
#' and so the MA-plot panel does not draw a point when no measurement
#' exists.
#' @param noise_floor,noise_floor_value `numeric` to define a numeric
#' floor, or `NULL` for no numeric floor. Values **at or below**
#' `noise_floor` are set to `noise_floor_value`, intended for two
#' potential uses:
#' 1. Filter out value below a threshold, so they do not affect centering.
#' * This option is valuable to remove zeros when a zero `0` is considered
#' "no measurement observed", typically for count data such as RNA-seq,
#' NanoString, and especially single-cell protocols or other protocols
#' that produce a large number of missing values.
#' * One can typically tell whether input data includes zero `0`
#' values by the presence of characteristic 45-degree angle lines
#' originating from `x=0` angled toward the right. The points along
#' this line are rows with more measurements of zero than non-zero,
#' there this sample has a non-zero value.
#'
#' 2. Set values at a noise floor to the noise floor, to retain the
#' measurement but minimize the effect during centering to the lowest
#' realiable measurement for the platform technology.
#' * This value may be set to a platform noise floor
#' for something like microarray data where the intensity may be
#' unreliable below a threshold; or
#' * for quantitative PCR measurements where cycle threshold (Ct)
#' values may become unreliable, for example above CT=40 or CT=35.
#' Data is often transformed to abundance with `2 ^ (40 - CT)` then
#' log2-transformed for analysis. In this case, to apply a `noise_floor`
#' effective for CT=35, one would use `noise_floor=5`.
#' @param filterFloor,filterFloorReplacement (deprecated) in favor of
#' `noise_floor`, and `noise_floor_replacement` respectively.
#' @param transFactor `numeric` adjustment to the visual density of
#' smooth scatter points. For base R graphics, this argument is
#' passed to `jamba::plotSmoothScatter()`. The argument value is based upon
#' `graphics::smoothScatter()` argument `transformation`, which uses
#' default `function(x)x^0.25`. The `transFactor` is equivalent to the
#' exponential in the form: `function(x)x^transFactor`. Lower values
#' make the point density more visually intense, higher values make the
#' point density less visually intense.
#' @param nrpoints `integer` or `NULL` indicating the number of points
#' to display on the extremity of the smooth scatter density,
#' passed to `jamba::plotSmoothScatter()`.
#' @param smoothScatterFunc `function` used to produce a smooth scatter plot
#' in base R graphics. The default `jamba::plotSmoothScatter()` controls
#' the level of detail in the density calculation, and in the graphical
#' resolution of that density in each plot panel. The custom function
#' should accept argument `transformation` as described in `transFactor`,
#' even if the argument is not used. This argument could be useful to
#' specify customizations not convenient to apply otherwise.
#' @param ablineH,ablineV `numeric` vector indicating position of
#' horizontal and vertical lines in each MA-plot panel.
#' Either argument can be supplied as a `list`, which will be applied
#' to each plot panel in order, which allows specifying specific
#' abline values in individual panels.
#' @param doTxtplot `logical` (not yet implemented in `jamma`),
#' indicating to produce colored ANSI text plot output, for example
#' to a text terminal.
#' @param blankPlotPos `NULL` or `integer` vector indicating
#' plot panel positions to be drawn blank, and therefore skipped.
#' This argument is intended to allow manual placement of plot panels
#' with spacing that reinforces a logical layout.
#' Plot panels are drawn in the exact order of `colnames(x)` received,
#' with panels placed into a number of columns and rows defined
#' with `ncol`, `nrow` or `jamba::decideMfrow()` when `doPar=TRUE`.
#' Blank panel positions are intended to help customize the visual
#' alignment of MA-plot panels. The mechanism is similar to
#' `ggplot2::facet_wrap()` except that blank positions can be manually
#' defined by what makes sense for the experiment design.
#' This method is not particularly user-friendly, but allows fine
#' control over plot panel spacing.
#' @param displayMAD `logical` indicating whether to display each
#' MA-plot panel MAD factor (median absolute deviation). A MAD
#' value for each panel is calculated by taking the median absolute
#' deviation from zero across all points, using points whose mean
#' value is equal or greater than `outlierRowMin`. The overall MAD
#' is defined by the median MAD from the MA-plot panels. The MAD
#' factor is defined as the ratio of each MA-plot panel MAD value
#' to the overall MAD value, and therefore most MAD factor values
#' should be roughly `1`. The overall MAD value is defined by the
#' median across all samples when `groupedMAD=FALSE`, or defined
#' within each `centerGroup` when `groupedMAD=TRUE`. A value
#' with MAD factor 2 is interpreted as a sample whose median
#' deviation from zero is twice as high as the typical sample,
#' which is a reasonably indication that this sample has twice
#' the inherent level of noise compared to other samples. Note
#' that MAD values should be interpreted within sample processing
#' batches if relevant, or within logical experimental units --
#' roughly interpreted to mean sets of samples within which direct
#' statistical comparisons are intended to be applied. For example,
#' gene expression data that include brain and liver samples would
#' probably use `centerGroups` for brain and liver to be centered
#' separately, therefore the MAD factors should be separately
#' calculated for brain and for liver.
#' @param groupedMAD `logical` indicating how the MAD calculation
#' should be performed: `groupedMAD=TRUE` (default) calculates the
#' median absolute deviation (MAD) from y=0 for each sample per `centerGroups`
#' value, then the median MAD value for all samples in each group.
#' The corresponding MAD factor is defined within each group.
#' When `groupedMAD=FALSE`, the median MAD value is calculated
#' across all samples, regardless of `centerGroups` value,
#' from which the MAD factor is defined.
#' @param outlierMAD `numeric` threshold above which a MA-plot panel
#' MAD factor is considered an outlier. When a MA-plot panel is
#' considered an outlier, the `outlierColramp` or `outlierColor`
#' is applied to the panel color ramp to display a visual
#' indication.
#' @param outlierRowMin `numeric` value indicating the minimum mean
#' value as displayed on the MA-plot panel x-axis, in order for
#' the row to be included in MAD calculations. This argument is
#' intended to prevent measurements whose mean value is below
#' a noise threshold from being included, therefore only including
#' points whose mean measurement is above noise and represents
#' "typical" variability.
#' @param fillBackground `logical` currently used for base R graphics
#' output, and passed to `jamba::plotSmoothScatter()`,
#' indicating whether to fill the plot panel using the
#' first color in the color ramp for each MA-plot panel, or when
#' a plot panel is an outlier, it uses `outlierColor`.
#' This argument is useful for all plot panels especially when the
#' `colramp` base color is not white (or otherwise does not match
#' the background of the plot device, for example `colramp="viridis"`).
#' This argument is also used with `outlierColor` to add visual emphasis
#' to plot panels where the MAD factor exceeds `outlierMAD`.
#' @param panel_hook_function optional custom `function` called
#' as a "hook" after each MA-plot panel has been drawn. This function
#' can be used to display custom axis labels, add plot panel visual
#' accents, lines, labels, etc.
#' This `panel_hook_function` is recycled as a `list` to the
#' number of samples `ncol(x)`, which can be used to supply a unique
#' function for each panel. Any element in the `list` with `length=0`
#' or `NA` is skipped for the corresponding MA-plot panel.
#' This function should accept at least two arguments, even if ignored:
#' * `i` - an `integer` indicating the sample to be plotted in order,
#' as defined by `colnames(x)`, and `whichSamples` in the event samples
#' are subsetted or re-ordered with `whichSamples`.
#' * `...` additional arguments passed by `...` into this custom function.
#' * Any arguments of `jammaplot()` are available inside the panel
#' hook function as a by-product of calling this function within
#' the environment of the active `jammaplot()`, therefore any
#' argument values will be available for use inside that function.
#' @param doPlot `logical` indicating whether to create plots. When
#' `doPlot=FALSE` only the MA-plot panel data is returned.
#' @param useRank `logical` indicating whether to create column-wide
#' ranks, then create MA-plots using the rank data. When `useRank=TRUE`
#' the y-axis represents the rank difference from mean, and the
#' x-axis represents the mean rank. Using `useRank=TRUE` is a good
#' method to evaluate whether data can be normalized, or whether
#' data across samples is inherently noisy.
#' @param verbose logical indicating whether to print verbose output.
#' @param ... additional parameters sent to downstream functions,
#' `jamba::plotSmoothScatter()`, `jammacalc()`, `centerGeneData()`.
#'
#' @return `list` of `numeric` `matrix` objects, one for each MA-plot,
#' with colnames `"x"` and `"y"`. This `list` is sufficient input
#' to `jammaplot()` to re-create the full set of MA-plots.
#'
#' @seealso `jamba::plotSmoothScatter()`
#'
#' @aliases MAplot, MVAplot, plotMA, plotMVA, ma.plot, plot.ma
#'
#' @family jam plot functions
#'
#' @examples
#' # Note the example data requires the affydata Bioconductor package
#' if (suppressPackageStartupMessages(require(affydata))) {
#' data(Dilution);
#' edata <- log2(1+exprs(Dilution));
#' jammaplot(edata);
#'
#' jammaplot(edata,
#' whichSamples=c(1, 2));
#'
#' jammaplot(edata,
#' sample_labels=paste("Sample", colnames(edata)));
#'
#' jammaplot(edata,
#' controlIndicator="titlestar");
#'
#' jammaplot(edata,
#' controlIndicator="none");
#'
#' jammaplot(edata,
#' panel_hook_function=function(i,...){box("figure")});
#'
#' jammaplot(edata,
#' useRank=TRUE,
#' maintitle="Rank MA-plots");
#' }
#'
#' @export
jammaplot <- function
(x,
assay_name=NULL,
maintitle=NULL,
titleBoxColor="#DDBB9977",
subtitleBoxColor=titleBoxColor,
centerGroups=NULL,
controlSamples=colnames(x),
controlFloor=NA,
naControlAction=c("row", "floor", "min", "na"),
naControlFloor=0,
controlIndicator=c(
"labelstar",
"titlestar",
"none"),
sample_labels=NULL,
useMedian=FALSE,
useMean=NULL,
ylim=c(-4,4),
xlim=NULL,
highlightPoints=NULL,
outlierMAD=5,
outlierRowMin=5,
displayMAD=FALSE,
groupedMAD=TRUE,
colramp=c("white", "lightblue", "blue", "navy", "orange", "orangered2"),
colrampOutlier=NULL,
outlierColor="lemonchiffon",
whichSamples=NULL,
maintitleCex=1.8,
subtitle=NULL,
subtitlePreset="bottomleft",
subtitleAdjPreset="topright",
titleCexFactor=1,
titleCex=NULL,
doTitleBox=TRUE,
titleColor="black",
titleFont=2,
titlePreset="top",
titleAdjPreset="top",
xlab="",
xlabline=2,
ylab="",
ylabline=1.5,
groupSuffix=NULL,
highlightPch=21,
highlightCex=1.5,
highlightColor="#00AAAA66",
doHighlightPolygon=FALSE,
highlightPolygonAlpha=0.3,
doHighlightLegend=TRUE,
smoothPtCol="#00000055",
margins=c(2.5, 0.5, 2.0, 0.2),
outer_margins=c(0, 1.5, 0, 0.2),
useRaster=TRUE,
ncol=NULL,
nrow=NULL,
doPar=TRUE,
las=2,
groupedX=TRUE,
customFunc=NULL,
filterNA=TRUE,
filterNAreplacement=NA,
filterNeg=FALSE,
noise_floor=0,
noise_floor_value=NA,
filterFloor=NULL,
filterFloorReplacement=NULL,
transFactor=0.18,
nrpoints=0,
smoothScatterFunc=jamba::plotSmoothScatter,
applyRangeCeiling=TRUE,
doTxtplot=FALSE,
ablineV=0,
ablineH=c(-2,0,2),
blankPlotPos=NULL,
fillBackground=TRUE,
useRank=FALSE,
apply_transform_limit=40,
panel_hook_function=NULL,
doPlot=TRUE,
verbose=FALSE,
...)
{
## Purpose is to provide an MA plot, based upon the MVAplotMED from
## the defunct Agi4x44PreProcess package, except in this case using
## smoothScatter since 20k to 40k datapoints is too many to show without
## them being occluded.
##
## You must manually specify the number columns and rows, and set doPar=TRUE
## for this function to define multi-panel plotting, e.g.
## "nrow=5, ncol=4, doPar=TRUE"
## Otherwise, you can manually set these parameters with
## "par('mfrow'=c(5,4));"
##
## This function also uses "useRaster=TRUE" by default which causes the
## imageDefault() function used by smoothScatter() to produce a temporary
## raster image which is then resized with interpolation to produce a
## properly-blended representation of the full image. This process also
## substantially reduces the image size when saving as a vector output
## format like PDF or SVG.
##
## To highlight certain points, e.g. marked outliers, or statistical hits,
## supply rownames into highlightPoints as a vector, or a list of vectors.
## If using a list of vectors, also supply a list of colors in
## highlightColor, so each subset is colored uniquely.
##
## ctrlSamples is deprecated in favor of controlSamples.
##
## controlSamples indicates which samples should be used for centering
## data, by default all samples. However, centering by control samples
## visually indicates changes from those control samples.
##
## centerGroups is used to apply the centering to subsets of columns
## of data, for example if two very different cell types should be separately
## centered for closer scrutiny of the within-cell variability.
## One can take this further, and center each sample group, which has
## the helpful effect of displaying variability of replicates per
## sample group. This method is quite useful for reviewing whether any
## replicates are notably different than counterparts in the same
## sample group.
##
## Note: the "filterNAandNeg" will floor log2 values at 0, which has the
## effect of adding the "45 degree lines" originating from c(0,0).
##
## blankPlotPos is intended for specific plot layouts, to keep the plots
## organized by some fixed criteria, e.g. certain sample groups per
## row or column, but where there may not be consistent replicates per
## group.
## For ncol and nrow, blankPlotPos indicates which panels should be
## left empty, where each panel is numbered by row. For example,
## if ncol=4, then the panels are numbered 1-4, 5-8, 9-12, etc.
##
## outlierMAD is only viable when centerEachGroup=TRUE; in which case
## outlierMAD is a threshold for the MAD factor of outliers. Each MA
## plot panel is given a MAD score, using probes whose x-values are
## greater than outlierRowMin;, the median of the absolute value is
## used as the MAD.
## These MAD values are used to calculate the global MAD, and individual
## MADs are divided by global MAD to determine a MAD factor.
## The outlierMAD is the threshold above which a MAD factor is considered
## an outlier.
##
#if (!suppressPackageStartupMessages(require(jamba))) {
# stop("jammaplot() requires the jamba package.");
#}
#if (suppressPackageStartupMessages(require(matrixStats))) {
# doMS <- TRUE;
if (jamba::check_pkg_installed("matrixStats")) {
doMS <- TRUE;
rowMedians <- matrixStats::rowMedians;
colMedians <- matrixStats::colMedians;
} else {
doMS <- FALSE;
rowMedians <- function(x, na.rm=TRUE) {
apply(x, 1, median, na.rm=na.rm);
}
colMedians <- function(x, na.rm=TRUE) {
apply(x, 2, median, na.rm=na.rm);
}
}
controlIndicator <- match.arg(controlIndicator);
naControlAction <- match.arg(naControlAction);
if (doTxtplot) {
blankPlotPos <- NULL;
smoothScatterFunc <- function(...){
plotTextSmoothScatter(height=20,
width=80,
doLegend=FALSE,
...);
}
}
if (length(useMean) > 0 && is.logical(useMean)) {
useMedian <- !useMean;
if (verbose) {
jamba::printDebug("jammaplot(): ",
"useMedian defined by !useMean, useMedian=",
useMedian);
}
}
if (length(useMedian) == 0) {
useMedian <- FALSE;
}
transformation <- function(x){
x^transFactor;
}
if ("list" %in% class(x)) {
nsamples <- length(x);
x_names <- names(x);
} else if ("SummarizedExperiment" %in% class(x)) {
x <- get_se_assaydata(x,
assay_name=assay_name,
verbose=verbose);
nsamples <- ncol(x);
x_names <- colnames(x);
if (length(x) == 0) {
stop("assays(x)[[assay_name]] did not produce a usable data matrix.");
}
} else {
nsamples <- ncol(x);
x_names <- colnames(x);
}
if (nsamples == 0) {
stop("No samples are available to plot.");
}
# optional hook function for each panel
if (length(panel_hook_function) > 0) {
if (is.function(panel_hook_function)) {
panel_hook_function <- list(panel_hook_function);
}
panel_hook_function <- rep(panel_hook_function,
length.out=nsamples);
}
if (length(x_names) == 0) {
if (length(sample_labels) == nsamples) {
if ("list" %in% class(x)) {
names(x) <- sample_labels;
} else {
colnames(x) <- sample_labels;
}
x_names <- sample_labels;
} else {
x_names <- jamba::makeNames(rep("V", ncol(x)));
if ("list" %in% class(x)) {
names(x) <- x_names;
} else {
colnames(x) <- x_names;
}
}
}
if (length(sample_labels) == 0) {
sample_labels <- x_names;
}
## if colrampOutlier is a single color, use it to replace
## the first color of colramp
if (length(outlierColor) == 0) {
outlierColor <- "palegoldenrod";
}
if (length(colrampOutlier) == 0) {
colrampOutlier <- outlierColor;
}
if (!is.list(colramp)) {
colramp <- list(colramp);
}
if (!is.list(colrampOutlier)) {
colrampOutlier <- list(colrampOutlier);
}
colramp <- rep(colramp,
length.out=length(x_names));
names(colramp) <- x_names;
colramp <- lapply(colramp, jamba::getColorRamp, n=NULL);
colrampOutlier <- rep(colrampOutlier,
length.out=length(x_names));
names(colrampOutlier) <- x_names;
# apply outlier colors to each color ramp
for (i in seq_along(x_names)) {
icolramp <- colramp[[i]];
icolrampOutlier <- colrampOutlier[[i]];
if (!is.function(icolrampOutlier)) {
if (length(icolrampOutlier) == 1 && jamba::isColor(icolrampOutlier)) {
## substitute single color with the first color in colramp
icolrampOutlier <- jamba::getColorRamp(
c(icolrampOutlier,
tail(icolramp(101), -1)),
n=NULL,
...);
} else {
## For multi-color colrampOutlier, use it to define
## the complete outlier color gradient
icolrampOutlier <- jamba::getColorRamp(icolrampOutlier,
n=NULL,
...);
}
colrampOutlier[[i]] <- icolrampOutlier;
}
}
## If groupSuffix is supplied, make sure its length
## is length(x_names)
if (length(groupSuffix) > 0) {
groupSuffix <- rep(groupSuffix,
length.out=length(x_names));
}
## highlightPoints
## Todo: check values in highlightPoints versus rownames(x)
if (length(highlightPoints) == 0) {
highlightColor <- NULL;
} else {
if (!is.list(highlightPoints)) {
if (length(highlightPoints) == length(highlightColor)) {
if (length(names(highlightPoints)) == 0) {
names(highlightPoints) <- makeNames(highlightPoints);
}
highlightPoints <- as.list(highlightPoints);
} else {
highlightPoints <- list(highlighted=highlightPoints);
}
} else {
if (length(names(highlightPoints)) == 0) {
names(highlightPoints) <- makeNames(
rep("highlight",
length.out=length(highlightPoints)));
}
}
if (!is.list(highlightColor)) {
highlightColor <- as.list(highlightColor);
}
highlightColor <- rep(highlightColor,
length.out=length(highlightPoints));
if (length(names(highlightColor)) > 0 &&
all(names(highlightColor) %in% names(highlightPoints))) {
highlightColor <- highlightColor[names(highlightPoints)];
} else {
names(highlightColor) <- names(highlightPoints);
}
if (!is.list(highlightPch)) {
highlightPch <- as.list(highlightPch);
}
highlightPch <- rep(highlightPch,
length.out=length(highlightPoints));
if (length(names(highlightPch)) > 0 &&
all(names(highlightPch) %in% names(highlightPoints))) {
highlightPch <- highlightPch[names(highlightPoints)];
} else {
names(highlightPch) <- names(highlightPoints);
}
if (!c("list") %in% class(highlightCex)) {
highlightCex <- as.list(highlightCex);
}
highlightCex <- rep(highlightCex,
length.out=length(highlightPoints));
if (length(names(highlightCex)) > 0 &&
all(names(highlightCex) %in% names(highlightPoints))) {
highlightCex <- highlightCex[names(highlightPoints)];
} else {
names(highlightCex) <- names(highlightPoints);
}
}
## By default, when subtitle is not supplied, and
## centerGroups contains multiple unique values,
## use centerGroups in subtitle
if (length(subtitle) == 0 &&
length(unique(centerGroups)) > 1) {
subtitle <- rep(centerGroups,
length.out=length(x_names));
names(subtitle) <- x_names;
}
## Define a new par for panel layout
if (doPar) {
oPar <- par(no.readonly=TRUE);
on.exit(par(oPar));
if (is.null(ncol)) {
if (is.null(nrow)) {
parMfrow <- jamba::decideMfrow(nsamples);
nrow <- parMfrow[1];
ncol <- parMfrow[2];
} else {
ncol <- ceiling(nsamples / nrow);
}
} else if (is.null(nrow)) {
nrow <- ceiling(nsamples / ncol);
}
if (length(outer_margins) == 0) {
outer_margins <- 0;
}
outer_margins <- rep(outer_margins, length.out=4);
# if useRank=TRUE add minor margin to left
if (TRUE %in% useRank) {
# margins <- margins + c(0, 2, 0, 0);
outer_margins <- outer_margins + c(0, 2, 0, 0);
}
par("oma"=outer_margins)
par(mfrow=c(nrow, ncol));
if (length(maintitle) > 0) {
maintitle_nlines <- length(unlist(strsplit(maintitle, "\n")));
par("oma"=pmax(par("oma"),
c(0, 0, 1.5 * (maintitle_nlines + 3), 0)));
}
# if highlightPoints, add margin at the bottom
if (length(highlightPoints) > 0 && doHighlightLegend) {
par("oma"=pmax(par("oma"),
c(3, 0, 0, 0)));
}
}
if (length(titleCex) == 0) {
titleCex <- 1;
}
titleCex <- rep(titleCex, length.out=nsamples);
titleFont <- rep(titleFont, length.out=nsamples);
titleBoxColor <- rep(titleBoxColor, length.out=nsamples);
## Adjust titleColor to have contrast from the titleBoxColor
if (length(titleColor) <= 1) {
if (doTitleBox) {
titleColor <- jamba::setTextContrastColor(titleBoxColor);
} else {
if (length(titleColor) == 0) {
titleColor <- "black";
}
titleColor <- rep(titleColor, length.out=nsamples);
}
}
if (doPar) {
par("mar"=margins);
}
if (length(whichSamples) == 0) {
whichSamples <- seq_len(nsamples);
} else {
# 0.0.25.900 updated to accept character,factor and not change numeric
if (any(c("character", "factor") %in% class(whichSamples))) {
whichSamples <- match(as.character(whichSamples),
x_names);
if (any(is.na(whichSamples))) {
if (all(is.na(whichSamples))) {
stop("whichSamples supplied as character/factor does not match any names in x");
}
whichSamples <- whichSamples[!is.na(whichSamples)];
}
}
if (is.numeric(whichSamples)) {
if (any(whichSamples > nsamples)) {
whichSamples <- whichSamples[whichSamples <= nsamples];
}
# whichSamples <- x_names[whichSamples]
}
}
names(whichSamples) <- x_names[whichSamples];
gaveMVA <- FALSE;
if ("data.frame" %in% class(x)) {
x <- as.matrix(x);
}
# Handle noise_floor,noise_floor_value, in place of deprecated arguments
# filterFloor,filterFloorReplacement
deprecate_warn <- character(0);
if (length(filterFloor) > 0) {
deprecate_warn <- c(deprecate_warn, "filterFloor");
noise_floor <- filterFloor;
}
if (length(filterFloorReplacement) > 0) {
deprecate_warn <- c(deprecate_warn, "filterFloorReplacement");
noise_floor_value <- filterFloorReplacement;
}
if ("list" %in% class(x) &&
all(c("x", "y") %in% colnames(x[[1]]))) {
## Re-use MA-plot data as provided
gaveMVA <- TRUE;
if (is.null(xlim)) {
xlim <- range(sapply(x, function(iObj){
range(c(range(iObj[,"x"])));
}));
#xlim <- c(0, xlim1);
}
mvaDatas <- x;
}
## Newer method using jammacalc()
if (verbose) {
jamba::printDebug("jammaplot(): ",
"Calling jammacalc().",
"useMedian:", useMedian);
}
## Optionally apply log2(1 + x) transform?
if (!TRUE %in% useRank &&
any(x > apply_transform_limit)) {
if (verbose) {
jamba::printDebug("ggjammaplot(): ",
"Applied transform due to values above ",
apply_transform_limit, ": ",
"log2(1 + x)");
}
x <- jamba::log2signed(x,
offset=1);
}
mvaDatas <- jammacalc(x=x,
na.rm=TRUE,
useMedian=useMedian,
controlSamples=controlSamples,
centerGroups=centerGroups,
controlFloor=controlFloor,
naControlAction=naControlAction,
naControlFloor=naControlFloor,
groupedX=groupedX,
grouped_mad=groupedMAD,
whichSamples=whichSamples,
noise_floor=noise_floor,
noise_floor_value=noise_floor_value,
naValue=filterNAreplacement,
centerFunc=centerGeneData,
returnType="ma_list",
mad_row_min=outlierRowMin,
useRank=useRank,
verbose=verbose,
...);
# for useRank=TRUE adjust default ylim
if (TRUE %in% useRank && all(abs(ylim) == 4)) {
ylim <- c(-1,1) * round(nrow(x) * .40);
}
## Optionally calculate the MAD per panel
## assumptions: that zero should be the median, all deviations are absolute values since
## we are comparing with zero, and we are not subtracting an actual median value
mvaMADs <- attr(mvaDatas, "MADs");
mvaMADfactors <- attr(mvaDatas, "MADfactors");
mvaMADoutliers <- x_names[whichSamples][which(mvaMADfactors >= outlierMAD)];
if (verbose) {
jamba::printDebug("jammaplot(): ",
"mvaMADs:");
print(format(digits=2, mvaMADs));
jamba::printDebug("jammaplot(): ",
"mvaMADfactors:");
print(format(digits=2, mvaMADfactors));
jamba::printDebug("jammaplot(): ",
"mvaMADoutliers:",
mvaMADoutliers);
}
attr(mvaDatas, "mvaMADs") <- mvaMADs;
attr(mvaDatas, "mvaMADoutliers") <- mvaMADoutliers;
attr(mvaDatas, "mvaMADfactors") <- mvaMADfactors;
attr(mvaDatas, "MADs") <- mvaMADs;
attr(mvaDatas, "MADoutliers") <- mvaMADoutliers;
attr(mvaDatas, "MADfactors") <- mvaMADfactors;
attr(mvaDatas, "outlierMAD") <- outlierMAD;
attr(mvaDatas, "outlierRowMin") <- outlierRowMin;
## check_panel_page() checks if the panel number will cause a new
## page to be created, if maintitle is supplied then it will be
## printed atop each page
check_panel_page <- function
(iPanelNumber,
maintitle,
doHighlightLegend=FALSE,
highlightColor=NULL,
highlightCex=NULL,
highlightPch=NULL,
verbose=FALSE,
...) {
if (length(maintitle) == 0 &&
!(doHighlightLegend && length(highlightColor) > 0)) {
if (verbose) {
jamba::printDebug("check_panel_page(): ",
"No extra title nor color legend is required.");
}
return();
}
max_panels <- prod(par("mfrow"));
if (iPanelNumber > 1 && ((iPanelNumber-1) %% max_panels) == 0) {
if (length(maintitle) > 0) {
## Print maintitle
maintitle_txt <- paste(unlist(strsplit(maintitle, "\n")),
collapse="\n");
title(outer=TRUE,
main=maintitle_txt,
cex.main=maintitleCex);
} else {
if (verbose) {
jamba::printDebug("check_panel_page(): ",
"No title was supplied.");
}
}
if (doHighlightLegend && length(highlightColor) > 0) {
## print color legend
if (verbose) {
jamba::printDebug("highlightColor:");
print(highlightColor);
}
outer_legend(x="bottom",
legend=names(highlightColor),
col=unname(unlist(highlightColor)),
pch=unname(unlist(highlightPch)),
pt.cex=unname(unlist(highlightCex)),
pt.bg=NULL);
} else {
if (verbose) {
jamba::printDebug("check_panel_page(): ",
"No highlightColor was supplied.");
}
}
} else {
if (verbose) {
jamba::printDebug("check_panel_page(): ",
"Panel did not force a page break.");
}
}
}
######################################
## Iterate each plot panel
##
## iPanelNumber keeps track of the numbered panels as they are plotted,
## so we can insert blank panels at the specified positions.
if (doPlot) {
iPanelNumber <- 0;
for(i in whichSamples) {
#mvaData <- mvaDatas[[i]];
mvaData <- mvaDatas[[x_names[i]]];
iPanelNumber <- iPanelNumber + 1;
check_panel_page(iPanelNumber,
maintitle,
doHighlightLegend=doHighlightLegend,
highlightColor=highlightColor,
highlightPch=highlightPch,
highlightCex=highlightCex,
verbose=verbose,
...);
if (verbose) {
jamba::printDebug("iPanelNumber: ", iPanelNumber,
fgText=c("orange", "lightgreen"));
}
if (length(blankPlotPos) > 0) {
while(iPanelNumber %in% blankPlotPos) {
jamba::nullPlot(doBoxes=FALSE);
if (verbose) {
jamba::printDebug(" Inserted a blank panel at position: ",
iPanelNumber);
}
iPanelNumber <- iPanelNumber + 1;
check_panel_page(iPanelNumber,
maintitle,
doHighlightLegend=doHighlightLegend,
highlightColor=highlightColor,
highlightPch=highlightPch,
highlightCex=highlightCex,
verbose=verbose,
...);
if (verbose) {
jamba::printDebug("new iPanelNumber: ",
iPanelNumber);
}
}
}
par("mar"=margins);
groupName <- paste(c(x_names[i],
groupSuffix[i]),
collapse="");
# titleText <- x_names[i];
titleText <- sample_labels[i];
## Calculate the MAD, i.e. the median absolute deviation from zero
mvaMAD <- mvaMADs[whichSamples[i]];
#mvaMAD <- median(abs(mvaData[,"y"]), na.rm=TRUE);
## Determine color ramp per panel
if (length(outlierMAD) > 0 && x_names[i] %in% mvaMADoutliers) {
colrampUse <- colrampOutlier[[i]];
} else {
colrampUse <- colramp[[i]];
}
# version 0.0.33.900: Define missing xlim when required
if (length(jamba::rmNA(xlim)) == 0) {
if (length(jamba::rmNA(mvaData[,"x"])) == 0) {
use_xlim <- NULL;
} else {
use_xlim <- range(mvaData[,"x"], na.rm=TRUE)
}
} else {
use_xlim <- xlim
}
if (length(mvaData) == 0 ||
nrow(mvaData) == 0 ||
all(is.na(mvaData[,"y"]))) {
jamba::nullPlot(doBoxes=FALSE,
xlim=use_xlim,
ylim=ylim
);
jamba::usrBox(fill=outlierColor)
box();
} else {
mva <- smoothScatterFunc(mvaData[,c("x","y")],
colramp=colrampUse(21),
xlab="",
ylab="",
las=las,
xlim=use_xlim,
ylim=ylim,
yaxt="n",
transformation=transformation,
#col=smoothPtCol,
useRaster=useRaster,
nrpoints=nrpoints,
fillBackground=fillBackground,
applyRangeCeiling=applyRangeCeiling,
...);
# display y-axis labels only first plot each row
y_axis_at <- pretty(ylim);
if (TRUE %in% doPar && par("mfg")[2] != 1) {
y_axis_labels <- rep("", length.out=length(y_axis_at));
} else {
y_axis_labels <- format(y_axis_at,
trim=TRUE,
big.mark=",")
}
axis(2,
las=2,
at=y_axis_at,
labels=y_axis_labels,
...)
}
## Add axis labels
title(xlab=xlab,
line=xlabline);
title(ylab=ylab,
line=ylabline);
## Add some axis lines across the plot for easy visual reference
if (!is.null(ablineH)) {
if ("list" %in% class(ablineH)) {
if (all(names(ablineH) %in% x_names)) {
h <- ablineH[[x_names[i]]];
} else {
h <- ablineH[[i]];
}
} else {
h <- unique(unlist(ablineH));
}
h <- h[h >= min(ylim, na.rm=TRUE) & h <= max(ylim, na.rm=TRUE)];
if (length(h) > 0) {
abline(h=h,
col="#44444488",
lty="dashed",
lwd=1,
...);
}
}
## Add some axis lines across the plot for easy visual reference
if (!is.null(ablineV)) {
if ("list" %in% class(ablineV)) {
if (all(names(ablineV) %in% x_names)) {
v <- ablineV[[x_names[i]]];
} else {
v <- ablineV[[i]];
}
} else {
v <- unique(unlist(ablineV));
}
if (length(jamba::rmNA(use_xlim)) > 0) {
v <- v[v >= min(use_xlim, na.rm=TRUE) &
v <= max(use_xlim, na.rm=TRUE)];
if (length(v) > 0) {
abline(v=v,
col="#44444488",
lty="dashed",
lwd=1,
...);
}
}
}
## Optionally highlight a subset of points
if (length(highlightPoints) > 0) {
hp1 <- lapply(seq_along(highlightPoints), function(highI){
highP <- highlightPoints[[highI]];
hiData <- mvaData[which(rownames(mvaData) %in% highP),,drop=FALSE];
## Make sure to restrict the y-values to fit within the axis limits,
## consistent with smoothScatterFunc().
yValues <- jamba::noiseFloor(hiData[,"y"],
minimum=min(ylim),
ceiling=max(ylim),
...);
## Optionally draw a polygon hull around highlighted points.
if (doHighlightPolygon && length(yValues) > 2) {
hiM <- cbind(x=hiData[,"x"],
y=yValues);
hiHull <- hiM[grDevices::chull(hiM),,drop=FALSE];
#hiHull <- points2polygonHull(data.frame(x=hiData[,"x"],
# y=yValues),
# returnClass="matrix");
if (length(highlightPolygonAlpha) == 0) {
highlightPolyAlpha <- jamba::col2alpha(
highlightColor[[highI]]) / 2;
}
highlightPolyColor <- jamba::alpha2col(
highlightColor[[highI]],
alpha=highlightPolyAlpha);
polygon(hiHull,
add=TRUE,
col=highlightPolyColor,
border=highlightPolyColor);
}
## Draw the highlighted points
points(x=hiData[,"x"],
y=yValues,
pch=highlightPch[[highI]],
bg=highlightColor[[highI]],
col=jamba::makeColorDarker(darkFactor=1.2,
sFactor=1.5,
highlightColor[[highI]]),
xpd=FALSE,
cex=highlightCex[[highI]]);
});
}
## Optionally draw title boxes as labels per plot
if (verbose) {
jamba::printDebug("jammaplot(): ",
"titleFont:",
titleFont);
}
titleBoxTextColor <- titleColor[i];
## Optionally indicate control with asterisk in the plot title
if (x_names[i] %in% controlSamples &&
"titlestar" %in% controlIndicator) {
titleText <- paste0(titleText,
"*",
groupSuffix[i]);
} else {
titleText <- paste0(titleText,
groupSuffix[i]);
}
if (doTitleBox) {
if (verbose) {
jamba::printDebug("jammaplot(): ",
" titleBoxColor:",
titleBoxColor[i],
bgText=list(NA, titleBoxColor[i]),
fgText=list("orange", titleColor[i]));
}
parXpd <- par("xpd");
par("xpd"=NA);
## Use drawLabels() for more control over the text box
jamba::drawLabels(preset=titlePreset,
adjPreset=titleAdjPreset,
panelWidth="minimum",
txt=titleText,
boxColor=titleBoxColor[i],
boxBorderColor=jamba::makeColorDarker(titleBoxColor[i]),
labelCol=titleColor[i],
labelCex=titleCex[i],
drawBox=doTitleBox,
font=titleFont[i],
...);
par("xpd"=parXpd);
} else {
titleLine <- margins[3] - 1.5;
title(main="",
sub=subtitle,
cex.sub=titleCex[i],
cex.main=titleCex[i],
line=titleLine,
col.main=titleBoxTextColor,
font.main=titleFont[i],
...);
title(main=titleText,
cex.main=titleCex[i],
line=titleLine-1,
col.main=titleBoxTextColor,
font.main=titleFont[i],
...);
}
## Subtitle with centered label
if (length(subtitle) > 0) {
## Use drawLabels() for more control over the text box
jamba::drawLabels(preset=subtitlePreset,
adjPreset=subtitleAdjPreset,
txt=subtitle[i],
boxColor=subtitleBoxColor[i],
boxBorderColor=jamba::makeColorDarker(subtitleBoxColor[i]),
labelCol=jamba::setTextContrastColor(subtitleBoxColor[i]),
labelCex=titleCex[i]*0.9,
drawBox=TRUE,
font=titleFont[i],
...);
}
## Optionally print control indicator inside the plot panel
if (x_names[i] %in% controlSamples &&
"labelstar" %in% controlIndicator) {
preset_star <- "topleft";
adjPreset_star <- "bottomright";
if (doTitleBox) {
if (grepl("bottom", titleAdjPreset)) {
adjPreset_star <- gsub("bottom",
"top",
adjPreset_star)
}
}
jamba::drawLabels(preset=preset_star,
adjPreset=adjPreset_star,
panelWidth="default",
txt=" * ",
drawBox=FALSE,
labelCol="black",
labelCex=titleCex[i]*1.5,
font=titleFont[i],
...);
}
## Optionally print the MAD factor in the bottom right corner
panelCol <- head(colrampUse(51), 1);
labelBaseCol <- jamba::setTextContrastColor(panelCol);
labelBaseColL <- jamba::col2hcl(labelBaseCol)["L",];
labelCol <- ifelse(
mvaMADfactors[whichSamples[i]] >= outlierMAD,
ifelse(labelBaseColL > 50,
"gold",
"red3"),
labelBaseCol);
if (length(outlierMAD) > 0 && displayMAD == 1) {
jamba::drawLabels(preset="bottomright",
labelCex=titleCex[i]*0.9,
font=titleFont[i],
txt=paste0("MAD x",
format(digits=2,
mvaMADfactors[whichSamples[i]])
),
drawBox=FALSE,
labelCol=labelCol
);
} else if (length(outlierMAD) > 0 && displayMAD == 2) {
jamba::drawLabels(preset="bottomright",
labelCex=titleCex[i]*0.9,
font=titleFont[i],
txt=paste0("MAD:",
format(digits=2,
mvaMADs[whichSamples[i]])),
drawBox=FALSE,
labelCol=labelCol
);
}
#mvaData;
## Optional per-panel hook function
if (length(panel_hook_function) > 0 &&
length(panel_hook_function[[i]]) > 0 &&
is.function(panel_hook_function[[i]])) {
panel_hook_function[[i]](i,
...)
}
}
## End of the per-panel MVA plot loop
check_panel_page(iPanelNumber=prod(par("mfrow")) + 1,
maintitle,
doHighlightLegend=doHighlightLegend,
highlightColor=highlightColor,
highlightPch=highlightPch,
highlightCex=highlightCex,
verbose=verbose,
...);
}
invisible(mvaDatas);
}
#' define a polygon hull around points
#'
#' Define a polygon hull to encompass a set of data points
#'
#' A polygon hull can be useful as a visual indicator of the location
#' of a set of points. This function creates a polygon around all points,
#' and not just the most dense region of points, using
#' `grDevices::chull()`.
#'
#' The output is slightly modified to duplicate the first coordinate,
#' in order to "close" the resulting polygon, which is consistent
#' with the output of `rgeos::gConvexHull()`.
#'
#' @param x two-column numeric matrix of data points, or an object of
#' class `sp::SpatialPoints`.
#' @param returnClass character string indicating the class to return,
#' either `"matrix"` to return a two-column numeric matrix, or
#' `"SpatialPolygons"` to return `sp::SpatialPolygons`.
#' @param verbose logical whether to print verbose output.
#'
#' @return
#' A two-column numeric matrix of polygon coordinates when
#' `returnClass="matrix"` by default, or when
#' `returnClass="SpatialPolygons"` it returns an object
#' of class `sp::SpatialPolygons`.
#'
#' @family jam utility functions
#'
#' @examples
#' set.seed(123);
#' xy <- cbind(x=1:10,
#' y=sample(1:10, 10));
#' xyhull <- points2polygonHull(xy);
#' plot(xy, pch=20, cex=1, col="purple4",
#' main="Polygon hull around points");
#' polygon(xyhull, border="purple2", col="transparent");
#'
#' @export
points2polygonHull <- function
(x,
returnClass=c("matrix", "SpatialPolygons"),
...)
{
## Purpose is to take coordinates of a set of points, and return
## a polygon which includes all points inside it.
## If there are 3 or fewer points, the original points are returned.
##
## It will accept SpatialPoints or a two-column matrix,data.frame,tbl
## of x,y coordinates as input.
##
## It will return either a matrix of x,y coordinates, or SpatialPolygons
## containing one polygon (suitable for plotting directly) as output.
##
returnClass <- match.arg(returnClass);
x_class <- class(x);
if (jamba::igrepHas("matrix|data.*frame|tbl", x_class)) {
if (ncol(x) != 2) {
stop(paste0("points2polygonHull() requires x with class",
" matrix, data.frame, tibble, tbl",
" to contain 2 columns."));
}
} else if (jamba::igrepHas("SpatialPoints", x_class)) {
x <- SpatialPoints(x)@coords;
}
## Determine convex hull coordinates
x_hull <- x[grDevices::chull(x[,1:2]),,drop=FALSE];
if (nrow(x_hull) > 2) {
## Add final row to "close" the polygon
x_hull <- rbind(x_hull, x_hull[1,,drop=FALSE]);
}
if ("SpatialPolygons" %in% returnClass) {
spp <- SpatialPolygons(list(
Polygons(list(
Polygon(x_hull)),
"hull"
)
));
return(spp);
}
return(x_hull);
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.