DONE. Fix ggjammaplot() bug where ggplot2 facet strip background color no longer applies titleBoxColor properly. Unclear when this stopped working, probably something in the ggplot2 3.5.x update.
Fix was to use ggh4x::facet_wrap2() to apply color and fill to ggplot facet strip rectangles.
Enable subtitle arguments for ggjammaplot() consistent with jammaplot().
Add support for SingleCellExperiment/Seurat objects.
Adopt similar mechanism used in jamses::heatmap_se().
Use rowRanges() when necessary.
Update centerGeneData() to handle Matrix objects, not just matrix; for example, SingleCellExperiment data may provide dgCMatrix.
inherits(x, "sparseMatrix") is valid for Matrix sparse matrices, and sparseMatrixStats may provide equivalent methods to matrixStats if installed.
Check if matrix is sparse, check if sparseMatrixStats is installed, if so use sparseMatrixStats::rowMedians(), otherwise apply(x, 1, median).
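The sparse-versus-dense check described above could be sketched roughly as follows. The helper name row_medians_any() is hypothetical and not part of jamma; it assumes sparse input comes from the Matrix package, and it only uses sparseMatrixStats when that package is installed.

```r
# Hypothetical helper (not part of jamma): row medians for dense or sparse input.
row_medians_any <- function(x, na.rm = TRUE) {
   use_sparse <- inherits(x, "sparseMatrix") &&
      requireNamespace("sparseMatrixStats", quietly = TRUE)
   if (use_sparse) {
      # sparse-aware method, avoids converting to a dense matrix
      out <- sparseMatrixStats::rowMedians(x, na.rm = na.rm)
   } else {
      # fallback: as.matrix() densifies sparse input, which may use much more memory
      out <- apply(as.matrix(x), 1, median, na.rm = na.rm)
   }
   names(out) <- rownames(x)
   out
}

m <- matrix(c(0, 0, 3,
              4, 0, 6,
              0, 8, 100),
   nrow = 3, byrow = TRUE,
   dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))
row_medians_any(m)
```

The fallback keeps the function usable without sparseMatrixStats, at the cost of a dense copy, which may not be practical for very large single-cell matrices.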
Consider subtitle ability to use colData() colnames.
colorSub to define colors used by arguments titleBoxColor and subtitleBoxColor.
ggjammaplot() output
Consider other data centering options:
Use case:
rowStatsFunc argument?
If the goal is a heatmap, jamses::heatmap_se() could accomplish it entirely with custom code:
rowStatsFunc function that recognized colnames, assigned them to groups, returned the appropriate group stat (min, max, mean, median).
heatmap_se(). Or is it?
Consider including attributes which describe the centering.
control_label, centerby_label
Consider method which accepts SummarizedExperiment data, and arguments:
assay_name - if one value, one matrix is returned, otherwise list
centerby_colnames - to define centerGroups
controlSamples - same as usual, to define controls for centering
jamses::heatmap_se() and other tools.
Consider RLE plot - which is a flattened form of MA-plot
Motivation: single cell data with thousands of columns, they should not become independent panels.
plotRLExpr() function shows example before and after normalization.
Weakness of this style is that it doesn't show non-linear MA-plot, however the box-whisker/median with range is effective at showing when one sample has much different variability than others.
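A minimal sketch of the RLE idea, assuming the input is already log-transformed: subtract each row median, then summarize the deviations per sample so that all samples fit in one panel. The function name rle_stats() and the simulated data are illustrative only.

```r
# Relative log expression (RLE): subtract each row median, then summarize
# the deviations per column. Assumes x is already log-transformed.
rle_stats <- function(x) {
   row_med <- apply(x, 1, median, na.rm = TRUE)
   # a vector with length nrow(x) is recycled down each column,
   # so every row has its own median subtracted
   rle <- x - row_med
   # box-whisker summary per sample, one column per sample
   stats <- apply(rle, 2, function(v) grDevices::boxplot.stats(v)$stats)
   rownames(stats) <- c("lower_whisker", "q1", "median", "q3", "upper_whisker")
   list(rle = rle, stats = stats)
}

set.seed(1)
x <- matrix(rnorm(200 * 6, mean = 8), ncol = 6,
   dimnames = list(NULL, paste0("sample", 1:6)))
x[, 6] <- x[, 6] * 1.5 - 4   # simulated sample with inflated variability
res <- rle_stats(x)
round(res$stats, 2)
# boxplot(res$rle, outline = FALSE) would draw all samples in a single panel
```

In this simulation the sixth sample shows a wider box than the others, which is the kind of difference in variability the RLE summary is good at revealing.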
Consider adding signed-significance plot, perhaps ssplot()
Goal is to provide two data.frame objects that have identifiers (gene?) that can be aligned to one another.
-log10(significance) * sign(fold) is used for each axis.
volcano_plot() return data.frame equivalent to the data plotted, make it easy to find hits, and highlighted points.
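For the signed-significance idea, a small sketch of how the two axes might be computed and aligned. The column names gene, pvalue, and log2fold, and the helper signed_sig(), are assumptions for illustration; ssplot() itself does not exist yet.

```r
# Hypothetical inputs: two stat tables with columns gene, pvalue, log2fold.
signed_sig <- function(pvalue, fold) {
   -log10(pvalue) * sign(fold)
}
df1 <- data.frame(gene = c("A", "B", "C"),
   pvalue = c(0.001, 0.5, 0.01),
   log2fold = c(2, -0.1, -1.5))
df2 <- data.frame(gene = c("B", "C", "D"),
   pvalue = c(0.1, 0.0001, 0.05),
   log2fold = c(0.3, -2, 1))
df1$ss <- signed_sig(df1$pvalue, df1$log2fold)
df2$ss <- signed_sig(df2$pvalue, df2$log2fold)

# align on the shared identifier; only genes present in both tables are kept
ss_df <- merge(df1[, c("gene", "ss")], df2[, c("gene", "ss")],
   by = "gene", suffixes = c("_x", "_y"))
ss_df
# plot(ss_df$ss_x, ss_df$ss_y) would place concordant hits in the
# top-right and bottom-left quadrants
```

Note that sign() returns 0 when the fold change is exactly 0, which collapses that point to the origin regardless of its significance.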
Consider how to utilize colData() sample annotations.
Go with "easiest thing that works" since in most cases, MA-plots are data-agnostic QC for technical quality assessment. However, centering within known sample annotations could be convenient.
subtitle boxes, or augmented labeling.
SummarizedExperiment input offers more sample annotations than are easily added via subtitle.
jamses::heatmap_se(): centerby_colnames, colData_colnames.
colData_byCols, pass to: mixedSortDF(colData(se), byCols=colData_byCols)
sample_color_list, optional color assignment for colData columns.
develop better method to organize MA-plot panels, for example columns organized by batch
centerGeneData()
Enable SummarizedExperiment input, either as a separate function, or as option for this function. It should require assay_name, and return a numeric matrix.
Figure out a reasonable way to include experiment design factors in jammaplot() and ggjammaplot() panel labels.
For example "BAH013760_101_NEG_MS1" is not a helpful identifier, it would be much more interesting to include "Male, Dex-Treated, Time 3".
Bonus points for including each factor on its own line, using categorical colors for each value to make it easier to scan by eye.
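One possible way to build such labels from a sample annotation table, with each factor on its own line. The helper make_panel_labels(), the annotation columns, and the second sample's values are all hypothetical; coloring each line by value is not covered here.

```r
# Hypothetical sample annotation table; rownames match colnames of the data.
sample_df <- data.frame(
   row.names = c("BAH013760_101_NEG_MS1", "BAH013760_102_NEG_MS1"),
   Sex = c("Male", "Female"),
   Treatment = c("Dex-Treated", "Vehicle"),
   Time = c("Time 3", "Time 3"))

# paste the chosen columns together, one line per factor by default
make_panel_labels <- function(df, use_cols = colnames(df), sep = "\n") {
   cols <- unname(as.list(df[, use_cols, drop = FALSE]))
   labels <- do.call(paste, c(cols, sep = sep))
   names(labels) <- rownames(df)
   labels
}
panel_labels <- make_panel_labels(sample_df)
cat(panel_labels[1], "\n")
make_panel_labels(sample_df, sep = ", ")
```

Keeping the result named by the original column identifiers makes it easy to look up a label for each panel without changing colnames(x).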
Fix missing strip text colors in ggjammaplot() when multiple colors are supplied for titleBoxColor.
Somehow the ggplot2 element_grob() dispatch is not calling the custom function element_grob.element_textbox_colorsub().
ggjammaplot() when input data x does not contain colnames, or rownames. Either case produces an error.
Some method to detect non-linear (non-horizontal) MA-plot signal
Example case is one sample whose values are nearly all zero, the MA-plot looks like a diagonal line aiming down from left to right.
After log-ratio normalization (for example with jammanorm()), the signal is shifted up so the mean signal is at y=0, however the diagonal is still there.
mad_row_min, and test the slope.
It is unclear how much to trust a linear fit, and assuming the linear fit is reasonably good, what to do with the slope afterward.
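As a rough sketch of the slope test, the following fits a line to the MA-plot points of one sample, using only rows whose row mean is at or above a threshold. The function ma_slope() is hypothetical, and treating mad_row_min as a cutoff on the row mean is an assumption here; the simulated data only illustrate a flat versus a diagonal pattern.

```r
# Sketch: fit a line to MA-plot points above a noise threshold, report the slope.
# x-axis = row mean, y-axis = sample value minus row mean.
ma_slope <- function(sample_values, row_means, mad_row_min = 0) {
   keep <- is.finite(sample_values) & is.finite(row_means) &
      row_means >= mad_row_min
   y <- sample_values[keep] - row_means[keep]
   x <- row_means[keep]
   fit <- stats::lm(y ~ x)
   ci <- stats::confint(fit)["x", ]
   c(slope = unname(stats::coef(fit)["x"]),
      ci_low = unname(ci[1]),
      ci_high = unname(ci[2]),
      n = sum(keep))
}

set.seed(2)
row_means <- runif(500, min = 0, max = 12)
good <- row_means + rnorm(500, sd = 0.3)        # flat MA-plot
bad <- 0.2 * row_means + rnorm(500, sd = 0.3)   # compressed signal, diagonal MA-plot
round(ma_slope(good, row_means, mad_row_min = 4), 3)
round(ma_slope(bad, row_means, mad_row_min = 4), 3)
```

A confidence interval that excludes zero by a wide margin could be one way to flag a sample, though the open question of how far to trust a linear fit still applies.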
Another idea: Could jammanorm() only normalize using rows where the raw values are above the noise threshold mad_row_min?
Currently the process uses the x-axis calculated value (row mean/median) to apply that threshold. Logically it seems reasonable that measurements below a "noise floor" are noise, and therefore should not be considered during normalization.
jammaplot() and ggjammaplot() - controlSamples
Improve the indication that a sample (panel) was used as a control during data centering. Currently displays an asterisk "*" in the top-left, however it is not visible in all plots, especially when there are outlier samples. Also, there is no legend indicating the meaning of the asterisk.
ComplexHeatmap::Legend() to create a custom legend. It would be used to display "* - samples excluded during centering" and optional highlight points.
Consider argument to display/hide the color legend, useful when the color legend might be too large to display comfortably. Instead provide method for users to create their own legend.
jammaplot() and ggjammaplot() - MAD factor label box
Consider displaying the MAD factor inside a label box, to improve legibility when the text overlaps the point density. This change could be optional, controlled with a new argument.
grid functions for drawLabels() to allow much more control over label placement, for example avoiding overlaps between subtitle and MAD factor labels by shifting one label by the height of the other label.
This suggestion might affect jamba::drawLabels().
volcano_plot() method to highlight points (rows) and assign colors should be consistent with jammaplot(), with similar color legend drawn.
Consider colorized smooth scatter in panels that match the statistical thresholds used.
Consider option to use scatter points instead of smooth scatter, with colorized points in each relevant region.
Adjust outer margin for base plots
display y-axis labels only on the first plot each row
volcano_plot() allow SummarizedExperiment input
ggplot2 output, which would make labeling genes much more feasible.
Empty control group updates
Data centering by centerGeneData() should be mirrored in MA-plots for the same situation where control samples are entirely NA, causing remaining values to become NA and therefore not be visible on MA-plot panels.
centerGeneData() situation occurs when centering versus controlSamples results in rows where all controlSamples have NA values, thereby causing all centered values to become NA.
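A small sketch of this situation, together with possible fallbacks for rows whose controls are all NA. The function center_vs_controls() and the arguments na_control and floor_value are hypothetical names, and the use of the mean as the centering statistic is an assumption; the option names mirror "na", "row", "floor", and "min" as discussed in this section.

```r
# Hypothetical sketch: center each row versus the control sample mean,
# with a choice of fallback when every control value in a row is NA.
center_vs_controls <- function(x, controlSamples,
   na_control = c("na", "row", "floor", "min"), floor_value = 0) {
   na_control <- match.arg(na_control)
   ctrl_mean <- rowMeans(x[, controlSamples, drop = FALSE], na.rm = TRUE)
   all_na <- is.nan(ctrl_mean)   # rowMeans() gives NaN when every control is NA
   ctrl_mean[all_na] <- switch(na_control,
      na = NA_real_,                                            # leave values NA
      row = rowMeans(x[all_na, , drop = FALSE], na.rm = TRUE),  # row non-NA values
      floor = floor_value,                                      # numeric floor
      min = min(x, na.rm = TRUE))                               # minimum observed value
   x - ctrl_mean
}

x <- matrix(c(5, 6, 7, 8,
              NA, NA, 3, 5),
   nrow = 2, byrow = TRUE,
   dimnames = list(c("g1", "g2"), c("c1", "c2", "t1", "t2")))
center_vs_controls(x, c("c1", "c2"))            # row g2 becomes entirely NA
center_vs_controls(x, c("c1", "c2"), "row")     # row g2 centered versus 4
center_vs_controls(x, c("c1", "c2"), "min")     # row g2 centered versus 3
```

With the default "na" the second row disappears from an MA-plot panel, while the other options keep the non-control values visible.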
Goal is to offer alternatives where appropriate:
"na": leave values NA (current behavior, default)
"row": center versus row non-NA values
"floor": center versus numeric floor
"min": center versus the minimum observed value
jammaplot_se() - customized jammaplot() for SummarizedExperiment input
analogous to jamses::heatmap_se() so it should share argument style
normgroup default uses column(s): colData(SE)[[normgroup]]
centerby_colname default uses column(s): colData(SE)[[centerGroups]]
sample_color_list optional input for colorization
ggjammaplot() issues:
bw_factor was behaving exactly opposite as expected
COMPLETE: color gradient by ggplot2::stat_density_2d() is inconsistent between outlier and non-outlier point ranges.
jammaplot() and ggjammaplot() may need one argument to adjust detail of density plots overall. Let it handle passing binpi and bwpi to jamba::plotSmoothScatter(), or using ggplot2 geom arguments.
COMPLETE: Related to indicating controlSamples below, some plot hook to add annotations to each panel while plotting for R base graphics. This feature is likely already possible with ggjammaplot() with custom ggplot2 layers.
COMPLETE: new argument plot_hook_function allows full customization after each panel is rendered, for jammaplot() in base R graphics.
Note: ggjammaplot() is fairly slow for this purpose, rendering may overall be slower than base R graphics, though this difference could be from lack of particular optimizations.
Visual differences in jammaplot() and ggjammaplot() assignment of colors to point density. Unclear whether this difference is due to cropping of points outside the visual range, as happens with base R graphics plotSmoothScatter() and applyRangeCeiling=TRUE.
ggjammaplot() does not honor the order of samples when applying facet_wrap(), it should convert the facet column to factor with levels equal to the order of columns to be plotted.
Consider subtitle being able to use one or more colnames in colData(se) when the input data is SummarizedExperiment.
When multiple columns are provided, include them as multiple lines, each colored using colorSub?
Consider using shadowText() to display the MAD values, otherwise they are not visible in some panels.
COMPLETE: There should be some indication for samples that are controlSamples during centering.
Perhaps an asterisk in the topleft corner inside each plot panel?
Allow sample_color_list input as alternative to colorSub?
Or auto-detect list input and try to match values in the list.
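The auto-detection of list input might look something like this sketch. It assumes every element of sample_color_list is a named character vector of colors; the helper flatten_color_list() and the example colors are hypothetical, and list elements of other types (for example color functions) are not handled.

```r
# Hypothetical sketch: accept either a named color vector or a list of them.
flatten_color_list <- function(colorSub) {
   if (is.list(colorSub)) {
      # drop the list names so unlist() keeps only the inner value names
      colorSub <- unlist(unname(colorSub))
   }
   # if a value appears in more than one list element, keep the first color
   colorSub[!duplicated(names(colorSub))]
}

sample_color_list <- list(
   Treatment = c(Dex = "firebrick", Vehicle = "grey50"),
   Time = c(`Time 1` = "skyblue", `Time 3` = "navy"))
colorSub <- flatten_color_list(sample_color_list)
colorSub[c("Dex", "Time 3")]
```

A plain named vector passes through unchanged, so existing colorSub usage would not need to change under this approach.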
drawLabel() to size the title box at least the width of each plot panel.
FIXED: volcano_plot() throwing an error:
"Error in utils::modifyList(default_params, new_values):
is.list(x) is not TRUE
update_function_params(function_name = "volcano_plot", param_name = "color_set",
new_values = color_set) at jam-volcano-plot.R#374
jammaplot() should have some ability to provide column labels, in place of using colnames(x) which may be a super-long text string.
jammaplot() consider using jamba::adjustAxisLabelMargins() for panel margins by default, making margins optional for custom use.
This change would ensure each margin fully displays text labels, closer to how ggplot2 works.
Potential bugs:
jammaplot() highlightPch point shapes are not honored in the legend.
COMPLETE: jammaplot() draws plot labels after highlightPoints, which can obscure the highlightPoints. Ideally, draw the labels then the highlighted points.
This bug may not be evident with ggjammaplot().
Partially completed by moving title box labels outside the plot. It is still possible with subtitle box labels, but will leave as-is for now.
ggjammaplot() does not display subtitle box in the bottom-left.
Consider using ggtext or ggplot2::geom_label().
jammaplot() and ggjammaplot() should somehow indicate which samples were used as controlSamples. Perhaps asterisk "*" in the title?
"NaN".
Basic workflow:
centerGroups grouping
jammacalc() to calculate MADfactors for all replicates, focusing on the omitted sample.
volcano_sestats() that takes sestats input from jamses::se_contrast_stats().
blankPlotPos
xlab, ylab using summary, difference labels
element_text() instead of ggtext::element_textbox()
geom_text_repel() to label highlighted points
consider selectable x- and y-ranges, to highlight a box and points within it
Needed a workaround to build pkgdown site, see: https://github.com/r-lib/pkgdown/issues/1157
jammaplotDispEsts() wrapper for DESeq2::plotDispEsts(), although this function needs the newer colorized plotSmoothScatterG2().
jammaplot() argument ablineV is not functioning properly.
jammaplot() highlight legend is not using highlightPch.
Guides for MA-plots.
Basic guide to MA-plots for gene expression data.
check data normality, apply appropriate transform, normalize data
common patterns and what they mean:
non-parametric (rank-based) MA-plot
When to use centerGroups, controlSamples.
log2fold_to_fold() and fold_to_log2fold() - to interconvert:
normal space fold changes, which could be represented as positive and negative values (e.g. 2-fold and -2-fold) or as ratios (e.g. 2-fold and 0.5-fold); and
fold_sign() function that takes either as input and returns either 1 or -1, and by default never returns 0 since 1-fold change cannot easily be multiplied by its fold_sign() without causing it to become zero. More thought to be had as to the workflow.
plotMA() is used for the DESeq2::DESeqResults object, highlighting points that meet alpha < 0.1 by default.
jammaplot() on the count data, optionally transformed using their recommended approach, or use useRank=TRUE.
Version 0.0.10.900 fixed a bug where rowGroupMeans() was used to
center values, but used default na.rm=FALSE, which caused groups with missing values not to display a centered value for other non-NA samples. The new version should provide two options:
jamba::rowGroupMeans() within centerGeneData() which would allow optional outlier detection, which could substantially improve the quality of MA-plot panels.
jammagg() or something similarly named, to produce a ggplot2 object.
jammaplot.SE() could be specific to SummarizedExperiment objects. It would require the names(assays(SE)) to define the data matrix to use. It could even recognize sample groups from colData(SE), or from internal design matrix used for statistical testing.
whichSamples which will create the MA-plot data, but only display the samples of interest.
User should be able to take the returned data.frame list, and make a tall data.frame sufficient for ggplot2, or sufficient to answer the question "What is that outlier datapoint?"
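To illustrate the last point, here is a sketch that stacks a list of per-sample data.frames into one tall data.frame. The structure of ma_list (one data.frame per sample, with columns x and y and identifiers as rownames) is an assumption for illustration, not the documented return value of jammaplot().

```r
# Hypothetical return structure: a list of data.frames, one per sample,
# each with columns x (row mean) and y (difference from the mean).
ma_list <- list(
   sampleB = data.frame(x = c(5, 8, 10), y = c(-0.1, 0.2, 0.0),
      row.names = c("gene1", "gene2", "gene3")),
   sampleA = data.frame(x = c(5, 8, 10), y = c(0.1, -0.2, 3.5),
      row.names = c("gene1", "gene2", "gene3")))

# stack into one tall data.frame with sample and item columns
ma_tall <- do.call(rbind, lapply(names(ma_list), function(nm) {
   df <- ma_list[[nm]]
   data.frame(sample = nm, item = rownames(df), x = df$x, y = df$y)
}))
# keep the input sample order for faceting, rather than alphabetical order
ma_tall$sample <- factor(ma_tall$sample, levels = names(ma_list))

# "What is that outlier datapoint?"
ma_tall[which.max(abs(ma_tall$y)), ]
```

Storing the sample column as a factor with levels in input order is also one way to make facet panels follow the order of columns to be plotted.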
Future idea: