heatmap_se()
- Consider recognizing table input.
  - A 2-dimensional table could be converted to matrix, then assigning dimnames() to column_title and row_title if not already defined.
  - A 3-dimensional table could be converted to a 2-dimensional matrix, stacking each matrix slice with rbind(), then using row_split to represent the third dimension?
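The 3-dimensional idea above can be sketched in base R. This is only an illustration: table3d_to_matrix() is a hypothetical helper name, not an existing function, and prefixing rownames with the slice name is an assumption made here to keep rownames unique.

```r
# Hypothetical helper (not part of jamses): flatten a 3-D table into a
# 2-D matrix, plus a row_split factor that records the third dimension.
table3d_to_matrix <- function(x) {
   stopifnot(length(dim(x)) == 3, !is.null(dimnames(x)))
   n_row <- dim(x)[1]
   n_col <- dim(x)[2]
   slice_names <- dimnames(x)[[3]]
   # convert each slice along the third dimension to a plain matrix
   slices <- lapply(seq_along(slice_names), function(i) {
      matrix(as.vector(x[, , i]),
         nrow=n_row, ncol=n_col,
         dimnames=unname(dimnames(x)[1:2]))
   })
   # stack slices by row, and track which slice each row came from
   mat <- do.call(rbind, slices)
   row_split <- rep(slice_names, each=n_row)
   rownames(mat) <- paste(row_split, rownames(mat), sep=":")
   list(matrix=mat,
      row_split=factor(row_split, levels=slice_names))
}

tab <- table(
   gene=c("g1", "g2", "g1", "g2"),
   group=c("A", "A", "B", "B"),
   batch=c("x", "x", "y", "y"))
res <- table3d_to_matrix(tab)
res$matrix
```

The returned row_split could then be passed alongside the matrix, so the third dimension appears as row groups.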
heatmap_se()
- Consider convenient way to define controlSamples, colnames(se), then define control_label.
  - control_label is defined by default.
  - control_label can be defined also, versus [group_name].
- Consider some way to indicate control samples used for centering.
  - control_prefix="*"
  - "Sample B" would become "* Sample B"
- Consider new argument: column_label_colname, as with row_label_colname.
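A minimal sketch of the control_prefix idea, assuming the prefix is simply pasted in front of the column label. label_control_samples() is a hypothetical name used only for illustration.

```r
# Hypothetical helper: prefix the labels of control samples.
label_control_samples <- function(labels, controlSamples,
   control_prefix="*") {
   is_control <- labels %in% controlSamples
   # only labels matching controlSamples receive the prefix
   ifelse(is_control, paste(control_prefix, labels), labels)
}

new_labels <- label_control_samples(
   labels=c("Sample A", "Sample B", "Sample C"),
   controlSamples="Sample B")
new_labels
```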
pkgdown package docs.
- Update README.Rmd
  - DONE. Include visuals for plot_sedesign(), and sestats()
- heatmap_se()
- heatmap_column_group_labels()
- shrinkDataFrame()
- Consider examples using colorjam::col_div_xf() within a customized ComplexHeatmap::Legend().
save_sestats()
- DONE. Consider option to populate the "hits" worksheet using logFC values instead of only using c(1, 0, -1).
Consider new helper functions
- contrast_list_by_factor(): subdivides a set of contrasts by which factor is being compared
- sestats_to_dfs(): wrapper for save_sestats() which returns the list of data.frames.
Consider adding conversion of Seurat to SingleCellExperiment
- Key addition: maintain all assay matrix objects using their original names: layername_assayname
- log2(1 + x) if range exceeds a threshold (e.g. 50)
- Option to convert sparse Matrix objects to vanilla matrix.
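The conditional log transform could look roughly like the sketch below. It assumes "range" means the maximum minus the minimum value, and log2_if_wide() is a hypothetical helper name. Sparse Matrix input is not handled here.

```r
# Hypothetical helper: apply log2(1 + x) only when the numeric range
# (max minus min, an assumption) exceeds range_threshold.
log2_if_wide <- function(x, range_threshold=50) {
   value_range <- diff(range(x, na.rm=TRUE))
   if (value_range > range_threshold) {
      x <- log2(1 + x)
   }
   x
}

wide <- matrix(c(0, 3, 7, 1023), nrow=2)
narrow <- matrix(c(0, 1, 2, 3), nrow=2)
wide_out <- log2_if_wide(wide)
narrow_out <- log2_if_wide(narrow)
```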
Add tests
- heatmap_se() basic workflows
- Seurat to SingleCellExperiment
- heatmap_column_group_labels() for Seurat and SingleCellExperiment
- plot_sedesign(), groups_to_sedesign()
Consider option to return/filter "useful information":
DONE. Adapt heatmap_se() for SingleCellExperiment objects.
DONE. By proxy, it could also work for Seurat objects.
save_sestats() when rowData_colnames contains columns already present, causing them to be added twice.
- The error is caused with writeOpenxlsx() by duplicate colnames.
plot_sedesign()
- Improve the method for bumping arrows.
  - Currently all contrasts that share the same y-position or x-position are bumped relative to each other. The goal is to bump contrasts only when they overlap another contrast.
  - Interesting to consider whether to begin with longest or shortest contrasts (by Euclidean distance) in order to control the symmetry.
Consider method to combine two SEStats objects
- Typically expected to append contrasts across two objects with the same hit thresholds, assay_names. Not required, however.
heatmap_se(): Consider option to add item count to row_title.
- New argument? tabulate_row_split=TRUE
- row_title to include the number in parentheses.
- row_type )".
- Bonus points: Trim trailing "s" if there is only one row.
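A rough sketch of how the row titles with counts might be built. The exact title format is an assumption, since it is not fully specified above, and tabulate_row_titles() is a hypothetical helper name.

```r
# Hypothetical helper: build row titles such as "up (3 genes)".
# The title format is an assumption for illustration.
tabulate_row_titles <- function(row_split, row_type="rows") {
   group_names <- unique(as.character(row_split))
   n <- as.integer(table(factor(row_split, levels=group_names)))
   # trim a trailing "s" from row_type when a group has only one row
   types <- ifelse(n == 1, sub("s$", "", row_type), row_type)
   titles <- paste0(group_names, " (", n, " ", types, ")")
   names(titles) <- group_names
   titles
}

row_titles <- tabulate_row_titles(
   row_split=c("up", "up", "down", "up"),
   row_type="genes")
row_titles
```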
DONE. save_sestats()
- DONE. Debug misalignment of sheet name with contrasts. Sigh.
- DONE. Consider optional methods to shorten the sheet names
  - max_nchar_sheetname number of characters.
  - For example to edit things like "male", "female" to "m", "f" to save space.
DONE. matrix_normalize() and se_normalize()
- DONE. Consider option to define reference_samples during normalization.
  - CORRECTION: Test existing option controlSamples to confirm it works as intended for this purpose. CONFIRMED.
  - controlSamples to themselves, then normalize all other samples relative to that subset.
Consider function to take SEStats and reverse/flip contrasts using a preferred set of contrasts.
- Use case is when receiving data with (groupA-groupB) but the preferred order is (groupB-groupA).
- SEStats and finds matching contrasts to reverse.
- -1 to flip the sign.
- All colnames which include the contrast also have the contrast flipped.
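The contrast flip could be sketched for one stats data.frame as follows. This is only an illustration under several assumptions: it handles one-way contrasts of the form "groupA-groupB" only, it assumes signed columns are named with a "logFC " or "hit " prefix followed by the contrast, and both function names are hypothetical.

```r
# Hypothetical sketch: reverse a one-way contrast name "groupA-groupB".
flip_contrast_name <- function(contrast) {
   parts <- strsplit(contrast, "-", fixed=TRUE)[[1]]
   stopifnot(length(parts) == 2)
   paste(rev(parts), collapse="-")
}

# Hypothetical sketch: flip signed columns and rename matching colnames.
flip_stats_df <- function(df, contrast) {
   flipped <- flip_contrast_name(contrast)
   has_contrast <- grepl(contrast, colnames(df), fixed=TRUE)
   # columns assumed to carry direction: fold change and hit direction
   signed_cols <- has_contrast & grepl("^(logFC|hit) ", colnames(df))
   df[signed_cols] <- lapply(df[signed_cols], function(x) -1 * x)
   # every colname which includes the contrast is renamed
   colnames(df) <- sub(contrast, flipped, colnames(df), fixed=TRUE)
   df
}

stats_df <- data.frame(check.names=FALSE,
   probes=c("p1", "p2"),
   `logFC groupA-groupB`=c(1.5, -2),
   `hit mgm5 adjp0.05 fc1 groupA-groupB`=c(1, -1),
   `P.Value groupA-groupB`=c(0.01, 0.02))
flipped_df <- flip_stats_df(stats_df, "groupA-groupB")
colnames(flipped_df)
```

Two-way contrasts would need real contrast parsing rather than a split on "-", so they are deliberately rejected here.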
Consider new function to take list of data.frame with statistical results and create SEStats object.
- It should parse each data.frame and create "hit " column if needed.
- It should define hit_list for each data.frame
Consider function to manipulate dimnames in SEStats: contrast_names, assay_names, cutoff_names.
- Basic example is to convert contrast to comp or vice versa.
- contrast_names(SEStats) and contrast_names(SEStats)<-
- assay_names as needed.
- Rename cutoff_names?? Perhaps not rename.
Consider function to "validate" SEStats object, suggested rules:
- hit_array contrast_names must match hit_list and stats_dfs.
- hit_array assay_names must match hit_list and stats_dfs.
- hit_array cutoff_names must match hit_list and stats_dfs.
Consider function to c() multiple SEStats objects together.
- It would combine hit_list, throw error whenever the input and new data both have: assay_name, contrast_name, and cutoff_name.
Consider function to subset SEStats by dimensions: assay_name, contrast_name, cutoff_name
- For example: SEStats[1, 4:8, 2]
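The subsetting idea can be illustrated with a plain 3-dimensional array standing in for the hit array. The dimension order (assay_name, contrast_name, cutoff_name) follows the list above, and subset_hit_array() is a hypothetical name. A real method would also need to subset the other SEStats components consistently.

```r
# Stand-in for a hit array, using the dimension order listed above.
hit_array <- array(0,
   dim=c(2, 3, 2),
   dimnames=list(
      assay_name=c("counts", "norm"),
      contrast_name=c("B-A", "C-A", "C-B"),
      cutoff_name=c("adjp0.05", "adjp0.01")))

# Hypothetical helper: subset by dimension, keeping all three dimensions.
subset_hit_array <- function(hit_array,
   assay_name=NULL, contrast_name=NULL, cutoff_name=NULL) {
   stopifnot(length(dim(hit_array)) == 3)
   # NULL means "keep everything" along that dimension
   if (is.null(assay_name)) assay_name <- seq_len(dim(hit_array)[1])
   if (is.null(contrast_name)) contrast_name <- seq_len(dim(hit_array)[2])
   if (is.null(cutoff_name)) cutoff_name <- seq_len(dim(hit_array)[3])
   # drop=FALSE keeps the array 3-dimensional even with single entries
   hit_array[assay_name, contrast_name, cutoff_name, drop=FALSE]
}

sub_array <- subset_hit_array(hit_array, 1, 2:3, 2)
dimnames(sub_array)
```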
Add testing for all variations of se_contrast_stats()
- create test case with NA values, to test handle_na
se_contrast_stats() discrepancies when isamples is provided in different orders. All outputs should be identical regardless of input order. Bug was handle_na_values() and was fixed, unclear when it was introduced.
DONE. save_sestats()
- include a summary table with the "hit" column from each contrast
Design idea for plot_sedesign()
Allow and document how to create a multi-panel plot, with this plot as one panel in the output?
Option to plot one contrast, showing the arrow (or arrows) and displaying at the top the full contrast name, optionally the abbreviated "comp".
Implement testthis unit tests for se_contrast_stats()
- Highest priority: test each option of handle_na
- use_voom.
- block.
- normgroup.
- block and use_voom together, voom_block_twostep=TRUE, which should call correlation twice.
- block and normgroup notably when one normgroup contains 2+ unique block values, and another normgroup has only 1 block.
- Bonus points: Test using a subset of isamples that no longer contains a valid contrast. It should fail - though in future it could potentially return the group mean values, then stop short of performing contrasts.
Extend SEDesign for block and normgroup:
- Pressing need to maintain normgroup and block with sedesign.
- Basic goal: when sedesign is subset, normgroup and block are also subset consistently.
- New slot names:
  - "normgroup" - character vector to match samples(), and rownames() of the design matrix.
    - Or could it be data.frame to permit multiple column values? Upon use each row would be concatenated to make one value per sample.
    - When sedesign is subset, "normgroup" is also subset consistently.
  - "block" - data.frame with one or more columns, indicating.
    - Its primary purpose is to maintain values per samples(), so they can be subset and maintained consistently.
    - It will be pushed to se_contrast_stats() how to deal with the actual values:
      - character values are considered block covariates for limma. For now, only character will be permitted.
      - numeric values can be encoded as a scalar covariate, and would be appended as a new column in design - and therefore must be added as a new row in contrasts with empty 0 values.
      - integer, factor values are encoded as an ordinal covariate, converted to rank integer values.
- normgroup(), normgroup()<- - set/get functions for sedesign@normgroup
- block(), block()<- - set/get functions for sedesign@block
groups_to_sedesign() to handle normgroup, block
- When normgroup is supplied, contrasts should be limited to those within each normgroup.
- It may be accomplished by including normgroup as factor columns, but not included with factor_order so that comparisons cannot involve multiple normgroup values.
- Store normgroup in the SEDesign object, see above.
SEStats S4 object for output from se_contrast_stats()
- Proposed slots:
  - "hit_array"
  - "stats_dfs" - each contrast in data.frame format
  - "stats_df" - overall merged data.frame
  - "sedesign" - (with "block", "normgroup") - for reproducibility
- Methods:
  - contrast_names()
  - sestats_to_list() (analogous to hit_array_to_list())
  - sestats_to_im() as above but returns signed incidence matrix
  - sestats_to_sedesign() extracts equivalent SEDesign object
  - hit_array() - access to the array of statistical hits by dimensions:
    - cutoff_name
    - contrast_name
    - assay_name
    - method_name - Add this dimension to enable alternative methods
  - hit_im() - incidence matrix for specific dimensions in hit_array
  - hit_list() - list of stat hit direction, named by entity
  - sestats_to_df() - data.frame suitable for RMarkdown and kable()
  - assay_names(), contrast_names(), cutoff_names(), method_names()
New function idea: heatmap_to_xlsx() or heatmap_to_df() to convert heatmap_se() output to data.frame or save with jamba::writeOpenxlsx().
- Idea is to extract data from the heatmap as displayed, in order to save it to a file (e.g. Excel or tab-delimited) for later review.
- left_annotation and top_annotation data.
- heatmap_se(), in other words any custom HeatmapAnnotation() functions would likely not be supported. Things like annotation bar charts, line plots, etc. would not be easily supported for export.
- Main features:
  - rowData_colnames, sestats, sestats_alt
  - Top annotations: top_colnames
  - Optional: Extract heatmap color function, apply to each cell. (This step sounds interesting, but is likely to be a bad idea. Applying a color per cell in Excel would be very time-consuming and inefficient, and would produce a much larger file.)
  - column_labels, row_labels
  - row_title.
  - Column split - unclear the best approach to use:
- Option to supply the se object, and append additional data via rowData_colnames.
- Required: Ability to save HeatmapList objects, already drawn using ComplexHeatmap::draw().
  - ComplexHeatmap::draw(hm1 + hm2 + hm3).
  - HeatmapList, potentially also obtaining left_annotation and top_annotation from each Heatmap.
  - Ugh. Each heatmap could have different top_annotations which means the top_annotation would need to have a column with row names for each heatmap's unique top_annotation.
groups_to_sedesign()
- Consider adding argument normgroup to restrict contrasts within each normgroup.
heatmap_se(): Consider some way to indicate controlSamples.
- Note that sometimes controlSamples are not displayed on the heatmap, this might be an exception where it is not shown.
- The rule might be: if not all controlSamples are in isamples then do not indicate controlSamples.
- "*"
se_normalize()
- DONE. Consider allowing user-specified assay_name to store the resulting data? Some mechanism to allow customizing the output assay_name.
  - New arguments:
    - output_method_prefix which by default uses each value in method to formulate each output assay_name
    - DONE. output_assay_names assay_name values, overriding method and output_method_prefix when defining the output assay_name.
      - character vector, must be equal to the number of normalizations.
      - Note that each normalization is applied to each assay_names in order: method1_assay1, method1_assay2, method2_assay1, method2_assay2.
- DONE. Consider using mcols(assays(se)) to store annotation regarding the normalization: "normalization_method", "source_assay_name", "params" (params may not be easy to use, since params is a list whose elements may include character vectors, numeric values with many decimal places, etc.)
  - mcols() entries to existing mcols() entries to decide whether to overwrite the existing entry, or to create a new assays() entry with versioned name and different params.
  - Decision: Do not make this change, push this back onto users to define output_method_prefix or output_assay_names, with the idea that the user should specifically request creating a new assay_name.
- DENIED. If method="jammanorm" is called with different params, should it create a new assay_name entry associated with those params?
  - Decision: No.
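The ordering of output assay names noted for se_normalize() (method1_assay1, method1_assay2, method2_assay1, method2_assay2) can be sketched in base R. output_assay_names_sketch() is a hypothetical stand-in, not the actual se_normalize() logic, and output_method_prefix is not modeled.

```r
# Hypothetical sketch of how output assay names could be formed.
output_assay_names_sketch <- function(method, assay_names,
   output_sep="_", output_assay_names=NULL) {
   # assay varies fastest, so each method is applied to each assay in order
   combos <- expand.grid(assay=assay_names, method=method,
      stringsAsFactors=FALSE)
   default_names <- paste(combos$method, combos$assay, sep=output_sep)
   if (is.null(output_assay_names)) {
      return(default_names)
   }
   # user-supplied names must match the number of normalizations
   if (length(output_assay_names) != length(default_names)) {
      stop("output_assay_names must have one value per normalization.")
   }
   output_assay_names
}

default_out <- output_assay_names_sketch(
   method=c("method1", "method2"),
   assay_names=c("assay1", "assay2"))
default_out
```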
heatmap_column_group_labels()
- Consider permitting a subset of se columns being passed, to allow labeling only a subset of columns per operation.
  - Ignore columns which are not provided in the se object.
- DONE. Consider option to hide the group line, when the group label is also empty or whitespace.
heatmap_row_group_labels()
- New function completely analogous to heatmap_column_group_labels().
- rot text rotation is 180 degrees (sideways).
- hm_title_buffer to adjust the buffer to the left of left_annotation, HeatmapBody?
se_normalize()
- DONE. new methods from edgeR::calcNormFactors():
- DENIED. Consider composite assay_name with param values?
  - params
- DENIED. consider renaming "limma_batch_effect" something shorter, and without underscores
  - "lba", "limmabatchadjust"
  - (Instead user can customize the resulting assay_name.)
- DELAYED. Consider something like list_normalization_methods()
  - purpose is to list available normalizations as a programmatic reference
  - data.frame: abbreviation, full_name, description
  - abbreviation ultimately becomes the new assay_name
  - jamba::jargs(se_normalize, "method") will print available options, except without descriptive information. ? se_normalize
heatmap_se()
- When an entry in sample_color_list contains a color function, the breaks are taken from attr(x, "breaks"), and it uses the same values as labels.
  - attr(x, "labels"). It must be the same length as "breaks".
se_contrast_stats() when using blocking factor (argument block) the group mean values slightly differ from what could be calculated manually, even when applying limma batch adjustment.
heatmap_se()
- DONE. Consider using "legend_title" instead of "name" as the heatmap legend title, so that "name" can be defined with a specific value.
  - The driving need is when adding two heatmaps together, if they both have the same "name" the name appears twice when calling ComplexHeatmap::list_components(). Instead, we should allow name to be defined uniquely for each heatmap, even while both heatmaps share the same legend title.
- ht_opt(ROW_ANNO_PADDING=grid::unit(4, "mm")) and ht_opt(COLUMN_ANNO_PADDING=grid::unit(4, "mm")).
  - It should set the value, then revert to previous value afterward.
se_contrast_stats() with handle_na="full1"
contrasts_to_venn_setlists()
DONE. Assign list names based upon the contents of each Venn setlist, instead of the default names that are not user-friendly.
Consider "two-way" Venn subsets that include: two-way contrast, corresponding one-way contrasts.
plot_sedesign()
- When contrast_depths=2 and supplying sestats it labels the one-way and two-way contrasts, but should only label one-way contrasts.
- The workaround is to alter sestats$hit_array to include only the contrasts being displayed.
heatmap_se()
- consider some method to "group" the sestats contrasts, for example sub-grouping them by which factor(s) are being compared.
  - Use contrasts_to_factors() to generate a table, then split by which factors have comparisons (with delimiter "-").
contrast2comp() and comp2contrast()
- consider method that can convert contrasts inside stat colnames:
  - "logFC factorA_factorB-factorC_factorB" to "logFC factorA-factorC:factorB"
  - "hit mgm5 adjp0.05 fc1 PNU_Control_Nano147car1-Veh_Control_Nano147car1" to "hit mgm5 adjp0.05 fc1 PNU-Veh:Control:Nano147car1"
migrate platjam::design2colors()
- Consider adding SE-specific method
  - rowData() and colData() in one step
  - contrasts(), contrast_names(), groups(), samples(), etc.
consider DESeq2 support
- Research the equivalent design model, contrasts model for equivalent comparisons in DESeq2 as used with limma-voom. Often people seem to include only groups relevant to each contrast in the DESeq2 workflow, unclear if this is recommended guidance.
  - normgroup to separate subsets that are expected to differ substantially by sample type, or where each subset may represent different variabilities.
  - normgroup which can define independent subsets of sample groups.
  - normgroup, for example "untreated", "treated1", "treated2" would imply three one-way contrasts, and each contrast would be performed in its own unique group-to-group analysis.
- Include options specific to DESeq2:
  - Decide how to handle stats column headers which differ from limma, so they are easily used in downstream methods: "P.Value", "adj.P.Val", "logFC", etc.
  - voom() internals.)
  - hit_array, so that limma-voom, DESeq2 could both be used and directly compared. Similarly the posthoc_method="DEqMS" may be included so that its effects can be compared to limma without the custom posthoc adjustment.
plot_sedesign()
- Option to filter by max_depth (to show only oneway contrasts) or contrast_depths (to show only oneway, or only twoway contrasts).
- group_buffer is adjusted, the replicate "n=3" labels are not shifted, so they can appear outside the square.
- Consider option to adjust the offset between contrasts.
filter_contrast_names()
Consider option to enable sequential comparisons, for example: Time1-Time0, Time2-Time1, Time3-Time2. This option may be more useful when enabled only for certain factors. Not as useful for something like "Treatment" where each treatment usually does not have a sequential order.
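A minimal sketch of sequential contrasts for one ordered factor, in base R. sequential_contrasts() is a hypothetical helper name, and it only builds the contrast names, assuming the levels are already in the intended order.

```r
# Hypothetical helper: compare each factor level to the previous level.
sequential_contrasts <- function(factor_levels) {
   stopifnot(length(factor_levels) >= 2)
   # pair each level (from the second onward) with its predecessor
   paste0(factor_levels[-1], "-",
      factor_levels[-length(factor_levels)])
}

time_contrasts <- sequential_contrasts(
   c("Time0", "Time1", "Time2", "Time3"))
time_contrasts
```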
heatmap_se()
- Add optional argument to apply heatmap title to column_title, so that it appears above the heatmap automatically.
  - It will override showing column_split titles. So it may be best practice to use heatmap_column_group_labels()
heatmap_column_group_labels().
contrasts_to_venn_setlists() resulting in some sets with 4 entries.
- sestats with contrasts_to_venn_setlists() so that the output is actually a list suitable for venndir::venndir().
contrast_names()<- validate the supplied value and print error message when incompatible.
plot_sedesign()
- consider easy filter to hide two-way contrasts
- arrow_ex, head_ex arguments to make_block_arrow() visible in plot_sedesign(). Perhaps change twoway_lwd to twoway_cex, then applying some global cex to all contrast arrows and connectors.
- Done. Fix bug with custom contrasts, two-way contrast color NA throws error.
validate_sedesign() argument contrasts is not properly subsetting contrasts by name, however contrast_names()<- appears to be working fine. Same mechanism.
Done. filter_contrast_names()
- new function: take long list of "all versus all" contrasts and subset for specific control factor levels.
heatmap_se()
- consider indicating controlSamples when a subset of samples are used, as opposed to relying upon control_label. Especially useful when a subset of potential samples are used, or when there are multiple normgroups displayed, each with their own controlSamples.
- consider option to apply hm_title to column_title which may help when adding two heatmaps together, they would have the column_title defined within each heatmap itself.
  - Downside: no column titles defined by column_split.
SEDesign: consider associating blocking factor to one or more contrasts
- Unclear how: blocking factor is per-sample annotation
- se_contrast_stats() does not indicate blocking factor - this is the real need, so output represents the comparison.
SEDesign: consider associating factor name with each contrast
- Background: Each contrast compares one or more factors. It would be useful to "know" the factors being compared during automated analysis.
- Proposed changes to SEDesign
  - "contrast_factors", list named by colnames(contrasts)
    - values contain zero or more colnames(factors)
  - new slot: "factors", data.frame
    - colnames(factors) are experimental factors (Treatment, Time)
    - groups_to_sedesign()
    - rownames(factors) == rownames(design)
  - subsetting SEDesign
    - factors
    - contrasts and contrast_factors
SEDesign: consider supporting ~Treatment + Time style design matrix
- Current design and contrast matrices use ~ 0 + Treatment + Time, so that contrasts indicate distinct experiment groups.
- ~Treatment + Time format changes:
  - design includes "(Intercept)" then factor levels
  - plot_sedesign() needs to handle this format differently, visualizing factor comparisons differently as well.
  - ~Treatment + Time + Treatment:Time
Method to rbind() or cbind() SummarizedExperiment objects.
- Main goal is to automate the process of aligning samples
- When metadata does not match:
Create SEStats object as more formal S4 object output se_contrast_stats()
- slotNames
  - stats_dfs: each data.frame from each contrast_name, assay_name
  - hit_array: N-dimensional array of stat hits with direction:
    - assay_name
    - contrast_name
    - cutoff_name
    - method_name? (to compare limmavoom, limma, DESeq2, edgeR?) adding this dimension could be fairly invisible
  - stats_df: omit - separate function to merge data.frame
  - metadata: method, parameters, etc.
- methods
  - hit_list(): calls hit_array_to_list()
  - to_df(): calls sestats_to_df() to create table summary of counts
  - hits(): converts hit_list() into incidence matrix?
  - contrast_names(): extracts contrast_name vector
  - assay_names(): extracts assay_name vector
  - reapply_cutoffs(): (new) re-calculate hit_array, by iterating stats_dfs and applying stat cutoffs.
  - rbind_sestats(): combines multiple SEStats objects
- Other related todo:
  - consider backward compatibility?
    - SEStats_to_list() to convert SEStats to previous list format
    - list_to_SEStats() to convert previous format to new SEStats
  - heatmap_se() should accept SEStats input
  - hit_array_to_list() should accept SEStats input
  - sestats_to_df() should accept SEStats input
heatmap_se()
- use new SEStats object input
- consider expanding sestats to show separate cutoff_name and method_name entries as distinct stripes in the incidence matrix, rather than including hits across cutoff_name entries together. The labels could become quite long.
Expand se_contrast_stats() to call corresponding DESeq2 methods.
se_contrast_stats()
- For very large data volume, the method seems to take more memory than absolutely necessary and could be trimmed:
  - rownames() for stats_dfs and stats_df.
    - Row identifiers are roughly 50% the size of each data.frame, so they should not be stored in a column and as rownames. The rownames are less "safe" to R manipulation, so values will be retained in a specific column (first column).
    - Other functions that utilize data.frame objects must use column values and not rely upon rownames, in theory should already be true.
  - stats_df, roughly 25% overall object size.
    - Matter of fact, this object could be replaced by a function that converted stats_dfs into stats_df dynamically.
  - stats_dfs, roughly 75% overall object size, but used to create volcano plots, and to review specific results.
- When using block the process becomes substantially slower with large data.
  - Description of the scenario, and supporting evidence:
    - limma::lmFit() when supplied with block and when correlation=NULL, so it is calculated inside lmFit().
    - limma::duplicateCorrelation() which is run to determine correlation.
    - correlation the lmFit() using 220k rows took 1.5 seconds.
  - Potential workarounds:
    - block is defined, and correlation is not supplied, calculate correlation using a subset of up to N rows (e.g. 10000). The correlation could even be calculated 10 times using random subsets of 1000 rows, then take the average. The max rows could be a new argument max_correlation_rows=10000 to make this process explicit, and customizable.
    - se_contrast_stats() to avoid having an invisible difference between using this function, and using limma::lmFit()
- Need to store arguments such as correlation, and block alongside the returned sestats object.
Debug why loading jamses causes the warning to the effect: "design() has already been defined".
plot_sedesign()
Option to size block arrows by the relative number of hits?
Option to define block arrow sizes as a vector, one per arrow, which probably means one per contrast name.
heatmap_se()
Consider option to cluster rows by stats hit matrix data. Unclear how, but would be useful to sub-group hits.
se_normalize(): use mcols(assays(se)):
- Driving use case:
  - metadata(assays(se)[[1]]) cannot be used, since metadata on the matrix itself is easily lost upon any sort of manipulation, subsetting of the matrix.
  - upon creating a new entry in assays(se) it should also populate mcols(assays(se)) with columns of annotation. These are optional annotations, but could be convenient for including things like arguments to the methods used.
  - platjam data import methods similarly.
- Suggested annotation names:
  - "method": name of the normalization method applied
  - "params": list of parameters for the given method, obtained from the argument with params[[method]]
  - "parent_assay_name": may not be practical. The assay_name can be edited, which would invalidate the relationship.
  - preferred: optional flag to indicate the preferred assay_name for downstream analysis?
se_normalize(): change default output_sep="_" to output_sep="."?
- benefit is that method names (containing underscores) are more easily distinguished when concatenated: "jammanorm.limma_batch_adjust.totalIntensity" versus "jammanorm_limma_batch_adjust_totalIntensity"
Remaining TODO for plot_sedesign():
- Fix error when any of axis1,axis2,axis3,axis4 are assigned empty values.
- Options for rendering two-way contrasts:
  - PARTIAL. Consider option to "loop" two-way contrasts around the one-way contrast, similar to what happens when two contrasts are too close to each other.
  - Option to adjust the midpoint of S-shaped "swoop", so the middle of the swoop would not be the diagonal midpoint between the end of contrast 1 and start of contrast 2. It could be shifted closer to contrast 1 or contrast 2. It could help when trying to avoid label overlaps.
- Call sedesign_to_factors() instead of calculating internally.
- sedesign object.
- units="snpc" pattern to maintain fixed aspect ratio:
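```R
library(grid)
# xlim and ylim are the numeric x and y ranges of the plot.
# The values below are placeholders so the snippet runs on its own.
xlim <- c(0, 4)
ylim <- c(0, 2)
pushViewport(
   viewport(
      x=0.5, y=0.5,
      # "snpc" units scale to the smaller of the parent width and height
      width=unit(min(1, diff(xlim)/diff(ylim)), "snpc"),
      height=unit(min(1, diff(ylim)/diff(xlim)), "snpc"),
      xscale=xlim,
      yscale=ylim))
```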
- groupedAxis()
- grouped regions along each axis, which may involve slightly nested boxes. It may be useful to shade the boxes light grey.
enhance sedesign
- Consider adding factor_labels to have design labels for each factor in the group label.
DONE (initial implementation): new function plot_sedesign()
- Simplified version of plotComparisonTable() from past work.
- vcd::mosaic() which defines factors on x-axis, y-axis, then subdivides each axis in order with sub-factors.
- Most contrasts should only be vertical or horizontal, therefore non-standard contrasts would be angled, indicating that they are comparing more than one factor at a time.
Completed TODO for plot_sedesign():
- DONE. Option to indicate "n=8" number of replicates per group.
- pos which places the axis at a fixed coordinate inside the plot.
- DONE. Add argument sestats to print the hits per contrast.
  - assay_names, cutoff_names. It passes arguments to hit_array_to_list(). Anything more complicated requires the user to pass argument with custom labels.
- DONE. Sort contrasts:
- DONE. Consider making hit direction optional when sestats is used.
heatmap_se()
- Consider automating color assignment when top_colnames or rowData_colnames have no colors assigned in sample_color_list.
  - It may involve calling platjam::design2colors() with empty argument for group_colnames, which may involve moving that function into colorjam.
DONE: heatmap_se()
- DONE: when correlation=TRUE the label with number of rows should indicate number of columns? See below, both dimensions are indicated. Or number of rows used to calculate correlation of N columns?
  - column_type="samples" and default heatmap title also indicates the number of columns (samples).
DONE: se_collapse_by_column()
- DONE: consider changing default noise_floor_value=NA to noise_floor_value=0.
- useMedian=FALSE so that the default behavior is to take the mean and not the median value per group.
- jamba::call_fn_ellipse() so that calls to row group functions will only pass arguments accepted by that function.
DONE: add heatmap_column_group_labels()
- Custom function to augment heatmap_se() for a specific scenario, but the scenario is used often enough to warrant making the function available here.
- grid coordinates after rendering to determine where to position labels, which requires drawing the heatmap first.
- Heatmap object, which is not ideal, but is also a known limitation.
se_contrast_stats()
- DONE: add optional rowData() colnames to stat data.frame output, for example adding rowData_colnames=c("SYMBOL", "GENENAME") would keep gene symbol and gene name alongside microarray probe IDs.
- in future, optionally run DESeq2 equivalent steps to limma/limma-voom, by replacing the run_limma_replicate() step with optional function to wrap DESeq2 steps.
- tests for se_contrast_stats() with various handle_na values, and with/without rowData_colnames.
testthat unit testing
sedesign object
- DONE: add method contrastNames() (or contrast_names())
- consider adding comps() as shortcut for contrast2comp()
contrast2comp()
- optimize performance, it is surprisingly slow, but functional
- consider embedding the factor order into the output for two-way contrasts, to ensure the output exactly matches input even when alternate outputs are mathematically equivalent.
add plot_sedesign(), plot_contrasts()
- display design with similar layout orientation to vcd::mosaic()
- optionally label contrasts by number of statistical hits
DONE: add contrast_names_to_sedesign()
- convenience function to produce sedesign from only contrastNames
plot_sedesign()
heatmap_se()
- Add optional title above sestats incidence matrix display, which could address need to display which assay_name, and cutoff_name was used.
- sestats incidence matrix.
heatmap_se() improvements
- DONE: Some method to hide column_title labels. Use column_title=" ".
- hmgrouplabel function heatmap_column_group_labels() which enables cleaner column labels when the heatmap is split by one or more variables in colData(se).
Add concept of "normgroup" to sestats_to_df()
- Implied: "normgroup" should also be added to sestats.
Add per-gene logic and functions developed for the DM-JDM manuscript.
Bonus points: optionally print commands as they are being run, for example:
- general idea is that se_contrasts() is a wrapper around limma, so it should be able to print the equivalent calls to limma::lmFit() for example, so a user can observe the progression of analysis steps.
- voom_jam(); then the vanilla limma::voom()
- run_limma_replicate() command to run limma::lmFit(), limma::eBayes()
COMPLETE: previous update to groups_to_sedesign() introduced a regression (error) for input that generates two-way contrasts.
heatmap_se() when centerby_colnames=FALSE no data centering is performed, the color legend should match the range of data, and hide negative values if there are no negative values.
- col color function in form of circlize::colorRamp2()
- sample_color_list into color_list; or consider adding row_color_list so row colors can contain the same colnames with different color assignments compared with sample_color_list. Hmmm.
se_contrast_stats()
- COMPLETE: argument normgroup to enforce independent statistical analyses within each unique normgroup.
se_contrast_stats() to include additional gene annotation data
- similarly, the first column currently hardcoded "probes", should use the appropriate column name, or allow it to be defined by argument.
- implementation ideas: it could either use input rowData(se), in the limma model fit, or annotation can be added while each stats_df stat data.frame is created.
groups_to_sedesign()
- (COMPLETE) Mechanism to supply specific contrast names to be used in place of auto-generated contrasts.
- (COMPLETE) Allow SummarizedExperiment input, with group_colnames used to define sample groups.
sestats object output from se_contrast_stats():
- proper slot names: stats_df, stats_dfs, hit_array, hit_list, sedesign
- sestats_to_df() by default.
- functions:
  - hit_array(sestats) with arguments assay_name, cutoff, contrasts
sestats_to_df() bug:
- apparently the colnames do not match the dimnames, "cutoff" is displayed for "assay_name" values. Probably something was reversed during recursive list navigation.
heatmap_se() does not have an option to customize arguments row_names_gp or column_names_gp, which could be used to colorize, highlight, or boldface individual labels in the heatmap.
Currently row_names_gp is defined internally, in order to define fontsize based upon the number of rows and columns in the heatmap. The fontsize could be applied to user-defined row_names_gp or column_names_gp, however sometimes the rows and columns are defined dynamically, making it difficult to sync a vector of grid::gpar(col=c("red", "black")) values to the exact rownames.
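One possible approach to the sync problem is to let the user supply colors as a named vector, and build the gpar only after the displayed rows are known. The sketch below assumes that design; make_names_gp() and the gene names are hypothetical and not part of the package.

```r
library(grid)

# Named vector of label colors; names are the rownames to highlight
# (hypothetical gene names).
highlight_colors <- c(GeneB = "red", GeneD = "blue")

# Hypothetical helper: build a gpar aligned to the rows actually displayed.
make_names_gp <- function(display_names, highlight,
                          default = "black", fontsize = 10) {
  cols <- rep(default, length(display_names))
  found <- display_names %in% names(highlight)
  # look up each displayed name, so order always follows display_names
  cols[found] <- highlight[display_names[found]]
  grid::gpar(col = cols, fontsize = fontsize)
}

# Rows as determined dynamically, for example after filtering for hits
display_rows <- c("GeneD", "GeneA", "GeneB")
gp <- make_names_gp(display_rows, highlight_colors)
label_colors <- gp$col
```

Because the lookup is by name, the internally computed fontsize can still be applied while the colors stay matched to whichever rows end up in the heatmap.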
grid::gpar() attributes: col, fontsize, fill, alpha, fontfamily, fontface, cex, font (font is an alias for fontface, should be passed as fontface).
groups_to_sedesign()
implement normalization groups with these rules:
No one-way contrast (direct comparison) should be permitted which compares two different normalization groups.
se_contrast_stats(). It could leverage the ssizeRNA package for RNA-seq data.
contrast2comp() is somehow very slow for even 10 to 20 contrast names.
contrast2comp() adjustments to tolerate having label prefix.
for example "fold (A_c-B_c)-(A_d-B_d)" could be recognized as "label contrast" and return "fold A-B:c-d".
"-" character.
heatmap_se()
font size customizations sufficient for manuscript prep.
heatmap_se() results in an error when neither rows nor sestats is supplied.
heatmap_se() should have option not to center data.
heatmap_se() should allow custom incidence matrix through sestats and alt_sestats, to avoid having to provide sestats or hit_array.
heatmap_se() - ability to "drill-down" into row clusters
heatmap_se() when sestats is supplied but se does not contain all rows present in sestats hit array.
heatmap_se() needs control over the annotation name fontsize.
heatmap_se()
COMPLETE: argument isamples should be useful to define samples to display in the heatmap.
However, it would be useful to perform centering before subsetting by samples, in order to produce more useful graphs with paired data. For example, center data by each patient at time zero, then display the other timepoints (since the patient time zero would always be exactly zero, it only contributes a blank stripe to the heatmap).
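The order of operations can be sketched in base R as follows. This is an illustration on toy data, not the heatmap_se() code; the sample names, values, and the variables patient and is_control are assumptions made for the example.

```r
# Toy data: 3 genes x 6 samples, two patients at times 0, 1, 2
# (hypothetical values)
m <- matrix(c(5, 6, 8,  4, 4, 7,
              2, 3, 3,  6, 5, 9,
              7, 7, 9,  1, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3),
    c("P1_t0", "P1_t1", "P1_t2", "P2_t0", "P2_t1", "P2_t2")))
patient <- sub("_.*", "", colnames(m))
is_control <- grepl("_t0$", colnames(m))

# Step 1: center each patient's samples by that patient's time-zero
# sample, using all samples, before any subsetting
centered <- m
for (p in unique(patient)) {
  cols <- patient == p
  baseline <- rowMeans(m[, cols & is_control, drop = FALSE])
  centered[, cols] <- m[, cols, drop = FALSE] - baseline
}

# Step 2: subset to the samples to display (analogous to isamples)
isamples <- colnames(m)[!is_control]
display <- centered[, isamples, drop = FALSE]
```

Subsetting first would remove the time-zero columns, leaving nothing to center against, which is why centering has to come first.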
COMPLETE: top_annotation and left_annotation show the color key for all colors, not just those which are displayed in the heatmap annotation.
+ or %v% - in those cases equivalent color keys are sometimes merged together. Unsure how it would be handled when they are only partially identical.
COMPLETE: create new function to choose interesting annotation colnames, with logic that removes columns with 1:1 cardinality compared to other chosen columns. Columns that only repeat the information are no longer interesting.
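The 1:1 cardinality check can be sketched as below. This is a minimal illustration under assumptions, not the function that was added to the package; choose_annotation_colnames() and the example columns are hypothetical.

```r
# Hypothetical helper: keep columns that add information beyond
# the columns already chosen.
choose_annotation_colnames <- function(df) {
  keep <- character(0)
  for (cn in colnames(df)) {
    n_values <- length(unique(df[[cn]]))
    # a column is 1:1 with a kept column when both columns, and their
    # combination, all have the same number of unique values
    redundant <- vapply(keep, function(k) {
      n_k <- length(unique(df[[k]]))
      n_both <- nrow(unique(df[, c(k, cn), drop = FALSE]))
      n_k == n_values && n_both == n_values
    }, logical(1))
    if (!any(redundant)) keep <- c(keep, cn)
  }
  keep
}

df <- data.frame(
  treatment = c("Veh", "Veh", "Treat", "Treat"),
  treatment_code = c("V", "V", "T", "T"),
  batch = c("b1", "b2", "b1", "b2"))
chosen <- choose_annotation_colnames(df)
```

Here treatment_code only repeats treatment, so it is dropped, while batch is kept because it splits the samples differently.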
sestats_to_df() consider making a wider output format intended for kable, with relevant columns grouped by assay_name. This way the output includes one row per contrast.
sestats_to_df() should report a blank cell whenever NA values are present, to convey that no cutoff was applied for that scenario, rather than implying the cutoff was applied and there were no hits.
se_detected_rows() was added, however in future it would be nice to apply constraints based upon contrasts. (In hindsight, I'm not sure what use case I had in mind!)
COMPLETE: Migrate the volcano_plot() from slicejam::volcano_plot()
currently also draws block arrows in the plot margins
jamma package, alongside MA-plots.
New class sestats to replace output from se_contrast_stats
slotNames:
subset: [signal, contrast, cutoff]
sestats[matrix(ncol=3)] it will subset for each element in the hit_array
accessors:
summary(), print() prints the data.frame summary of hit counts
hits() will return list of list
hit_array() will return the full sestats@hit_array
hit_im() will return an incidence matrix of hits
converters
as.list() to convert to previous list format
list2sestats() to convert from previous list format to sestats
COMPLETE: Some method to rename two-way contrasts to save character space.
Notes:
Contrasts should be easily distinguished, since se_contrast_stats() encodes the contrast into column headers, for example
"logFC CellA_Treated-CellA_Control"
"adj.P.Value CellA_Treated-CellA_Control"
"P.value CellA_Treated-CellA_Control"
"hit mgm5 adjP0.01 fc1.5 CellA_Treated-CellA_Control"
":" delimiter?
Renamed contrasts ideally do not use parentheses, since the goal is to reduce characters.
(CellA_Treated-CellA_Control)-(CellB_Treated-CellB_Control)
CellA_Treated-CellA_Control:CellB_Treated-CellB_Control
Treated-Control:CellA-CellB
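A renaming of this kind could be sketched as follows. This is a hypothetical illustration and not the package's contrast2comp() implementation. It assumes "_" separates factors within a group name and that "-" appears only as the contrast operator; contrast_to_comp() is an invented name.

```r
# Hypothetical sketch: rename a contrast by comparing factor values
# across its groups.
contrast_to_comp <- function(contrast) {
  # drop parentheses, then split into group names
  groups <- strsplit(gsub("[()]", "", contrast), "-", fixed = TRUE)[[1]]
  # one row per group, one column per factor
  factors <- do.call(rbind, strsplit(groups, "_", fixed = TRUE))
  # a factor that changes becomes "a-b", an unchanging factor stays "a"
  comps <- apply(factors, 2, function(v) paste(unique(v), collapse = "-"))
  paste(comps, collapse = ":")
}

two_way <- contrast_to_comp(
  "(CellA_Treated-CellA_Control)-(CellB_Treated-CellB_Control)")
# two_way is "CellA-CellB:Treated-Control"
one_way <- contrast_to_comp("CellA_Treated_WT-CellA_Control_WT")
# one_way is "CellA:Treated-Control:WT"
```

This sketch keeps factors in the order they appear in the group names, so it returns CellA-CellB:Treated-Control rather than Treated-Control:CellA-CellB; either order carries the same information.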
Two-way contrast:
(CellA_Treated-CellA_Control)-(CellB_Treated-CellB_Control)
(59 characters, 27 per contrast)
Alternative two-way syntax:
Treated-Control x CellA-CellB
Treated-Control:CellA-CellB
Two-way with one unchanging factor:
(CellA_Treated_WT-CellA_Control_WT)-(CellB_Treated_WT-CellB_Control_WT)
(71 characters, 33 per contrast)
Alternative two-way with extra factor:
CellA-CellB:Treated-Control:WT
(30 characters)
Treated-Control CellA-CellB WT
One-way contrast with extra factor:
CellA_Treated_WT-CellA_Control_WT
(33 characters)
Alternative one-way with extra factor:
Treated-Control:CellA:WT
(24 characters)
Simpler methods for common visualizations
ComplexHeatmap::Heatmap()
Venn diagram of hits - using venndir::venndir()
design idea: use second order contrasts to determine which contrasts are "compatible" to be used in the same Venn diagrams.
it could compare two-way contrasts using third-order contrast logic
COMPLETE: volcano plots - migrate function from slicejam into jamma package.
COMPLETE: save_sestats() is super slow: 6 worksheets, ~15 columns, 25k rows took about 5 minutes. Should be much faster.
Likely imposes changes to jamba::writeOpenxlsx()
jamba::writeOpenxlsx() to operate on an open Workbook without saving, passing the Workbook to each internal step also without saving. Each worksheet is added to the Workbook, and it is only saved at the end. This saved about 20x in time, especially for large multi-sheet Workbooks.
COMPLETE: New function: save_sestats() or something similar.
Saves statistical results to Excel .xlsx file.
jamba::writeOpenxlsx() and defines each column type.
jamba::set_xlsx_colwidths() to set proper column widths.
TODO: Optionally saves the superwide data.frame stat table output.
se_contrast_stats() enhancements
Consider object type sestats so it can have proper print.sestats() and summary.sestats() generic functions.
Some method to enforce normgroup
hit_array
groups_to_sedesign() needs some method to define first-order contrasts distinct from second-order contrasts.
The second-order contrasts appear to use the wrong factor_order, e.g.
"A_Treat-A_Veh" and "B_Treat-B_Veh"
"(B_Treat-A_Treat)-(B_Veh-A_Veh)"
"(B_Treat-B_Veh)-(A_Treat-A_Veh)"
se_contrast_stats(), se_normalize(), matrix_normalize() need optional argument normgroup
COMPLETE: se_normalize() and matrix_normalize()
normgroup.
basic workflow:
normgroup subset, apply method, populate each result into the full output matrix.
when the normgroup subset does not work for batch adjustment (only one batch is represented) then copy data as-is into the output matrix.
Consider some mechanism to hold blocking factor within sedesign object.
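The normgroup workflow described above (subset, apply method, populate the full output matrix) can be sketched in base R. This is a minimal illustration and not the se_normalize() or matrix_normalize() code; normalize_by_normgroup() and median_normalize() are hypothetical names, and the median shift is only a stand-in for a real normalization method.

```r
# Hypothetical helper: apply a normalization method within each normgroup.
normalize_by_normgroup <- function(x, normgroup, method) {
  stopifnot(length(normgroup) == ncol(x))
  out <- x
  for (ng in unique(normgroup)) {
    cols <- which(normgroup == ng)
    # normalize only this subset, then write it into the full matrix
    out[, cols] <- method(x[, cols, drop = FALSE])
  }
  out
}

# Stand-in method: shift each column so its median equals the mean of
# the column medians within the subset.
median_normalize <- function(m) {
  col_medians <- apply(m, 2, median, na.rm = TRUE)
  sweep(m, 2, col_medians - mean(col_medians))
}

x <- matrix(c(1, 2, 3,  3, 4, 5,  10, 20, 30,  12, 22, 32),
  nrow = 3, dimnames = list(paste0("row", 1:3), paste0("s", 1:4)))
normgroup <- c("A", "A", "B", "B")
out <- normalize_by_normgroup(x, normgroup, median_normalize)
```

Samples s1 and s2 are adjusted only against each other, and likewise s3 and s4, so the large difference between the two normgroups is left untouched.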
Bonus points: use hit_array to plot hit counts inside each block arrow
each block in the grid represents a group
probably use fixed aspect=1 to ensure blocks are square
if there is one factor, use it as x-axis, and use y-axis with no label
when two block arrows are co-linear, fan them out from the center
co-linear: identical slope (m) and intercept (b)
Goal would be to store output from se_contrast_stats() in a proper S4 object:
stats_df
SEDesign (samples, design, contrasts used)
Method to access stats data.frame by contrast
se_contrast_stats() to call DESeq2
se_contrast_stats() to test isoforms/exons
limma::diffSplice or DEXSeq
differential effects per gene