knitr::opts_chunk$set( collapse=TRUE, warning=FALSE, message=FALSE, comment="#>", fig.path="man/figures/README-" ); options(knitr.table.format='markdown')
This package is under active development, these functions make up a core set of methods currently used across active Omics analysis projects. A summary of goals and relevant features are described below.
The core goal is to make data analysis and visualization of
SummarizedExperiment
objects straightforward for
common scenarios.
It also accepts SingleCellExperiment
and Seurat
objects.
~ 0 + group
syntax, see below.groups_to_sedesign()
.plot_sedesign()
.SEDesign-class
.se_contrast_stats()
.Convenient "short hand" for contrasts, see contrasts2comp()
.
(B_treated-B_control)-(A_treated-A_control) # "comp" format: B-A:treated-control
Integrate with other tools.
venndir::venndir()
- see Github "jmw86069/venndir"
to create
directional Venn diagrams.
The Limma User's Guide
(LUG) is an amazing resource which describes alternate approaches for
one-way and two-way contrasts, which are mathematically equivalent.
For a more thorough discussion please review these approaches to confirm
that ~0 + x
is mathematically identical to ~ x
, and only differs
in how estimates are reported.
jamses
uses the ~0 + x
strategy.(A_treated - A_control) # valid one-way contrast
It compares (treated-control)
with factor level A
unchanged.
However, an invalid one-way contrast is shown below:
(A_treated - B_control) # not a valid one-way contrast
It is invalid because allows two factor changes:
treated-control
and A-B
.
(A_treated - A_control) # one-way contrast
# "treated-control" for A
(B_treated - B_control) # compatible one-way contrast
# "treated-control" for B
(B_treated - B_knockout) # incompatible one-way contrast
* Caveat: Specific contrasts can be added manually when necessary.
groups_to_sedesign()
takes by default a data.frame
where each column
represents an experiment factor, and creates the following:
design
matrix
contrast
matrix using only appropriate changes in single
experiment factor levels for one-way contrasts, and
equivalent contrasts across secondary factor levels where appropriate
for two-way contrasts, up to max_depth
factor depth.samples
vector representing individual sample columns used
from the input data.
Output is SEDesign
as an S4 object with slot names:
sedesign@design
- the design matrix.
sedesign@contrasts
- the contrasts matrix.sedesign@samples
- vector of samples.SEDesign
is an S4 object that contains the following slots:
samples
: a character
vector of sample names, derived from
colnames()
of the SummarizedExperiment` object.
When samples
are subset, it has a cascade effect on design
and contrasts
. When groups are no longer represented, they are
removed from design
and contrasts
as relevant.
design
: a matrix
representing the design matrix, with additional
constraints:
rownames()
are always maintained, so they can be aligned with
colnames()
of the SummarizedExperiment
object.
This requirement helps ensure that the design and assay data
are always in proper sync.
colnames()
are defined as sample groups, equivalent to using
model.matrix( ~ 0 + groups )
for example from the
The Limma User's Guide
from the limma
Bioconductor package.All values are always 1
or 0
with no fractions. More complicated
designs require manual adjustment, beyond the scope of jamses
.
contrasts
: a matrix
representing the contrast matrix used for
statistical comparisons, with additional constraints:
rownames(contrasts)
are always defined, and always equal
colnames(contrasts)
to ensure contrasts and design are always
properly synchronized.
The example below uses a character
vector of group names per sample,
with two factors separated by underscore "_"
.
The same data can be provided as a data.frame
with two columns.
library(jamses) library(kableExtra) igroups <- jamba::nameVector(paste(rep(c("WT", "KO"), each=6), rep(c("Control", "Treated"), each=3), sep="_"), suffix="_rep"); igroups <- factor(igroups, levels=unique(igroups)); # jamba::kable_coloring(color_cells=FALSE, # format="markdown", # caption="Sample to group association", # data.frame(groups=igroups)) knitr::kable(data.frame(groups=igroups))
The resulting design and contrasts matrices are shown below:
sedesign <- groups_to_sedesign(igroups); jamba::kable_coloring( # knitr::kable( colorSub=c(`-1`="dodgerblue", `1`="firebrick"), caption="Design matrix output from design(sedesign).", data.frame(check.names=FALSE, design(sedesign))); # knitr::kable( jamba::kable_coloring( colorSub=c(`-1`="dodgerblue", `1`="firebrick"), caption="Contrast matrix output from contrasts(sedesign).", data.frame(check.names=FALSE, contrasts(sedesign)));
For convenience, SEDesign can be visualized using plot_sedesign()
:
# plot the design and contrasts plot_sedesign(sedesign); title(main="plot_sedesign(sedesign)\noutput:")
SummarizedExperiment
objects are normalized using:
se_normalize()
- lightweight wrapper to several normalization functions:
method="jammanorm"
: calls jamma::jammanorm()
which normalizes
the median log fold change to zero.
This method is also used in DESeq2::estimateSizeFactors()
for example.
When using jamma::jammaplot()
for MA-plots, the jammanorm()
method
is conceptually equivalent to shifting the y-axis values to zero, so
the median log fold change is zero. (Assumptions apply.)
method="quantile
: calls limma::normalizeQuantiles()
for
quantile normalization.method="limma_batch_adjust"
or method="lba"
: calls
limma::removeBatchEffect()
to adjust of batch effects.
This normalization is only recommended for visualization and clustering,
not recommended prior to statistical comparisons.
Instead, the recommendation is to define a blocking factor
passed to se_contrast_stats()
with argument block
.
matrix_normalize()
- is the core function for se_normalize()
and operates
on individual numeric data matrices.
Normalization can be reviewed with MA-plots, recommended by using
jamma::jammaplot()
. See Github repository"jmw86069/jamma"
.
se_contrast_stats()
is the central functionSEDesign
(design and contrasts) to SummarizedExperiment
data.Calls the appropriate limma
functions one or more assays
in the SummarizedExperiment
object.
use_voom=TRUE
will enable limmavoom
methodology for count data.
posthoc_test="DEqMS"
will enable limma-DEqMS
methodology
for proteomics mass spec data, which defines an error model based upon
PSM counts per row.block
will define blocking factor for limma
. When also applying
limmavoom
, it performs the appropriate two-step process as described
by Dr. Smyth.
Applies statistical thresholds to mark statistical hits:
adjp_cutoff
: adjusted P-value threshold
fold_cutoff
: normal space fold change thresholdmgm_cutoff
: "mgm" is an abbreviation for "max group mean".
This threshold requires at least one experiment group involved in the
contrast to have group mean at or above this threshold in order to
mark a statistical hit. It does not subset data prior to testing.
Additional threshold options:
p_cutoff
: unadjusted P-value threshold
ave_cutoff
: average expression threshold, using the equivalent of
limma
column AveExpr
with the mean group expression.
Note this AveExpr
value is calculated mean per row across all groups,
and may be subject to skewing.int_adjp_cutoff
, int_fold_cutoff
, int_mgm_cutoff
: optional
thresholds only used for interaction contrasts, intended for data mining
purposes. For example, data mining may filter for two-way contrast results
with no fold change threshold, or slightly relaxed int_adjp_cutoff=0.1
.
Options:
use_voom=TRUE
: enable limmavoom workflow for count data
posthoc_test="DEqMS"
: enable DEqMS post-hoc adjustment to the limma
empirical Bayes model fit.floor_min
, floor_value
: logic to handle numeric values below a
defined noise threshold, values below this threshold become floor_value
.
This filtering can also convert values from 0
zero to NA
,
where appropriate. For data with substantial missing measurements, this
option may be beneficial.handle_na
, na_value
: logic to handle NA
values, particularly
when an entire group is NA
.
For proteomics, or platforms with a defined "noise floor",
it may be useful to use handle_na="full1"
and na_value=noise_floor
.
This specific scenario assigns noise_floor
to one value in any completely
NA
group to noise_floor
so that its variability is not used,
however the fold change can still be calculated relative to the platform
minimum signalblock
: optional blocking factor, which enables the
limma::duplicateCorrelation()
calculation, which is applied to the
model fit per the
The Limma User's Guide.
Returns a list
referred to as sestats
:
"hit_array"
: array
of named numeric vectors indicating direction of
statistical hits after applying the relevant threshold cutoffs:
1
up-regulated-1
down-regulated"stats_dfs"
: list
of data.frame
for each contrast,
where column headers also include the statistical contrast to
help confirm the identity of each analysis result.
"stats_df"
: one super-wide data.frame
with results across
all contrasts, assay data matrices, and hit thresholds. This
matrix is intended to help filter for hits across the various
different contrasts and thresholds.
Save to excel with save_sestats()
:
This function helps automate saving every statistical contrast
in table format, into individual Excel worksheets.
It may include additional rowData()
annotations.
heatmap_se()
is a wrapper for the amazing ComplexHeatmap::Heatmap()
,
intended to automate frequently-used options that represent repetitive work.
Common rules:
log2
or otherwise "appropriately transformed".
(The typical transform is log2(1 + x)
.)correlation=TRUE
)
which re-uses the same data centering options (critical aspect of these plots).rows
or sestats
to show stat hit
annotations on the left side.colData(se)
and rowData(se)
colData(se)
and rowData(se)
colnames.Common arguments:
se
: the SummarizedExperiment
data for the heatmapassay_name
: define a specific data matrix to displayrows
: choose a specific subset of rows (optional)sestats
: optional stat hits in one of these formats:
output from se_contrast_stats()
matrix
with values c(-1, 0, 1)
list
of numeric values using c(-1, 0, 1)
, with names
representing rownames(se)
list
of rownames(se)
top_colnames
: column annotations in colData(se)
to display
across the top.
rowData_colnames
: optional row annotations to display on the left.sample_color_list
: list
named by colnames from colData(se)
or
rowData(se)
to define colors.centerby_colnames
: optional sub-groups for data centering
Useful to center multiple tissue types, or sample classes, batches, etc.
Set centerby_colnames=FALSE
to turn off data centering altogether.
controlSamples
: custom subset of control samplesfor data centering.
row_split
: split rows using colnames in rowData(se)
, or
by number of dendrogram sub-tree clusters.column_split
: split columns using colnames in colData(se)
, or
by number of dendrogram sub-tree clusters.mark_rows
: optional subset of rows for callout labels.sestats
to proper SEStats
S4 object, with convenient
accessor functions.DESeq2
or edgeR
methodology.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.