This package is under active development, these functions make up a core set of methods currently used across active Omics analysis projects. A summary of goals and relevant features are described below.
The core goal is to make data analysis and visualization of
SummarizedExperiment
objects straightforward for common scenarios. It
also accepts SingleCellExperiment
and Seurat
objects.
Define a design matrix using ~ 0 + group
syntax, see below.
Define statistical contrasts to compare one or more factors, see
groups_to_sedesign()
.
Plot contrasts as defined, see plot_sedesign()
.
Confirm design,contrast matrices are always in sync, see
SEDesign-class
.
Automate analysis of multiple contrasts, using
limma/limmavoom/DEqMS, see se_contrast_stats()
.
Convenient “short hand” for contrasts, see contrasts2comp()
.
(B_treated-B_control)-(A_treated-A_control) # "comp" format: B-A:treated-control
Integrate with other tools.
venndir::venndir()
- see Github "jmw86069/venndir"
to create
directional Venn diagrams.
The Limma User’s
Guide
(LUG) is an amazing resource which describes alternate approaches for
one-way and two-way contrasts, which are mathematically equivalent.
For a more thorough discussion please review these approaches to confirm
that ~0 + x
is mathematically identical to ~ x
, and only differs in
how estimates are reported.
The approach used in jamses
uses the ~0 + x
strategy. Each
experiment group is defined using independent replicates. This
approach does not imply that there is “no intercept” during the
model fit, see LUG for details.
One-way contrasts must compare only one factor change per contrast. For example, a valid one-way contrast is shown below:
(A_treated - A_control) # valid one-way contrast
It compares (treated-control)
with factor level A
unchanged. However, an invalid one-way contrast is shown below:
(A_treated - B_control) # not a valid one-way contrast
It is invalid because allows two factor changes: treated-control
and A-B
.
A two-way contrast is defined as the fold change of two compatible one-way fold changes, and in the proper order. Consider the following:
(A_treated - A_control) # one-way contrast # "treated-control" for A (B_treated - B_control) # compatible one-way contrast # "treated-control" for B (B_treated - B_knockout) # incompatible one-way contrast
Caveat: Specific contrasts can be added manually when necessary.
groups_to_sedesign()
takes by default a data.frame
where each
column represents an experiment factor, and creates the following:
design
matrix
contrast
matrix using only appropriate changes in single
experiment factor levels for one-way contrasts, and equivalent
contrasts across secondary factor levels where appropriate for
two-way contrasts, up to max_depth
factor depth.samples
vector representing individual sample columns used from
the input data.
Output is SEDesign
as an S4 object with slot names:
sedesign@design
- the design matrix.
sedesign@contrasts
- the contrasts matrix.sedesign@samples
- vector of samples.SEDesign
is an S4 object that contains the following slots:
samples
: a character
vector of sample names, derived from
colnames()
of the SummarizedExperiment` object.
When samples
are subset, it has a cascade effect on design
and
contrasts
. When groups are no longer represented, they are removed
from design
and contrasts
as relevant.
design
: a matrix
representing the design matrix, with
additional constraints:
rownames()
are always maintained, so they can be aligned with
colnames()
of the SummarizedExperiment
object. This requirement
helps ensure that the design and assay data are always in proper
sync.
colnames()
are defined as sample groups, equivalent to using
model.matrix( ~ 0 + groups )
for example from the The Limma
User’s
Guide
from the limma
Bioconductor package.All values are always 1
or 0
with no fractions. More complicated
designs require manual adjustment, beyond the scope of jamses
.
contrasts
: a matrix
representing the contrast matrix used for
statistical comparisons, with additional constraints:
rownames(contrasts)
are always defined, and always equal
colnames(contrasts)
to ensure contrasts and design are always
properly synchronized.
The example below uses a character
vector of group names per sample,
with two factors separated by underscore "_"
. The same data can be
provided as a data.frame
with two columns.
library(jamses)
library(kableExtra)
igroups <- jamba::nameVector(paste(rep(c("WT", "KO"), each=6),
rep(c("Control", "Treated"), each=3),
sep="_"),
suffix="_rep");
igroups <- factor(igroups, levels=unique(igroups));
# jamba::kable_coloring(color_cells=FALSE,
# format="markdown",
# caption="Sample to group association",
# data.frame(groups=igroups))
knitr::kable(data.frame(groups=igroups))
| | groups | |:----------------|:-----------| | WT_Control_rep1 | WT_Control | | WT_Control_rep2 | WT_Control | | WT_Control_rep3 | WT_Control | | WT_Treated_rep1 | WT_Treated | | WT_Treated_rep2 | WT_Treated | | WT_Treated_rep3 | WT_Treated | | KO_Control_rep1 | KO_Control | | KO_Control_rep2 | KO_Control | | KO_Control_rep3 | KO_Control | | KO_Treated_rep1 | KO_Treated | | KO_Treated_rep2 | KO_Treated | | KO_Treated_rep3 | KO_Treated |
The resulting design and contrasts matrices are shown below:
sedesign <- groups_to_sedesign(igroups);
jamba::kable_coloring(
# knitr::kable(
colorSub=c(`-1`="dodgerblue", `1`="firebrick"),
caption="Design matrix output from design(sedesign).",
data.frame(check.names=FALSE, design(sedesign)));
Design matrix output from design(sedesign).
WT_Control
WT_Treated
KO_Control
KO_Treated
WT_Control_rep1
1
0
0
0
WT_Control_rep2
1
0
0
0
WT_Control_rep3
1
0
0
0
WT_Treated_rep1
0
1
0
0
WT_Treated_rep2
0
1
0
0
WT_Treated_rep3
0
1
0
0
KO_Control_rep1
0
0
1
0
KO_Control_rep2
0
0
1
0
KO_Control_rep3
0
0
1
0
KO_Treated_rep1
0
0
0
1
KO_Treated_rep2
0
0
0
1
KO_Treated_rep3
0
0
0
1
# knitr::kable(
jamba::kable_coloring(
colorSub=c(`-1`="dodgerblue", `1`="firebrick"),
caption="Contrast matrix output from contrasts(sedesign).",
data.frame(check.names=FALSE, contrasts(sedesign)));
Contrast matrix output from contrasts(sedesign).
KO_Control-WT_Control
KO_Treated-WT_Treated
WT_Treated-WT_Control
KO_Treated-KO_Control
(KO_Treated-WT_Treated)-(KO_Control-WT_Control)
WT_Control
-1
0
-1
0
1
WT_Treated
0
-1
1
0
-1
KO_Control
1
0
0
-1
-1
KO_Treated
0
1
0
1
1
For convenience, SEDesign can be visualized using plot_sedesign()
:
# plot the design and contrasts
plot_sedesign(sedesign);
title(main="plot_sedesign(sedesign)\noutput:")
SummarizedExperiment
objects are normalized using:
se_normalize()
- lightweight wrapper to several normalization
functions:
method="jammanorm"
: calls jamma::jammanorm()
which normalizes
the median log fold change to zero. This method is also used in
DESeq2::estimateSizeFactors()
for example. When using
jamma::jammaplot()
for MA-plots, the jammanorm()
method is
conceptually equivalent to shifting the y-axis values to zero, so
the median log fold change is zero. (Assumptions apply.)
method="quantile
: calls limma::normalizeQuantiles()
for quantile
normalization.method="limma_batch_adjust"
or method="lba"
: calls
limma::removeBatchEffect()
to adjust of batch effects. This
normalization is only recommended for visualization and clustering,
not recommended prior to statistical comparisons. Instead, the
recommendation is to define a blocking factor passed to
se_contrast_stats()
with argument block
.
matrix_normalize()
- is the core function for se_normalize()
and
operates on individual numeric data matrices.
Normalization can be reviewed with MA-plots, recommended by using
jamma::jammaplot()
. See Github repository"jmw86069/jamma"
.
se_contrast_stats()
is the central functionApplies SEDesign
(design and contrasts) to SummarizedExperiment
data.
Calls the appropriate limma
functions one or more assays
in the
SummarizedExperiment
object.
use_voom=TRUE
will enable limmavoom
methodology for count data.
posthoc_test="DEqMS"
will enable limma-DEqMS
methodology for
proteomics mass spec data, which defines an error model based upon
PSM counts per row.block
will define blocking factor for limma
. When also applying
limmavoom
, it performs the appropriate two-step process as
described by Dr. Smyth.
Applies statistical thresholds to mark statistical hits:
adjp_cutoff
: adjusted P-value threshold
fold_cutoff
: normal space fold change thresholdmgm_cutoff
: “mgm” is an abbreviation for “max group mean”. This
threshold requires at least one experiment group involved in the
contrast to have group mean at or above this threshold in order to
mark a statistical hit. It does not subset data prior to testing.
Additional threshold options:
p_cutoff
: unadjusted P-value threshold
ave_cutoff
: average expression threshold, using the equivalent of
limma
column AveExpr
with the mean group expression. Note this
AveExpr
value is calculated mean per row across all groups, and
may be subject to skewing.int_adjp_cutoff
, int_fold_cutoff
, int_mgm_cutoff
: optional
thresholds only used for interaction contrasts, intended for data
mining purposes. For example, data mining may filter for two-way
contrast results with no fold change threshold, or slightly relaxed
int_adjp_cutoff=0.1
.
Options:
use_voom=TRUE
: enable limmavoom workflow for count data
posthoc_test="DEqMS"
: enable DEqMS post-hoc adjustment to the
limma empirical Bayes model fit.floor_min
, floor_value
: logic to handle numeric values below a
defined noise threshold, values below this threshold become
floor_value
. This filtering can also convert values from 0
zero
to NA
, where appropriate. For data with substantial missing
measurements, this option may be beneficial.handle_na
, na_value
: logic to handle NA
values, particularly
when an entire group is NA
. For proteomics, or platforms with a
defined “noise floor”, it may be useful to use handle_na="full1"
and na_value=noise_floor
. This specific scenario assigns
noise_floor
to one value in any completely NA
group to
noise_floor
so that its variability is not used, however the fold
change can still be calculated relative to the platform minimum
signalblock
: optional blocking factor, which enables the
limma::duplicateCorrelation()
calculation, which is applied to the
model fit per the The Limma User’s
Guide.
Returns a list
referred to as sestats
:
"hit_array"
: array
of named numeric vectors indicating direction
of statistical hits after applying the relevant threshold cutoffs:
1
up-regulated-1
down-regulated"stats_dfs"
: list
of data.frame
for each contrast, where
column headers also include the statistical contrast to help confirm
the identity of each analysis result.
"stats_df"
: one super-wide data.frame
with results across all
contrasts, assay data matrices, and hit thresholds. This matrix is
intended to help filter for hits across the various different
contrasts and thresholds.
Save to excel with save_sestats()
:
This function helps automate saving every statistical contrast in
table format, into individual Excel worksheets. It may include
additional rowData()
annotations.
heatmap_se()
is a wrapper for the amazing ComplexHeatmap::Heatmap()
,
intended to automate frequently-used options that represent repetitive
work.
Common rules:
log2
or otherwise “appropriately
transformed”. (The typical transform is log2(1 + x)
.)correlation=TRUE
) which re-uses the same data centering options
(critical aspect of these plots).rows
or sestats
to show stat hit
annotations on the left side.colData(se)
and
rowData(se)
colData(se)
and rowData(se)
colnames.Common arguments:
se
: the SummarizedExperiment
data for the heatmap
assay_name
: define a specific data matrix to display
rows
: choose a specific subset of rows (optional)
sestats
: optional stat hits in one of these formats:
output from se_contrast_stats()
matrix
with values c(-1, 0, 1)
list
of numeric values using c(-1, 0, 1)
, with names
representing rownames(se)
list
of rownames(se)
top_colnames
: column annotations in colData(se)
to display across
the top.
rowData_colnames
: optional row annotations to display on the left.
sample_color_list
: list
named by colnames from colData(se)
or
rowData(se)
to define colors.
centerby_colnames
: optional sub-groups for data centering
Useful to center multiple tissue types, or sample classes, batches, etc.
Set centerby_colnames=FALSE
to turn off data centering altogether.
controlSamples
: custom subset of control samplesfor data centering.
row_split
: split rows using colnames in rowData(se)
, or by number
of dendrogram sub-tree clusters.
column_split
: split columns using colnames in colData(se)
, or by
number of dendrogram sub-tree clusters.
mark_rows
: optional subset of rows for callout labels.
sestats
to proper SEStats
S4 object, with convenient
accessor functions.DESeq2
or edgeR
methodology.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.