knitr::opts_chunk$set( collapse=TRUE, comment="#>" ) ragg_png <- function(..., res=192) { ragg::agg_png(..., res=res, units="in") } knitr::opts_chunk$set(dev="ragg_png", fig.width=10, fig.height=7, fig.ext="png") options("warn"=-1);
library(venndir)
Venndir was inspired by gene expression data, which contains
up/down directionality. Here are some examples showing how
gene expression data can be used in venndir
.
The venndir::make_venn_test()
function is useful to
generate test data.
setlist <- make_venn_test(2500, 3, sizes=c(400, 500, 200), do_signed=TRUE) venndir(setlist, overlap_type="agreement")
The limma
package provides an extensive set of gene expression
data analysis tools. The steps below reproduce the Limma Users Guide
example "Illumina Use Case"
in Section 17.3. This example
requires data available at http://bioinf.wehi.edu.au/marray/IlluminaCaseStudy
copied to the current working directory.
For the sake of this example, the steps assume the data is located
in a folder [HOME]/IlluminaCaseStudy
, where [HOME]
is the user
home directory.
illuminadir <- "~/IlluminaCaseStudy"; do_limma <- FALSE; if (suppressPackageStartupMessages(require(limma)) && dir.exists(illuminadir)) { do_limma <- TRUE; } if (do_limma) { targets <- readTargets(path=illuminadir) # import Illumina data x <- read.ilmn(files="probe profile.txt.gz", ctrlfiles="control probe profile.txt.gz", other.columns="Detection", path=illuminadir) # define expressed probes y <- neqc(x) expressed <- rowSums(y$other$Detection < 0.05) >= 3; y <- y[expressed,] # calculate within-patient correlations ct <- factor(targets$CellType) design <- model.matrix(~0+ct); colnames(design) <- levels(ct); dupcor <- duplicateCorrelation(y, design, block=targets$Donor); dupcor$consensus.correlation; # fit linear model, paired by patient, with correlations fit <- lmFit(y, design, block=targets$Donor, correlation=dupcor$consensus.correlation); # define contrasts contrasts <- makeContrasts(ML-MS, LP-MS, ML-LP, levels=design) # fit contrasts fit2 <- contrasts.fit(fit, contrasts); # eBayes fit2 <- eBayes(fit2, trend=TRUE) }
The steps are straightforward except that this example uses
a paired linear model. Nonetheless the results is the same,
the resulting object is class MArrayLM
.
The function limma::decideTests()
applied adjusted P-value
and log2 fold change filtering, and produces class TestResults
which is actually a signed incidence matrix. Take a look at
the output:
if (do_limma) { # decideTests() creates a signed incidence matrix fit2decide <- decideTests(fit2, method="global"); print(head(fit2decide)) }
A signed incidence matrix can be converted to setlist
with im_value2list()
. Then call venndir()
.
The "im_value"
refers to an incidence matrix with values (signs),
and "2list"
will convert to a list
.
The list
returned will contain named vectors, where
names are the item labels, and values are the signs.
if (do_limma) { limmalist <- im_value2list(fit2decide); venndir(limmalist, sets=c(1, 2)); }
A few features are noticeable:
limma
contrasts "ML - MS"
and "LP - MS"
are
shown in each Venn circle. The numbers in each circle
represent statistically significant changes given
the limma
thresholds.3,681
probes are shared, which is more than not shared
2,922
and 2,303
, respectively.Almost all the overlapping probes are changing in the same direction in these two contrasts
1,800
probes are up in both contrasts1,856
probes are down in both contrasts25
probes disagree in up-down or down-up directionImplied in #3, the shared probes are roughly evenly distributed between up and down.
The data can be drawn with proportional circles, otherwise known as a Euler diagram, or proportional Venn diagram.
For this diagram, we will plot sets=c(1, 3)
mostly
because these are more visually interesting.
if (do_limma) { limmalist <- im_value2list(fit2decide); venndir(limmalist, sets=c(1, 3), proportional=TRUE); }
An alternative approach to represent concordance is by "agreement", which combines up-up and down-down into one summary number.
Use the argument overlap_type="agreement"
.
if (do_limma) { limmalist <- im_value2list(fit2decide); venndir(limmalist, sets=c(1, 2), overlap_type="agreement"); }
Finally, you can represent all the directional changes
using overlap_type="each"
.
if (do_limma) { limmalist <- im_value2list(fit2decide); venndir(limmalist, sets=c(1, 2), overlap_type="each"); }
It is sometimes interesting to show item labels inside the Venn diagram -- but 3,600 text labels are too many to be displayed!
For the purpose of this example, we will filter statistical hits using a higher fold change, and more significant P-value threshold -- just to reduce the labels.
In practice, examples with >3000 hits should probably never be labeled unless printing full size on a poster (and even then, with small font!)
if (do_limma) { # increase stringency of statistical filtering fit2decide2 <- decideTests(fit2, p.value=1e-4, lfc=4); # convert to signed list limmalist2 <- im_value2list(fit2decide2); limmalist2 <- lapply(limmalist2, function(i){ jamba::nameVector(as.vector(i), names(i)) }) # display item labels venndir(limmalist2, sets=c(1, 2), poly_alpha=0.3, show_labels="Ni", show_items="sign item", item_cex=1, item_degrees=4, max_items=1000); }
The display of item labels brings up some potential benefits: it does indicate the relative density of labels, and relative quantity of up/down/mixed direction.
You can customize the item label so that it only displays the directional sign, and not the item. This technique may be good for higher number of items.
Use argument show_items="sign"
.
if (do_limma) { vo6s <- venndir(limmalist2, sets=c(1, 2), poly_alpha=0.3, show_labels="Ni", show_items="sign", item_cex=1, max_items=1000); }
The first item label figure highlights an important issue:
limma
and similar analysis tools test the probes,
not the genes.Therefore the results represent
Illumina probes, which are not very helpful by
themselves.
Fortunately in this case, the gene data is available in y$genes
,
so we can convert each row to a gene symbol. However there are multiple
probes per gene symbol. For the sake of this example, we will choose
the first of each gene symbol from the statistical hits.
There is another Jam R package function that may be helpful:
genejam::freshenGenes()
. This function takes one of more columns of gene symbols, gene accessions, gene identifiers, and returns the "best matching result" using Bioconductor gene annotation data.
if (do_limma) { # convert probe hits to gene hits limmalist2 <- im_value2list(fit2decide2); # iterate the hit list to convert to gene limmalist2g <- lapply(limmalist2, function(i){ kdf <- data.frame(check.names=FALSE, i); # use gene symbol kdf$genes <- y$genes[rownames(kdf), "SYMBOL"]; # subset for unique genes with non-empty value kdf_sub <- subset(kdf, nchar(genes) > 0 & !duplicated(genes)) # make a vector of signed direction kdf_genes <- kdf_sub[, 1]; # name the vector using gene symbol names(kdf_genes) <- kdf_sub$genes; kdf_genes }) venndir(limmalist2g, sets=c(1, 2), show_labels="Ni", show_items="sign item", poly_alpha=0.3, item_cex=1, max_items=1000); }
The examples above purposefully used only two contrasts, because those two contrasts show very high concordance. An experiment design with three groups is quite common, and adding directionality for the third contrast can be confusing, and still useful. (See section below on interpreting three-group Venn diagrams.)
if (do_limma) { venndir(limmalist, sets=c(1, 2, 3), overlap_type="agreement"); }
When converting probes to genes, I usually run a quick test to see if any genes have statistically significant probes that are "up" and "down" -- I call these "bi-directional genes". If there are no bi-directional genes, then choosing one entry per gene is reasonable. I leave this evaluation to the scientist, but please post an issue if you have specific questions.
For RNA-seq data, the input data matrix may already contain
gene expression values -- as when using tximport::tximport()
with the argument tx2gene
; or when importing featureCounts
data where each row represents one gene identifier. In those
cases, no conversion is required.
One common type of analysis involves three experimental groups. For example:
The typical contrasts are:
Nothing particularly unusual about the experiment design, nor the contrasts, these are fairly standard and straightforward.
When creating a three-way Venn diagram, the results are also fairly straightforward. When the directionality is included, it can be more confusing, but also potentially much more informative.
There are a few potential scenarios:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.