This document shows typical Topconfects usage with limma, edgeR, or DESeq2.

The first step is to load a dataset. Here, we're looking at RNA-seq data that investigates the response of Arabodopsis thaliana to a bacterial pathogen. Besides the experimental and control conditions, there is also a batch effect. This dataset is also examined in section 4.2 of the edgeR user manual, and I've followed the initial filtering steps in the edgeR manual.

knitr::opts_chunk$set(fig.width=7, fig.height=6)
library(topconfects)

library(NBPSeq)
library(edgeR)
library(limma)

library(dplyr)
library(ggplot2)

data(arab)

# Retrieve symbol for each gene
info <- 
    AnnotationDbi::select(
        org.At.tair.db::org.At.tair.db, 
        rownames(arab), "SYMBOL") %>%
    group_by(TAIR) %>% 
    summarize(
        SYMBOL=paste(unique(na.omit(SYMBOL)),collapse="/"))
arab_info <- 
    info[match(rownames(arab),info$TAIR),] %>% 
    select(-TAIR) %>%
    as.data.frame
rownames(arab_info) <- rownames(arab)

# Extract experimental design from sample names
Treat <- factor(substring(colnames(arab),1,4)) %>% relevel(ref="mock")
Time <- factor(substring(colnames(arab),5,5))

y <- DGEList(arab, genes=as.data.frame(arab_info))

# Keep genes with at least 3 samples having an RPM of more than 2
keep <- rowSums(cpm(y)>2) >= 3
y <- y[keep,,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)

limma analysis

Standard limma analysis steps

design <- model.matrix(~Time+Treat)
design[,]

fit <-
    voom(y, design) %>%
    lmFit(design)

Apply topconfects

Find largest fold changes that we can be confident of at FDR 0.05.

confects <- limma_confects(fit, coef="Treathrcc", fdr=0.05)

confects

Looking at the result

Here the usual logFC values estimated by limma are shown as dots, with lines to the confect value.

confects_plot(confects)

confects_plot_me overlays the confects (red/blue) on a Mean-Difference Plot (grey) (as might be produced by limma::plotMD). As we should expect, the very noisy differences with low mean expression are removed if we look at the confects.

confects_plot_me(confects)

Let's compare this to the ranking we obtain from topTable.

fit_eb <- eBayes(fit)
top <- topTable(fit_eb, coef="Treathrcc", n=Inf)

rank_rank_plot(confects$table$name, rownames(top), "limma_confects", "topTable")

You can see that the top 19 genes from topTable are all within the top 40 for topconfects ranking, but topconfects has also highly ranked some other genes. These have a large effect size, and sufficient if not overwhelming evidence of this.

An MD-plot highlighting the positions of the top 40 genes in both rankings also illustrates the differences between these two ways of ranking genes.

plotMD(fit, legend="bottomleft", status=paste0(
    ifelse(rownames(fit) %in% rownames(top)[1:40], "topTable ",""),
    ifelse(rownames(fit) %in% confects$table$name[1:40], "confects ","")))

edgeR analysis

An analysis in edgeR produces similar results. Note that only quasi-likelihood testing from edgeR is supported.

Standard edgeR analysis

y <- estimateDisp(y, design, robust=TRUE)
efit <- glmQLFit(y, design, robust=TRUE)

Apply topconfects

A step of 0.05 is used here merely so that the vignette will build quickly. edger_confects calls edgeR::glmTreat repeatedly, which is necessarily slow. In practice a smaller value such as 0.01 should be used.

econfects <- edger_confects(efit, coef="Treathrcc", fdr=0.05, step=0.05)

econfects

Looking at the result

confects_plot(econfects)
confects_plot_me(econfects)
etop <-
    glmQLFTest(efit, coef="Treathrcc") %>%
    topTags(n=Inf)

plotMD(efit, legend="bottomleft", status=paste0(
    ifelse(rownames(efit) %in% econfects$table$name[1:40], "confects ", ""),
    ifelse(rownames(efit) %in% rownames(etop)[1:40], "topTags ","")))

DESeq2 analysis

DESeq2 does its own filtering of lowly expressed genes, so we start from the original count matrix. The initial steps are as for a normal DESeq2 analysis.

library(DESeq2)

dds <- DESeqDataSetFromMatrix(
    countData = arab,
    colData = data.frame(Time, Treat),
    rowData = arab_info,
    design = ~Time+Treat)

dds <- DESeq(dds)

Apply topconfects

The contrast or coefficient to test is specified as in the DESeq2::results function. The step of 0.05 is merely so that this vignette will build quickly, in practice a smaller value such as 0.01 should be used. deseq2_confects calls results repeatedly, and in fairness results has not been optimized for this.

dconfects <- deseq2_confects(dds, name="Treat_hrcc_vs_mock", step=0.05)

DESeq2 offers shrunken estimates of LFC. This is another sensible way of ranking genes. Let's compare them to the confect values.

shrunk <- lfcShrink(dds, coef="Treat_hrcc_vs_mock", type="ashr")
dconfects$table$shrunk <- shrunk$log2FoldChange[dconfects$table$index]

dconfects

DESeq2 filters some genes, these are placed last in the table. If your intention is to obtain a ranking of all genes, you should disable this with deseq2_confects(..., cooksCutoff=Inf, independentFiltering=FALSE).

table(dconfects$table$filtered)
tail(dconfects$table)

Looking at the result

Shrunk LFC estimates are shown in red.

confects_plot(dconfects) + 
    geom_point(aes(x=shrunk, size=baseMean, color="lfcShrink"), alpha=0.75)

lfcShrink aims for a best estimate of the LFC, whereas confect is a conservative estimate. lfcShrink can produce non-zero values for genes which can't be said to significantly differ from zero -- it doesn't do double duty as an indication of significance -- whereas the confect value will be NA in this case. The plot below compares these two quantities. Only un-filtered genes are shown (see above).

filter(dconfects$table, !filtered) %>%
ggplot(aes(
        x=ifelse(is.na(confect),0,confect), y=shrunk, color=!is.na(confect))) +
    geom_point() + geom_abline() + coord_fixed() + theme_bw() +
    labs(color="Significantly\nnon-zero at\nFDR 0.05", 
        x="confect", y="lfcShrink using ashr")

Comparing results

rank_rank_plot(confects$table$name, econfects$table$name, 
    "limma confects", "edgeR confects")
rank_rank_plot(confects$table$name, dconfects$table$name, 
    "limma confects", "DESeq2 confects")
rank_rank_plot(econfects$table$name, dconfects$table$name, 
    "edgeR confects", "DESeq2 confects")

sessionInfo()


pfh/topconfects documentation built on Nov. 18, 2023, 9:49 p.m.