import: Import results from differential expression (DE) analysis
In EnrichmentBrowser: Seamless navigation through combined results of set-based and network-based enrichment analysis

Description Usage Arguments Details Value Author(s) References See Also Examples

This function imports fully processed expression datasets and results of differential expression (DE) analysis from limma, edgeR, and DESeq2. The imported data is converted to a SummarizedExperiment, annotating experimental design and genewise differential expression, which then allows straightforward application of enrichment analysis methods.

1	import(obj, res, from = c("auto", "limma", "edgeR", "DESeq2"), anno = NA)

`obj`	Expression data object. Supported options include `EList` (voom/limma), `DGEList` (edgeR), and `DESeqDataSet` (DESeq2). See details.
`res`	Differential expression results. Expected to match the provided expression data object type, i.e. should be an object of class `data.frame` if `obj` is provided as an `EList` (voom/limma), `TopTags` if `obj` is provided as a `DGEList` (edgeR), and `DESeqResults` if `obj` is provided as a `DESeqDataSet` (DESeq2). See details.
`from`	Character. Differential expression method from which to import results from. Defaults to `"auto"`, which automatically determines the import type based on the expression data object provided. Can also be explicitly chosen as either `'limma'`, `'edgeR'` or `'DESeq2'`.
`anno`	Character. The organism under study in KEGG three letter code (e.g. ‘hsa’ for ‘Homo sapiens’). For microarray probe-level data: the ID of a recognized microarray platform. See references.

The expression data object (argument obj) is expected to be fully processed (including normalization and dispersion estimation) and to have the experimental design annotated. The experimental design is expected to describe *a comparison of two groups* with an optional blocking variable for paired samples / sample batches (i.e. design = ~ group or design = ~ batch + group.)

The differential expression result (argument res) is expected to have the same number of rows as the expression data object, and also that the order of the rows is the same / consistent, i.e. that there is a 1:1 correspondence between the rownames of obj and the rownames of res. Note that the expression dataset is automatically restricted to the genes for which DE results are available. However, for an appropriate estimation of the size of the universe for competitive gene set tests, it is recommended to provide DE results for all genes in the expression data object whenever possible (see examples).

An object of class SummarizedExperiment that stores

the expression matrix in the assay slot,
information about the samples, including the experimental design, in the colData slot, and
information about the genes, including measures of differential expression, in the rowData slot.

Mandatory annotations:

colData column storing binary group assignment (named "GROUP")
rowData column storing (log2) fold changes of differential expression between sample groups (named "FC")
rowData column storing adjusted (corrected for multiple testing) p-values of differential expression between sample groups (named "ADJ.PVAL").

Additional optional annotations:

colData column defining paired samples or sample blocks (named "BLOCK")
metadata slot named "annotation" giving the organism under investigation in KEGG three letter code (e.g. "hsa" for Homo sapiens)
metadata slot named "dataType" indicating the expression data type ("ma" for microarray, "rseq" for RNA-seq).

Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>

KEGG Organism code http://www.genome.jp/kegg/catalog/org_list.html

Recognized microarray platforms http://www.bioconductor.org/packages/release/BiocViews.html#___ChipName

readSE for reading expression data from file, normalize for normalization of expression data, voom for preprocessing of RNA-seq data, p.adjust for multiple testing correction, eBayes for DE analysis with limma, glmQLFit for DE analysis with edgeR, and DESeq for DE analysis with DESeq2.

  # Setup
  ## i) Generate count data
  nsamples <- 4
  ngenes <- 1000
  dispers <- 1 / rchisq(ngenes, df = 10)
  rdesign <- model.matrix(~factor(rep(c(1, 2), each = 2)))
    
  counts <- rnbinom(ngenes * nsamples, mu = 20, size = 1 / dispers)
  counts <- matrix(counts, nrow = ngenes, ncol = nsamples)
  
  ## ii) Generate intensity data
  sd <- 0.3 * sqrt(4 / rchisq(100, df = 4))
  intens <- matrix(rnorm(100 * 6, sd = sd), nrow = 100, ncol = 6)
  rownames(intens) <- paste0("Gene", 1:100)
  intens[1:2, 4:6] <- intens[1:2, 4:6] + 2
  mdesign <- cbind(Grp1 = 1, Grp2vs1 = rep(c(0,1), each = 3))

  # (1) import from edgeR (RNA-seq count data)
  # (1a) create the expression data object
  library(edgeR)
  d <- DGEList(counts)
  d <- calcNormFactors(d)
  d <- estimateDisp(d, rdesign)
  
  # (1b) obtain differential expression results 
  fit <- glmQLFit(d, rdesign)
  qlf <- glmQLFTest(fit)
  res <- topTags(qlf, n = nrow(d), sort.by = "none")

  # (1c) import
  se <- import(d, res)

  # (2) import from DESeq2 (RNA-seq count data)
  # (2a) create the expression data object
  library(DESeq2)
  condition <- factor(rep(c("A", "B"), each = 2))
  dds <- DESeqDataSetFromMatrix(counts, 
                                colData = DataFrame(condition = condition),
                                design = ~ condition)

  # (2b) obtain differential expression results 
  dds <- DESeq(dds)
  res <- results(dds)

  # (2c) import
  se <- import(dds, res)

  # (3) import from voom/limma (RNA-seq count data)
  # (3a) create the expression data object
  library(limma)
  keep <- filterByExpr(counts, rdesign)
  el <- voom(counts[keep,], rdesign)
  
  # (3b) obtain differential expression results 
  fit <- lmFit(el, rdesign)
  fit <- eBayes(fit, robust = TRUE) 
  res <- topTable(fit, coef = 2, number = nrow(counts), sort.by = "none")

  # (3c) import
  se <- import(el, res)
  
  # (4) import from limma-trend (RNA-seq count data)
  # (4a) create the expression data object
  logCPM <- edgeR::cpm(counts[keep,], log = TRUE, prior.count = 3)
  el <- new("EList", list(E = logCPM, design = rdesign))
  
  # (4b) obtain differential expression results 
  fit <- lmFit(el, rdesign)
  fit <- eBayes(fit, trend = TRUE) 
  res <- topTable(fit, coef = 2, number = nrow(el), sort.by = "none")

  # (4c) import
  se <- import(el, res)

  # (5) import from limma (microarray intensity data)
  # (5a) create the expression data object
  el <- new("EList", list(E = intens, design = mdesign))
  
  # (5b) obtain differential expression results 
  fit <- lmFit(el, mdesign)
  fit <- eBayes(fit, robust = TRUE) 
  res <- topTable(fit, coef = 2, number = nrow(el), sort.by = "none")

  # (5c) import
  se <- import(el, res)