import: Import results from differential expression (DE) analysis

View source: R/import.R

importR Documentation

Import results from differential expression (DE) analysis

Description

This function imports fully processed expression datasets and results of differential expression (DE) analysis from limma, edgeR, and DESeq2. The imported data is converted to a SummarizedExperiment, annotating experimental design and genewise differential expression, which then allows straightforward application of enrichment analysis methods.

Usage

import(obj, res, from = c("auto", "limma", "edgeR", "DESeq2"), anno = NA)

Arguments

obj

Expression data object. Supported options include EList (voom/limma), DGEList (edgeR), and DESeqDataSet (DESeq2). See details.

res

Differential expression results. Expected to match the provided expression data object type, i.e. should be an object of class

  • data.frame if obj is provided as an EList (voom/limma),

  • TopTags if obj is provided as a DGEList (edgeR), and

  • DESeqResults if obj is provided as a DESeqDataSet (DESeq2). See details.

from

Character. Differential expression method from which to import results from. Defaults to "auto", which automatically determines the import type based on the expression data object provided. Can also be explicitly chosen as either 'limma', 'edgeR' or 'DESeq2'.

anno

Character. The organism under study in KEGG three letter code (e.g. ‘hsa’ for ‘Homo sapiens’). For microarray probe-level data: the ID of a recognized microarray platform. See references.

Details

The expression data object (argument obj) is expected to be fully processed (including normalization and dispersion estimation) and to have the experimental design annotated. The experimental design is expected to describe *a comparison of two groups* with an optional blocking variable for paired samples / sample batches (i.e. design = ~ group or design = ~ batch + group.)

The differential expression result (argument res) is expected to have the same number of rows as the expression data object, and also that the order of the rows is the same / consistent, i.e. that there is a 1:1 correspondence between the rownames of obj and the rownames of res. Note that the expression dataset is automatically restricted to the genes for which DE results are available. However, for an appropriate estimation of the size of the universe for competitive gene set tests, it is recommended to provide DE results for all genes in the expression data object whenever possible (see examples).

Value

An object of class SummarizedExperiment that stores

  • the expression matrix in the assay slot,

  • information about the samples, including the experimental design, in the colData slot, and

  • information about the genes, including measures of differential expression, in the rowData slot.

Mandatory annotations:

  • colData column storing binary group assignment (named "GROUP")

  • rowData column storing (log2) fold changes of differential expression between sample groups (named "FC")

  • rowData column storing adjusted (corrected for multiple testing) p-values of differential expression between sample groups (named "ADJ.PVAL").

Additional optional annotations:

  • colData column defining paired samples or sample blocks (named "BLOCK")

  • metadata slot named "annotation" giving the organism under investigation in KEGG three letter code (e.g. "hsa" for Homo sapiens)

  • metadata slot named "dataType" indicating the expression data type ("ma" for microarray, "rseq" for RNA-seq).

Author(s)

Ludwig Geistlinger

References

KEGG Organism code http://www.genome.jp/kegg/catalog/org_list.html

Recognized microarray platforms http://www.bioconductor.org/packages/release/BiocViews.html#___ChipName

See Also

readSE for reading expression data from file, normalize for normalization of expression data, voom for preprocessing of RNA-seq data, p.adjust for multiple testing correction, eBayes for DE analysis with limma, glmQLFit for DE analysis with edgeR, and DESeq for DE analysis with DESeq2.

Examples


  # Setup
  ## i) Generate count data
  nsamples <- 4
  ngenes <- 1000
  dispers <- 1 / rchisq(ngenes, df = 10)
  rdesign <- model.matrix(~factor(rep(c(1, 2), each = 2)))
    
  counts <- rnbinom(ngenes * nsamples, mu = 20, size = 1 / dispers)
  counts <- matrix(counts, nrow = ngenes, ncol = nsamples)
  
  ## ii) Generate intensity data
  sd <- 0.3 * sqrt(4 / rchisq(100, df = 4))
  intens <- matrix(rnorm(100 * 6, sd = sd), nrow = 100, ncol = 6)
  rownames(intens) <- paste0("Gene", 1:100)
  intens[1:2, 4:6] <- intens[1:2, 4:6] + 2
  mdesign <- cbind(Grp1 = 1, Grp2vs1 = rep(c(0,1), each = 3))

  # (1) import from edgeR (RNA-seq count data)
  # (1a) create the expression data object
  library(edgeR)
  d <- DGEList(counts)
  d <- calcNormFactors(d)
  d <- estimateDisp(d, rdesign)
  
  # (1b) obtain differential expression results 
  fit <- glmQLFit(d, rdesign)
  qlf <- glmQLFTest(fit)
  res <- topTags(qlf, n = nrow(d), sort.by = "none")

  # (1c) import
  se <- import(d, res)

  # (2) import from DESeq2 (RNA-seq count data)
  # (2a) create the expression data object
  library(DESeq2)
  condition <- factor(rep(c("A", "B"), each = 2))
  dds <- DESeqDataSetFromMatrix(counts, 
                                colData = DataFrame(condition = condition),
                                design = ~ condition)

  # (2b) obtain differential expression results 
  dds <- DESeq(dds)
  res <- results(dds)

  # (2c) import
  se <- import(dds, res)

  # (3) import from voom/limma (RNA-seq count data)
  # (3a) create the expression data object
  library(limma)
  keep <- filterByExpr(counts, rdesign)
  el <- voom(counts[keep,], rdesign)
  
  # (3b) obtain differential expression results 
  fit <- lmFit(el, rdesign)
  fit <- eBayes(fit, robust = TRUE) 
  res <- topTable(fit, coef = 2, number = nrow(counts), sort.by = "none")

  # (3c) import
  se <- import(el, res)
  
  # (4) import from limma-trend (RNA-seq count data)
  # (4a) create the expression data object
  logCPM <- edgeR::cpm(counts[keep,], log = TRUE, prior.count = 3)
  el <- new("EList", list(E = logCPM, design = rdesign))
  
  # (4b) obtain differential expression results 
  fit <- lmFit(el, rdesign)
  fit <- eBayes(fit, trend = TRUE) 
  res <- topTable(fit, coef = 2, number = nrow(el), sort.by = "none")

  # (4c) import
  se <- import(el, res)

  # (5) import from limma (microarray intensity data)
  # (5a) create the expression data object
  el <- new("EList", list(E = intens, design = mdesign))
  
  # (5b) obtain differential expression results 
  fit <- lmFit(el, mdesign)
  fit <- eBayes(fit, robust = TRUE) 
  res <- topTable(fit, coef = 2, number = nrow(el), sort.by = "none")

  # (5c) import
  se <- import(el, res)


lgeistlinger/EnrichmentBrowser documentation built on May 9, 2024, 7:22 p.m.