knitr::opts_chunk$set(
  # code or die
  echo = TRUE,
  # minimize verbosity
  warning = FALSE, message = FALSE,
  # dpi = 150, # for hires images
  comment = "#>")
set.seed(0xFEED)
options(digits = 3)

Overview

This demonstrates how to use the facile.bio ecosystem to perform the analyses of RNA-seq data as outlined in the edgeR QLF From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline workflow.

library(edgeR)
library(dplyr)
library(FacileBiocData)
library(FacileAnalysis)
library(ggplot2)
theme_set(theme_bw())

Assemble DGEList

If you're here, we already assume you know what a DGEList is, so we've skipped the whole "exploration of the DGEList structure" part of the workflow and have encapsulated the DGEList building code to an internal function that we use here:

source(system.file("scripts", "edgeRQLF-workflow.R", package="FacileAnalysis"))
y <- assemble_edgeRQLF_DGEList()

From here on out, we will perform these analyses using only the maneuvers made available in this package and the greater facile.bio ecosystem. We have an extensive amount of unit tests in this package that ensure that the results generated by the FacileAnalysis package corresponds to what is produced using standard Bioconductor maneuvers, which you can refer to.

You can also refer to the sister FacileRnaSeqGeneEdgeRQL-test.Rmd vignette to see how the exact statistics generated here match the ones produced using the standard edgeR/QLF framework.

In order to directly use this DGEList within the facile framework, we just have to invoke the FacileBiocData::facilitate function:

yf <- facilitate(y)

Dimensionality Reduction

For now, we only support PCA:

pca.y <- fpca(yf, ntop = 500)
colors <- c(lactating = "darkgreen", pregnant = "red", virgin = "blue")
viz(pca.y, color_aes = "Status", shape_aes = "CellType", color_map = colors,
    width = 600)

Differential Expression Analysis

Test differential among lactating b cells vs pregnant b cells.

Feture filtering is performed by default:

dge.b <- yf |> 
  flm_def("group", numer = "B.lactating", denom = "B.pregnant") |> 
  fdge(method = "edgeR-qlf")
tidy(dge.b) |> head(10)

The MA plots in facile are interactive, so we don't want to overload the web-browser. Here we'll chose to show a sample of both significant and "background" genes.

bg.genes <- tidy(dge.b) |> filter(padj > 0.1) |> sample_n(200)
fg.genes <- tidy(dge.b) |> filter(padj < 0.1) |> sample_n(1000)
viz(dge.b, type = "MA", features = bg.genes, highlight = fg.genes,
    width = 600)
viz(dge.b, type = "volcano", features = bg.genes, 
    highlight = head(fg.genes, 100), width = 600)

Differential expression above a fold-change threshold

We'll re-use the model from the original analsis and tweak the fdge parameters. Instead of automatically filtering, we just specify to use the same features from the previous analysis.

dge.bthres <- dge.b |> 
  model() |> 
  fdge(method = "edgeR-qlf", treat_lfc = log2(1.5), features = features(dge.b))
tidy(dge.bthres) |> head(10)

:::note The workflow goes into a heatmap section, but we will save that for later. There are more interesting differential expression examples in the workflow that we will continue here. :::

Analysis of Deviance

Complicated Contrasts

Suppose we want to compare the three groups within in the luminal population, i.e., virgin, pregnant and lactating:

anova.luminal <- yf |> 
  filter_samples(CellType == "L") |> 
  flm_def("Status") |> 
  fdge(method = "edgeR-qlf")
tidy(anova.luminal) |> head(10)

Complicated Contrasts

The contrast defined in the workflow is basically testing an interaction:

con <- makeContrasts(
  (L.lactating-L.pregnant)-(B.lactating-B.pregnant), 
  levels=design)

The facile way is to perform this analysis is to run the two individual differentail expression analyses independently, then compare the two.

For instance, we already have the "lactating" vs "pregnant" comparison within the B cell population stored in dge.b. We now have to run the same comparison within the lumina cells:

dge.L <- yf |> 
  flm_def("group", numer = "L.lactating", denom = "L.pregnant") |> 
  fdge(method = "edgeR-qlf")

And now run the interaction. By default, each analysis will be rerun using the union of the (filtered) features from the individual comparisons so that we generate statistics for all genes from each comparison.

dge.i <- compare(dge.L, dge.b)

Heatmaps

Facile heatmaps are not supported yet, but we can show how to use iheatmapr produce the same heatmap.

Gene Set



facilebio/FacileAnalysis documentation built on March 15, 2024, 7:37 a.m.