## nolint start
suppressPackageStartupMessages({
    library(goalie)
    library(basejump)
    library(bcbioRNASeq)
    library(DESeq2)
    library(DESeqAnalysis)
})
prepareTemplate()
source("_setup.R")
## nolint end
invisible(lapply(
    X = c(
        params[["data_dir"]],
        params[["results_dir"]]
    ),
    FUN = initDir
))

Load bcbioRNASeq object

object <- import(params[["bcb_file"]])
assert(
    is(object, "bcbioRNASeq"),
    validObject(object)
)
print(object)

Create DESeqDataSet

Here we are using an S4 coercion method to convert our bcbioRNASeq object to a DESeqDataSet. This prepares a gene-level RangedSummarizedExperiment with raw integer counts defined in the assay() slot. Internally this uses the DESeqDataSet() constructor function and sets an empty design formula. The desired design formula can be set with the design() function.

dds <- as.DESeqDataSet(object)
assert(
    is(dds, "DESeqDataSet"),
    validObject(dds)
)
print(dds)

Design

The design formula, specified with the design() function, must contain factor columns in colData() / sampleData().

Ensure that all relevant factor columns in sampleData() contain valid names (see make.names() for details) prior to setting the design. We recommend using the snakeCase() function to automatically sanitize all factors into snake case.

## > colnames(colData(dds))
design(dds) <- params[["design"]]

Pre-filtering

While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the dds data object, and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to keep only rows that have at least 10 reads total. Note that more strict filtering to increase power is automatically applied via independent filtering on the mean of normalized counts within the results() function.

## Note that this criteria can be made more stringent.
## Refer to DESeq2 vignette for examples.
keep <- rowSums(counts(dds)) >= 10L
dds <- dds[keep, ]
print(dds)

Factor reference level

By default, [R][] will choose a reference level for factors based on alphabetical order. Then, if you never tell the [DESeq2][] functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factors levels. You should only change the factor levels of variables in the design before running the [DESeq2][] analysis, not during or afterward. Setting the factor levels can be done in two ways, using either the factor() or relevel() functions. Generally we recommend using relevel() here.

## Including this code here for reference only.

## Specify the reference level (preferred).
## > colData(dds)[["group"]] <-
## >     relevel(colData(dds)[["group"]], ref = "control")

## Alternatively, can explicitly define, using `factor()`.
## When using this approach, put the reference level (control) first.
## > colData(dds)[["group"]] <-
## >     factor(colData(dds)[["group"]], levels = c("treatment", "control"))

## If samples have been subset, ensure that the levels match.
## > colData(dds)[["treatment"]] <- droplevels(colData(dds)[["group"]])

Differential expression analysis

Now that our DESeqDataSet is properly set up, we can move on to performing the differential expression. The standard differential expression analysis steps for [DESeq2][] are wrapped into a single function, DESeq(). Results tables are generated using the function results(), which extracts a results table with log2 fold changes, P values and adjusted P values. With no additional arguments to results(), the log2 fold change and Wald test P value will be for the last variable in the design formula (design()), and if this is a factor, the comparison will be the last level of this variable over the reference level (see previous note on factor levels). However, the order of the variables of the design do not matter so long as the user specifies the comparison to build a results table for, using the name or contrast arguments of results.

dds <- DESeq(dds)

Variance stabilization

After we perform the differential expression, we need to calculate variance-stabilized counts, which are stored in a DESeqTransform object. These transformed counts are useful for visualization. We currently recommend using varianceStabilizingTransformation() but rlog() is a good alternate.

## Alternatively, can use `rlog()` here instead, but it is slower.
dt <- varianceStabilizingTransformation(dds)
assert(is(dt, "DESeqTransform"))
interestingGroups(dt) <- interestingGroups(object)

Results

For contrast argument as character vector:

  1. Design matrix factor of interest.
  2. Numerator for LFC (expt).
  3. Denominator for LFC (control).
## factor; numerator; denominator
## > levels(colData(dds[["group"]]))
## > help(topic = "results", package = "DESeq2")
print(params[["contrasts"]])
resListUnshrunken <- Map(
    f = DESeq2::results,
    contrast = params[["contrasts"]],
    MoreArgs = list(
        "object" = dds,
        "alpha" = params[["alpha_threshold"]],
        "lfcThreshold" = params[["lfc_threshold"]]
    )
)

Now let's calculate shrunken log2 fold change values with DESeq2::lfcShrink(). We're using the "normal" shrinkage estimator (default in DESeq2); the "apeglm" shrinkage estimator is also promising but doens't work well with complex contrast designs.

## Refer to DESeqAnalysis package if you want to use newer apeglm instead.
## See `help(topic = "apeglmResults", package = "DESeqAnalysis")` for details.
resListShrunken <- Map(
    f = DESeq2::lfcShrink,
    res = resListUnshrunken,
    contrast = params[["contrasts"]],
    MoreArgs = list(
        "dds" = dds,
        "type" = "normal",
        "alpha" = params[["alpha_threshold"]],
        "lfcThreshold" = params[["lfc_threshold"]]
    )
)

We performed the analysis using a BH adjusted P value cutoff of r params[["alpha_threshold"]] and a log fold-change (LFC) ratio cutoff of r params[["lfc_threshold"]].

Package into DESeqAnalysis object

deseq <- DESeqAnalysis(
    data = dds,
    transform = dt,
    results = resListUnshrunken,
    lfcShrink = resListShrunken
)
saveData(deseq, dir = params[["data_dir"]])

Plots

MA plot {.tabset}

An MA plot compares transformed counts on M (log ratio) and A (mean average) scales [@Yang2002-sx].

invisible(Map(
    f = function(object, i) {
        markdownHeader(text = i, level = 4L, asis = TRUE)
        print(plotMa(object = object, i = i))
    },
    i = resultsNames(deseq),
    MoreArgs = list("object" = deseq)
))

Volcano plot {.tabset}

A volcano plot compares significance (BH-adjusted P value) against fold change (log2) [@Cui2003-rn; @Li2014-ll]. Genes in the green box with text labels have an adjusted P value are likely to be the top candidate genes of interest.

invisible(Map(
    f = function(object, i) {
        markdownHeader(text = i, level = 4L, asis = TRUE)
        print(plotVolcano(object = object, i = i))
    },
    i = resultsNames(deseq),
    MoreArgs = list("object" = deseq)
))

DEG PCA {.tabset}

invisible(Map(
    f = function(object, i) {
        markdownHeader(text = i, level = 4L, asis = TRUE)
        print(plotDegPca(object = object, i = i))
    },
    i = resultsNames(deseq),
    MoreArgs = list("object" = deseq)
))

DEG Heatmap {.tabset}

This plot shows only differentially expressed genes on a per-sample basis. We have scaled the data by row and used the ward.D2 method for clustering [@Ward1963-xf].

invisible(Map(
    f = function(object, i, ...) {
        markdownHeader(text = i, level = 4L)
        plotDegHeatmap(object = object, i = i, ...)
    },
    i = resultsNames(deseq),
    MoreArgs = list(
        "object" = deseq,
        "clusteringMethod" = "ward.D2",
        "scale" = "row"
    )
))

Top tables

Only the top up- and down-regulated genes are shown.

invisible(Map(
    f = markdownTables,
    i = resultsNames(deseq),
    MoreArgs = list(
        "object" = deseq,
        "n" = 20L
    )
))

Export results

Subset the results into separate tables, containing all genes, differentially expressed genes in both directions, and directional tables.

export(
    object = deseq,
    con = params[["results_dir"]]
)

Differentially expressed gene (DEG) tables are sorted by BH-adjusted P value, and contain the following columns:





hbc/bcbioRNASeq documentation built on March 28, 2024, 3:01 p.m.