Introduction to regionReport

Basics

Install r Biocpkg('regionReport')

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. r Biocpkg('regionReport') is a R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN after which you can install r Biocpkg('regionReport') by using the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
  }

BiocManager::install("regionReport")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

Required knowledge

r Biocpkg('regionReport') is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. That is, packages like r Biocpkg('GenomicFeatures') that allow you to import the data and r Biocpkg('DESeq2') for generating differential expression results. A r Biocpkg('regionReport') user is not expected to deal with those packages directly.

If you are asking yourself the question "Where do I start using Bioconductor?" you might be interested in this blog post.

Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R and Bioconductor have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the regionReport tag and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.

Citing r Biocpkg('regionReport')

We hope that r Biocpkg('regionReport') will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("regionReport")

HTML reports for a set differential region results

## Track time spent on making the vignette
startTimeVignette <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    derfinder = citation("derfinder")[1],
    regionReport = citation("regionReport")[1],
    knitrBootstrap = citation("knitrBootstrap"),
    BiocStyle = citation("BiocStyle"),
    ggbio = citation("ggbio"),
    ggplot2 = citation("ggplot2"),
    knitr = citation("knitr")[3],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    DT = citation("DT"),
    R = citation(),
    IRanges = citation("IRanges"),
    sessioninfo = citation("sessioninfo"),
    GenomeInfoDb = RefManageR::BibEntry(
        bibtype = "manual",
        key = "GenomeInfoDb",
        author = "Sonali Arora and Martin Morgan and Marc Carlson and H. Pagès",
        title = "GenomeInfoDb: Utilities for manipulating chromosome and other 'seqname' identifiers",
        year = 2017, doi = "10.18129/B9.bioc.GenomeInfoDb"
    ),
    GenomicRanges = citation("GenomicRanges"),
    biovizBase = citation("biovizBase"),
    TxDb.Hsapiens.UCSC.hg19.knownGene = citation("TxDb.Hsapiens.UCSC.hg19.knownGene"),
    derfinderPlot = citation("derfinderPlot")[1],
    grid = citation("grid"),
    gridExtra = citation("gridExtra"),
    mgcv = citation("mgcv"),
    RColorBrewer = citation("RColorBrewer"),
    whikser = citation("whisker"),
    bumphunter = citation("bumphunter")[1],
    pheatmap = citation("pheatmap"),
    DESeq2 = citation("DESeq2"),
    edgeR1 = citation("edgeR")[1],
    edgeR2 = citation("edgeR")[2],
    edgeR6 = RefManageR::BibEntry("inbook",
        key = "edgeR6",
        author = "Chen, Yunshun and Lun, Aaron T. L. and Smyth, Gordon K.",
        title = "Differential expression analysis of complex RNA-seq experiments using edgeR",
        booktitle = "Statistical Analysis of Next Generation Sequencing Data",
        year = 2014,
        editor = "Datta, Somnath and Nettleton, Dan", publisher = "Springer",
        location = "New York", pages = "51-74"
    ),
    DEFormats = citation("DEFormats")
)

r Biocpkg('regionReport') r Citep(bib[['regionReport']]) creates HTML or PDF reports for a set of genomic regions such as r Biocpkg('derfinder') r Citep(bib[['derfinder']]) results or for feature-level analyses performed with r Biocpkg('DESeq2') r Citep(bib[['DESeq2']]) or r Biocpkg('edgeR') r Citep(bib[[c('edgeR1', 'edgeR2', 'edgeR6')]]). The HTML reports are styled with r CRANpkg('rmarkdown') r Citep(bib[['rmarkdown']]) by default but can optionally be styled with r CRANpkg('knitrBootstrap') r Citep(bib[['knitrBootstrap']]).

This package includes a basic exploration for a general set of genomic regions which can be easily customized to include the appropriate conclusions and/or further exploration of the results. Such a report can be generated using renderReport(). r Biocpkg('regionReport') has a separate template for running a basic exploration analysis of r Biocpkg('derfinder') results by using derfinderReport(). derfinderReport() is specific to single base-level approach r Biocpkg('derfinder') results. A third template is included for exploring r Biocpkg('DESeq2') or r Biocpkg('edgeR') differential expression results.

All reports are written in R Markdown format and include all the code for making the plots and explorations in the report itself. For all templates, r Biocpkg('regionReport') relies on r CRANpkg('knitr') r Citep(bib[['knitr']]), r CRANpkg('rmarkdown') r Citep(bib[['rmarkdown']]), DT r Citep(bib[['DT']]) and optionally r CRANpkg('knitrBootstrap') r Citep(bib[['knitrBootstrap']]) for generating the report. The reports can be either in HTML or PDF format and can be easily customized.

Using r Biocpkg('regionReport') for r Biocpkg('DESeq2') results

The plots in r Biocpkg('regionReport') for exploring r Biocpkg('DESeq2') are powered by r CRANpkg('ggplot2') r Citep(bib[['ggplot2']]) and r CRANpkg('pheatmap') r Citep(bib[['pheatmap']]).

Example

The r Biocpkg('regionReport') supplementary website regionReportSupp has examples of using r Biocpkg('regionReport') with r Biocpkg('DESeq2') results. In particular, please look at DESeq2.html which has the code for generating some r Biocpkg('DESeq2') results based on the r Biocpkg('DESeq2') vignette. Then it uses those results to create HTML and PDF versions of the same report. The resulting reports are available in the following locations:

Note that in both examples we changed the r CRANpkg('ggplot2') theme to theme_bw(). Also, in the PDF version we used the option device = 'pdf' instead of the default device = 'png' in DESeq2Report() since PDF figures are more appropriate for PDF reports: they look better than PNG figures.

If you want to create a similar HTML report as the one linked in this section, simply run example('DESeq2Report', 'regionReport', ask=FALSE). The only difference will be the r CRANpkg('ggplot2') theme for the plots.

Using r Biocpkg('regionReport') for r Biocpkg('edgeR') results

r Biocpkg('regionReport') has the edgeReport() function that takes as input a DGEList object and the results from the differential expression analysis using r Biocpkg('edgeR'). edgeReport() internally uses r Biocpkg('DEFormats') to convert the results to r Biocpkg('DESeq2')'s format and then uses DESeqReport() to create the final report. The report looks nearly the same whether you performed the differential expression analysis with r Biocpkg('DESeq2') or r Biocpkg('edgeR') in order to make more homogenous the exploratory data analysis step.

Example

The r Biocpkg('regionReport') supplementary website regionReportSupp has examples of using r Biocpkg('regionReport') with r Biocpkg('edgeR') results. In particular, please look at edgeR.html which has the code for generating some random data with r Biocpkg('DEFormats') and performing the differential expression analysis with r Biocpkg('edgeR'). Then it uses those results to create HTML and PDF versions of the same report. The resulting reports are available in the following locations:

Note that in both examples we changed the r CRANpkg('ggplot2') theme to theme_linedraw(). Also, in the PDF version we used the option device = 'pdf' instead of the default device = 'png' in edgeReport() since PDF figures are more appropriate for PDF reports: they look better than PNG figures.

If you want to create a similar HTML report as the one linked in this section, simply run example('edgeReport', 'regionReport', ask=FALSE). The only difference will be the r CRANpkg('ggplot2') theme for the plots and the amount of data simulated with r Biocpkg('DEFormats').

Using r Biocpkg('regionReport') for region results

The plots in r Biocpkg('regionReport') for region reports are powered by r Biocpkg('derfinderPlot') r Citep(bib[['derfinderPlot']]), r Biocpkg('ggbio') r Citep(bib[['ggbio']]), and r CRANpkg('ggplot2') r Citep(bib[['ggplot2']]).

Examples

The r Biocpkg('regionReport') supplementary website regionReportSupp has examples of using r Biocpkg('regionReport') with results from r Biocpkg('DiffBind') and r Biocpkg('derfinder'). Included as a vignette, this package also has an example using a small data set derived from r Biocpkg('bumphunter'). These represent different uses of r Biocpkg('regionReport') for results from ChIP-seq, methylation, and RNA-seq data. In particular, the r Biocpkg('DiffBind') example illustrates how to expand a basic report created with renderReport().

General case

For a general use case, you first have to identify a set of genomic regions of interest and store it as a GRanges object. In a typical workflow you will have some variables measured for each of the regions, such as p-values and scores. renderReport() uses the set of regions and three main arguments:

Other parameters control the name of the report, where it'll be located, the transcripts database used to annotate the nearest genes, graphical parameters, etc.

Here is a short example of how to use renderReport(). Note that we are using regions produced by r Biocpkg('derfinder') just for convenience sake.

## Load derfinder
library("derfinder")
regions <- genomeRegions$regions

## Assign chr length
library("GenomicRanges")
seqlengths(regions) <- c("chr21" = 48129895)

## The output will be saved in the 'derfinderReport-example' directory
dir.create("renderReport-example", showWarnings = FALSE, recursive = TRUE)

## Generate the HTML report
report <- renderReport(regions, "Example run",
    pvalueVars = c(
        "Q-values" = "qvalues", "P-values" = "pvalues"
    ), densityVars = c(
        "Area" = "area", "Mean coverage" = "meanCoverage"
    ),
    significantVar = regions$qvalues <= 0.05, nBestRegions = 20,
    outdir = "renderReport-example"
)

See the report created by this example here.

For r Biocpkg('derfinder') results created via the expressed regions-level approach you can use renderReport() to explore the results. If you use r Biocpkg('DESeq2') to perform the differential expression analysis of the expressed regions you can then use DESeq2Report().

r Biocpkg('derfinder') single base-level case

Run r Biocpkg('derfinder')

Prior to using regionReport::derfinderReport() you must use r Biocpkg('derfinder') to analyze a specific data set. While there are many ways to do so, we recommend using analyzeChr() with the same prefix argument. Then merging the results with mergeResults(). This is the recommended pipeline for the single base-level approach.

Below, we run r Biocpkg('derfinder') for the example data included in the package. The steps are:

  1. Load derfinder
  2. Create a directory where we'll store the results
  3. Generate the pre-requisites for the models to use with the example data
  4. Generate the statistical models
  5. Analyze the example data for chr21
  6. Merge the results (only one chr in this case, but in practice there'll be more)
## Load derfinder
library("derfinder")

## The output will be saved in the "derfinderReport-example" directory
dir.create("derfinderReport-example", showWarnings = FALSE, recursive = TRUE)

The following code runs r Biocpkg('derfinder').

## Save the current path
initialPath <- getwd()
setwd(file.path(initialPath, "derfinderReport-example"))

## Generate output from derfinder

## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
    verbose = TRUE
)

## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull,
    probs = c(0.5), nonzero = TRUE,
    verbose = TRUE
)

## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)

## Analyze chromosome 21
analysis <- analyzeChr(
    chr = "21", coverageInfo = genomeData, models = models,
    cutoffFstat = 1, cutoffType = "manual", seeds = 20140330, groupInfo = group,
    mc.cores = 1, writeOutput = TRUE, returnOutput = TRUE
)

## Save the stats options for later
optionsStats <- analysis$optionsStats

## Change the directory back to the original one
setwd(initialPath)

For convenience, we have included the r Biocpkg('derfinder') results as part of r Biocpkg('regionReport'). Note that the above functions are routinely checked as part of r Biocpkg('derfinder').

## Copy previous results
file.copy(system.file(file.path("extdata", "chr21"),
    package = "derfinder",
    mustWork = TRUE
), "derfinderReport-example", recursive = TRUE)

Next, proceed to merging the results.

## Merge the results from the different chromosomes. In this case, there's
## only one: chr21
mergeResults(
    chrs = "chr21", prefix = "derfinderReport-example",
    genomicState = genomicState$fullGenome
)

## Load optionsStats
load(file.path("derfinderReport-example", "chr21", "optionsStats.Rdata"), verbose = TRUE)

Create report

Once the r Biocpkg('derfinder') output has been generated and merged, use derfinderReport() to create the HTML report.

## Load derfindeReport
library("regionReport")
## Generate the HTML report
report <- derfinderReport(
    prefix = "derfinderReport-example", browse = FALSE,
    nBestRegions = 15, makeBestClusters = TRUE,
    fullCov = list("21" = genomeDataRaw$coverage), optionsStats = optionsStats
)

Once the output is generated, you can browse the report from R using browseURL() as shown below.

## Browse the report
browseURL(report)

You can view a pre-compiled version of this report here.

Notes

Note that the reports require an active Internet connection to render correctly.

The report is self-explanatory and will change some of the text depending on the input options.

If the report is taking too long to compile (say more than 3 hours), you might want to consider setting nBestCluters to a small number or even set makeBestClusters to FALSE.

Reproducibility

This package was made possible thanks to:

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("regionReport.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("regionReport.Rmd", tangle = TRUE)
## Clean up
unlink("derfinderReport-example", recursive = TRUE)

Date the vignette was generated.

## Date the report was generated
Sys.time()

Wallclock time spent generating the vignette.

## Processing time in seconds
totalTimeVignette <- diff(c(startTimeVignette, Sys.time()))
round(totalTimeVignette, digits = 3)

R session information.

## Session info
library("sessioninfo")
options(width = 120)
session_info()

Bibliography

This vignette was generated using r Biocpkg('BiocStyle') r Citep(bib[['BiocStyle']]) with r CRANpkg('knitr') r Citep(bib[['knitr']]) and r CRANpkg('rmarkdown') r Citep(bib[['rmarkdown']]) running behind the scenes.

Citations made with r CRANpkg('RefManageR') r Citep(bib[['RefManageR']]).

## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))


Try the regionReport package in your browser

Any scripts or data that you put into this service are public.

regionReport documentation built on Dec. 20, 2020, 2:01 a.m.