R/renderReport.R

Defines functions renderReport

Documented in renderReport

#' Generate a HTML/PDF report exploring a set of genomic regions
#'
#' This function generates a HTML report with quality checks, genome location
#' exploration, and an interactive table with the results. Other output formats
#' are possible such as PDF but lose the interactivity. Users can easily append
#' to the report by providing a R Markdown file to \code{customCode}, or can
#' customize the entire template by providing an R Markdown file to
#' \code{template}.
#'
#' @param regions The set of genomic regions of interest as a \code{GRanges}
#' object. All sequence lengths must be provided.
#' @param project The title of the project.
#' @param pvalueVars The names of the variables with values between 0 and 1 to
#' plot density values by chromosome and a table for commonly used cutoffs.
#' Most commonly used to explore p-value distributions. If a named
#' character vector is provided, the names are used in the plot titles.
#' @param densityVars The names of variables to use for making density plots
#' by chromosome. Commonly used to explore scores and other variables given
#' by region.  If a named character vector is provided, the names are used in
#' the plot titles.
#' @param significantVar A \code{logical} variable differentiating statistically
#' significant regions from the rest. When provided, both types of regions
#' are compared against each other to see differences in width, location, etc.
#' @param annotation The output from \link[bumphunter]{matchGenes} used on
#' \code{regions}. Note that this can take time for a large set of regions
#' so it's better to pre-compute this information and save it.
#' @param nBestRegions The number of regions to include in the interactive
#' table.
#' @param customCode An absolute path to a child R Markdown file with code to be
#' evaluated before the reproducibility section. Its useful for users who want
#' to customize the report by adding conclusions derived from the data and/or
#' further quality checks and plots.
#' @param outdir The name of output directory.
#' @param output The name of output HTML file (without the html extension).
#' @param browse If \code{TRUE} the HTML report is opened in your browser once
#' it's completed.
#' @param txdb Specify the transcription database to use for identifying the
#' closest genes via \link[bumphunter]{matchGenes}. If \code{NULL} it will
#' use TxDb.Hsapiens.UCSC.hg19.knownGene by default.
#' @param device The graphical device used when knitting. See more at
#' <http://yihui.name/knitr/options> (\code{dev} argument).
#' @param densityTemplates A list of length 3 with templates for the p-value
#' density plots (variables from \code{pvalueVars}), the continuous
#' variables density plots (variables from \code{densityVars}), and Manhattan
#' plots for the p-value variables (\code{pvalueVars}). These templates
#' are processed by \link[whisker]{whisker.render}. Check the default templates
#' for more information. The \code{densityTemplates} argument is available for
#' those users interested in customizing these plots. For example, to show
#' histograms instead of density plots use \code{templatePvalueHistogram} and
#' \code{templateHistogram} instead of \code{templatePvalueDensity} and
#' \code{templateDensity} respectively.
#' @param template Template file to use for the report. If not provided, will
#' use the default file found in regionExploration/regionExploration.Rmd
#' within the package source.
#' @param theme A ggplot2 \link[ggplot2]{theme} to use for the plots made with
#' ggplot2.
#' @param digits The number of digits to round to in the interactive table of
#' the top \code{nBestRegions}. Note that p-values and adjusted p-values won't
#' be rounded.
#' @param ... Arguments passed to other methods and/or advanced arguments.
#' Advanced arguments:
#' \describe{
#' \item{overviewParams }{ A two element list with \code{base_size} and
#' \code{areaRel} that control the text size for the genomic overview plots.}
#' \item{output_format }{ Either \code{html_document}, \code{pdf_document} or
#' \code{knitrBootstrap::bootstrap_document} unless you modify the YAML
#' template.}
#' \item{clean }{ Logical, whether to clean the results or not. Passed to
#' \link[rmarkdown]{render}.}
#' }
#'
#' @return An HTML report with a basic exploration for the given set of
#' genomic regions. See the example report at
#' <http://leekgroup.github.io/regionReport/reference/renderReport-example/regionExploration.html>.
#'
#' @author Leonardo Collado-Torres
#' @export
#'
#' @importFrom RefManageR PrintBibliography Citep WriteBib as.BibEntry
#' @importFrom GenomeInfoDb seqlengths
#' @importFrom utils browseURL citation packageVersion
#' @importFrom rmarkdown render
#' @importFrom GenomicRanges mcols 'mcols<-'
#' @importFrom knitrBootstrap knit_bootstrap
#' @importFrom BiocStyle html_document
#' @importFrom methods is
#'
#' @details
#' Set \code{output_format} to \code{'knitrBootstrap::bootstrap_document'} or
#' \code{'pdf_document'} if you want a HTML report styled by knitrBootstrap or
#' a PDF report respectively. If using knitrBootstrap, we recommend the version
#' available only via GitHub at <https://github.com/jimhester/knitrBootstrap>
#' which has nicer features than the current version available via CRAN. You can
#' also set the \code{output_format} to \code{'html_document'} for a HTML
#' report styled by rmarkdown. The default is set to
#' \code{'BiocStyle::html_document'}.
#'
#' If you modify the YAML front matter of \code{template}, you can use other
#' values for \code{output_format}.
#'
#' The HTML report styled with knitrBootstrap can be smaller in size than the
#' \code{'html_document'} report.
#'
#' @examples
#'
#' ## Load derfinder for an example set of regions
#' library("derfinder")
#' regions <- genomeRegions$regions
#'
#' ## Assign chr length
#' library("GenomicRanges")
#' seqlengths(regions) <- c("chr21" = 48129895)
#'
#' ## The output will be saved in the 'renderReport-example' directory
#' dir.create("renderReport-example", showWarnings = FALSE, recursive = TRUE)
#'
#' ## Generate the HTML report
#' report <- renderReport(regions, "Example run",
#'     pvalueVars = c(
#'         "Q-values" = "qvalues", "P-values" = "pvalues"
#'     ), densityVars = c(
#'         "Area" = "area", "Mean coverage" = "meanCoverage"
#'     ),
#'     significantVar = regions$qvalues <= 0.05, nBestRegions = 20,
#'     outdir = "renderReport-example"
#' )
#'
#' if (interactive()) {
#'     ## Browse the report
#'     browseURL(report)
#' }
#'
#' ## See the example report at
#' ## http://leekgroup.github.io/regionReport/reference/renderReport-example/regionExploration.html
#'
#'
#' ## Check the default templates. For users interested in customizing these
#' ## plots.
#' ## For p-value variables:
#' cat(regionReport::templatePvalueDensity)
#'
#' ## For continous variables:
#' cat(regionReport::templateDensity)
#'
#' ## For Manhattan plots
#' cat(regionReport::templateManhattan)
#'
#' ##################################################
#' ## bumphunter example mentioned in the vignette ##
#' ##################################################
#'
#' ## Load bumphunter
#' library("bumphunter")
#'
#' ## Create data from the vignette
#' pos <- list(
#'     pos1 = seq(1, 1000, 35),
#'     pos2 = seq(2001, 3000, 35),
#'     pos3 = seq(1, 1000, 50)
#' )
#' chr <- rep(paste0("chr", c(1, 1, 2)), times = sapply(pos, length))
#' pos <- unlist(pos, use.names = FALSE)
#'
#' ## Find clusters
#' cl <- clusterMaker(chr, pos, maxGap = 300)
#'
#' ## Build simulated bumps
#' Indexes <- split(seq_along(cl), cl)
#' beta1 <- rep(0, length(pos))
#' for (i in seq(along = Indexes)) {
#'     ind <- Indexes[[i]]
#'     x <- pos[ind]
#'     z <- scale(x, median(x), max(x) / 12)
#'     beta1[ind] <- i * (-1)^(i + 1) * pmax(1 - abs(z)^3, 0)^3 ## multiply by i to vary size
#' }
#'
#' ## Build data
#' beta0 <- 3 * sin(2 * pi * pos / 720)
#' X <- cbind(rep(1, 20), rep(c(0, 1), each = 10))
#' set.seed(23852577)
#' error <- matrix(rnorm(20 * length(beta1), 0, 1), ncol = 20)
#' y <- t(X[, 1]) %x% beta0 + t(X[, 2]) %x% beta1 + error
#'
#' ## Perform bumphunting
#' tab <- bumphunter(y, X, chr, pos, cl, cutoff = .5)
#'
#' ## Explore data
#' lapply(tab, head)
#'
#' library("GenomicRanges")
#'
#' ## Build GRanges with sequence lengths
#' regions <- GRanges(
#'     seqnames = tab$table$chr,
#'     IRanges(start = tab$table$start, end = tab$table$end),
#'     strand = "*", value = tab$table$value, area = tab$table$area,
#'     cluster = tab$table$cluster, L = tab$table$L, clusterL = tab$table$clusterL
#' )
#'
#' ## Assign chr lengths
#' seqlengths(regions) <- seqlengths(
#'     getChromInfoFromUCSC("hg19", as.Seqinfo = TRUE)
#' )[
#'     names(seqlengths(regions))
#' ]
#'
#' ## Explore the regions
#' regions
#'
#' ## Now create the report
#' report <- renderReport(regions, "Example bumphunter",
#'     pvalueVars = NULL,
#'     densityVars = c(
#'         "Area" = "area", "Value" = "value",
#'         "Cluster Length" = "clusterL"
#'     ), significantVar = NULL,
#'     output = "bumphunter-example", outdir = "bumphunter-example",
#'     device = "png"
#' )
#'
#' ## See the example report at
#' ## http://leekgroup.github.io/regionReport/reference/bumphunter-example/bumphunter-example.html
renderReport <- function(regions, project = "",
    pvalueVars = c("P-values" = "pval"),
    densityVars = NULL, significantVar = mcols(regions)$pval <= 0.05,
    annotation = NULL, nBestRegions = 500, customCode = NULL,
    outdir = "regionExploration", output = "regionExploration",
    browse = interactive(), txdb = NULL, device = "png",
    densityTemplates = list(
        Pvalue = templatePvalueDensity,
        Common = templateDensity, Manhattan = templateManhattan
    ),
    template = NULL, theme = NULL, digits = 2, ...) {
    ## Save start time for getting the total processing time
    startTime <- Sys.time()


    ## Check inputs
    stopifnot(is(regions, "GRanges"))
    stopifnot(!any(is.na(seqlengths(regions))))
    stopifnot(is.list(densityTemplates) & length(densityTemplates) == 3 & all(c("Pvalue", "Common", "Manhattan") %in% names(densityTemplates)))
    hasCustomCode <- !is.null(customCode)
    if (hasCustomCode) stopifnot(length(customCode) == 1)
    if (!is.null(annotation)) stopifnot(nrow(annotation) == length(regions))
    if (!is.null(theme)) stopifnot(is(theme, c("theme", "gg")))

    # @param overviewParams A two element list with \code{base_size} and
    # \code{areaRel} that control the text size for the genomic overview plots.
    overviewParams <- .advanced_argument("overviewParam", list(
        base_size = 10,
        areaRel = 5
    ), ...)


    ## Check that overviewParams is correctly specified
    stopifnot(sum(names(overviewParams) %in% c("base_size", "areaRel")) == 2)

    ## Are there p-value vars?
    hasPvalueVars <- length(pvalueVars) > 0
    if (hasPvalueVars) stopifnot(is.character(pvalueVars))
    ## Fix p-value variable names
    colnames(mcols(regions))[which(colnames(mcols(regions)) %in% pvalueVars)] <- make.names(pvalueVars)
    pvalueVars[seq_len(length(pvalueVars))] <- make.names(pvalueVars)

    ## Are there density vars?
    hasDensityVars <- length(densityVars) > 0
    if (hasDensityVars) stopifnot(is.character(densityVars))
    ## Fix density variable names
    colnames(mcols(regions))[which(colnames(mcols(regions)) %in% densityVars)] <- make.names(densityVars)
    densityVars[seq_len(length(densityVars))] <- make.names(densityVars)

    ## Are there significant regions?
    hasSignificant <- length(significantVar) > 0
    if (hasSignificant) {
        stopifnot(length(significantVar) == length(regions))
        stopifnot(is.logical(significantVar))
        hasSignificant <- any(significantVar)
        if (!hasSignificant) warning("There are no statistically significant regions in this set.")
    }

    ## Template for pvalueVars
    if (hasPvalueVars) {
        templatePvalueDensityInUse <- densityTemplates$Pvalue
        templateManhattanInUse <- densityTemplates$Manhattan
    }

    ## Template for densityVars
    if (hasDensityVars) templateDensityInUse <- densityTemplates$Common


    ## Create outdir
    dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
    workingDir <- getwd()

    ## Locate Rmd if one is not provided
    if (is.null(template)) {
        templateNull <- TRUE
        template <- system.file(
            file.path("regionExploration", "regionExploration.Rmd"),
            package = "regionReport", mustWork = TRUE
        )
    } else {
        templateNull <- FALSE
    }

    ## Check all packages (from suggests) needed for the report
    if (is.null(txdb)) load_check("TxDb.Hsapiens.UCSC.hg19.knownGene")
    load_check(c(
        "bumphunter", "derfinderPlot", "ggbio", "ggplot2", "grid", "gridExtra",
        "RColorBrewer", "mgcv", "whisker", "DT", "sessioninfo"
    ))

    ## Write bibliography information
    bib <- c(
        RefManageR = citation("RefManageR")[1],
        regionReport = citation("regionReport")[1],
        derfinderPlot = citation("derfinderPlot")[1],
        DT = citation("DT"),
        ggbio = citation("ggbio"),
        ggplot2 = citation("ggplot2"),
        knitr = citation("knitr")[3],
        rmarkdown = citation("rmarkdown")[1],
        whisker = citation("whisker"),
        bumphunter = citation("bumphunter")[1],
        derfinder = citation("derfinder")[1]
    )
    WriteBib(as.BibEntry(bib), file = file.path(outdir, paste0(output, ".bib")))

    ## Save the call
    theCall <- match.call()

    ## Generate report
    ## Perform code within the output directory.
    tmpdir <- getwd()
    with_wd(outdir, {
        file.copy(template, to = paste0(output, ".Rmd"))

        ## Output format
        output_format <- .advanced_argument(
            "output_format",
            "BiocStyle::html_document", ...
        )
        outputIsHTML <- output_format %in% c(
            "html_document",
            "rmarkdown::html_document",
            "knitrBootstrap::bootstrap_document", "BiocStyle::html_document"
        )
        if (!outputIsHTML) {
            if (device == "png") warning("You might want to switch the 'device' argument from 'png' to 'pdf' for better quality plots.")
        }

        ## Check knitrBoostrap version
        knitrBootstrapFlag <- packageVersion("knitrBootstrap") < "1.0.0"
        if (knitrBootstrapFlag & output_format == "knitrBootstrap::bootstrap_document") {
            ## CRAN version
            tmp <- knit_bootstrap(paste0(output, ".Rmd"), chooser = c(
                "boot",
                "code"
            ), show_code = TRUE)
            res <- file.path(tmpdir, outdir, paste0(output, ".html"))
            unlink(paste0(output, ".md"))
        } else {
            res <- render(paste0(output, ".Rmd"), output_format,
                clean = .advanced_argument("clean", TRUE, ...)
            )
        }
        if (templateNull) file.remove(paste0(output, ".Rmd"))

        ## Open
        if (browse) browseURL(res)
    })

    ## Finish
    return(invisible(res))
}


#' @rdname renderReport
#' @export
templatePvalueDensity <- "
## {{{densityVarName}}}

```{r pval-density-{{{varName}}}, fig.width=10, fig.height=10}
p1{{{varName}}} <- ggplot(regions.df.plot, aes(x={{{varName}}}, colour=seqnames)) +
    geom_line(stat='density') + xlim(0, 1) +
    labs(title='Density of {{{densityVarName}}}') + xlab('{{{densityVarName}}}') +
    scale_colour_discrete(limits=chrs) + theme(legend.title=element_blank())
p1{{{varName}}}
```


```{r 'pval-summary-{{{varName}}}'}
summary(mcols(regions)[['{{{varName}}}']])
```


This is the numerical summary of the distribution of the {{{densityVarName}}}.

```{r pval-tableSummary-{{{varName}}}, results='asis'}
{{{varName}}}table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
    0.6, 0.7, 0.8, 0.9, 1), function(x) {
    data.frame('Cut' = x, 'Count' = sum(mcols(regions)[['{{{varName}}}']] <= x))
})
{{{varName}}}table <- do.call(rbind, {{{varName}}}table)
if(outputIsHTML) {
    kable({{{varName}}}table, format = 'markdown', align = c('c', 'c'))
} else {
    kable({{{varName}}}table)
}
```

This table shows the number of regions with {{{densityVarName}}} less or equal than some commonly used cutoff values.

"

#' @rdname renderReport
#' @export
templateDensity <- "
## {{{densityVarName}}}

```{r density-{{{varName}}}, fig.width=14, fig.height=14, eval=hasSignificant, echo=hasSignificant}
xrange <- range(regions.df.plot[, '{{{varName}}}']) * c(0.95, 1.05)
p3a{{{varName}}} <- ggplot(regions.df.plot[is.finite(regions.df.plot[, '{{{varName}}}']), ], aes(x={{{varName}}}, colour=seqnames)) +
    geom_line(stat='density') + labs(title='Density of {{{densityVarName}}}') +
    xlab('{{{densityVarName}}}') + scale_colour_discrete(limits=chrs) +
    xlim(xrange) + theme(legend.title=element_blank())
p3b{{{varName}}} <- ggplot(regions.df.sig[is.finite(regions.df.sig[, '{{{varName}}}']), ], aes(x={{{varName}}}, colour=seqnames)) +
    geom_line(stat='density') +
    labs(title='Density of {{{densityVarName}}} (significant only)') +
    xlab('{{{densityVarName}}}') + scale_colour_discrete(limits=chrs) +
    xlim(xrange) + theme(legend.title=element_blank())
grid.arrange(p3a{{{varName}}}, p3b{{{varName}}})
```

```{r density-solo-{{{varName}}}, fig.width=10, fig.height=10, eval=!hasSignificant, echo=!hasSignificant}
p3a{{{varName}}} <- ggplot(regions.df.plot[is.finite(regions.df.plot[, '{{{varName}}}']), ], aes(x={{{varName}}}, colour=seqnames)) +
    geom_line(stat='density') + labs(title='Density of {{{densityVarName}}}') +
    xlab('{{{densityVarName}}}') + scale_colour_discrete(limits=chrs) +
    theme(legend.title=element_blank())
p3a{{{varName}}}
```

This plot shows the density of the {{{densityVarName}}} for all regions. `r ifelse(hasSignificant, 'The bottom panel is restricted to significant regions.', '')`

"

#' @rdname renderReport
#' @export
templateManhattan <- "
## Manhattan {{{densityVarName}}}

```{r manhattan-{{{varName}}}, fig.width=10, fig.height=10, message = FALSE}

regions.manhattan <- regions
mcols(regions.manhattan)[['{{{varName}}}']] <- - log(mcols(regions.manhattan)[['{{{varName}}}']], base = 10)
pMan{{{varName}}} <- plotGrandLinear(regions.manhattan, aes(y = {{{varName}}}, colour = seqnames)) + theme(axis.text.x=element_text(angle=-90, hjust=0)) + ylab('-log10 {{{densityVarName}}}')
pMan{{{varName}}}
rm(regions.manhattan)
```

This is a Manhattan plot for the {{{densityVarName}}} for all regions. A single dot is shown for each region, where higher values in the y-axis mean that the {{{densityVarName}}} are closer to zero.

"

#' @rdname renderReport
#' @export
templatePvalueHistogram <- "
## {{{densityVarName}}}

```{r histPval-{{{varName}}}, fig.width=10, fig.height=10}
p1{{{varName}}} <- ggplot(regions.df.plot, aes(x={{{varName}}}, colour=seqnames)) +
    geom_histogram(bins = 50, alpha=.5, position='identity') +
    xlim(c(0, 1.0005)) +
    labs(title='Histogram of {{{densityVarName}}}') +
    xlab('{{{densityVarName}}}') +
    scale_colour_discrete(limits=chrs) + theme(legend.title=element_blank())
p1{{{varName}}}
```

This plot shows the distribution of {{{densityVarName}}} in a histogram. It might be skewed right or left, or flat as shown in the [Wikipedia examples](https://en.wikipedia.org/wiki/Histogram#Examples). The shape depends on the percent of regions that are differentially expressed. For further information on how to interpret a histogram of p-values check [David Robinson's post on this topic](http://varianceexplained.org/statistics/interpreting-pvalue-histogram/).


```{r 'summary-{{{varName}}}'}
summary(mcols(regions)[['{{{varName}}}']])
```


This is the numerical summary of the distribution of the {{{densityVarName}}}.

```{r tableSummary-{{{varName}}}, results='asis'}
{{{varName}}}table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
    0.6, 0.7, 0.8, 0.9, 1), function(x) {
    data.frame('Cut' = x, 'Count' = sum(mcols(regions)[['{{{varName}}}']] <= x))
})
{{{varName}}}table <- do.call(rbind, {{{varName}}}table)
if(outputIsHTML) {
    kable({{{varName}}}table, format = 'markdown', align = c('c', 'c'))
} else {
    kable({{{varName}}}table)
}
```

This table shows the number of regions with {{{densityVarName}}} less or equal than some commonly used cutoff values.

"

#' @rdname renderReport
#' @export
templateHistogram <- "

## {{{densityVarName}}}

```{r histogram-{{{varName}}}, fig.width=14, fig.height=14, eval=hasSignificant, echo=hasSignificant}
xrange <- range(regions.df.plot[, '{{{varName}}}'])
p3a{{{varName}}} <- ggplot(regions.df.plot[is.finite(regions.df.plot[, '{{{varName}}}']), ], aes(x={{{varName}}}, fill=seqnames)) +
    geom_histogram(alpha=.5, position='identity', bins = 50) +
    labs(title='Histogram of {{{densityVarName}}}') +
    xlab('{{{densityVarName}}}') +
    xlim(xrange) + theme(legend.title=element_blank())
p3b{{{varName}}} <- ggplot(regions.df.sig[is.finite(regions.df.sig[, '{{{varName}}}']), ], aes(x={{{varName}}}, fill=seqnames)) +
    geom_histogram(alpha=.5, position='identity', bins = 50) +
    labs(title='Histogram of {{{densityVarName}}} (significant only)') +
    xlab('{{{densityVarName}}}') +
    xlim(xrange) + theme(legend.title=element_blank())
grid.arrange(p3a{{{varName}}}, p3b{{{varName}}})
```

```{r histogram-solo-{{{varName}}}, fig.width=10, fig.height=10, eval=!hasSignificant, echo=!hasSignificant}
p3a{{{varName}}} <- ggplot(regions.df.plot[is.finite(regions.df.plot[, '{{{varName}}}']), ], aes(x={{{varName}}}, fill=seqnames)) +
    geom_histogram(alpha=.5, position='identity', bins = 50) +
    labs(title='Histogram of {{{densityVarName}}}') +
    xlab('{{{densityVarName}}}') +
    theme(legend.title=element_blank())
p3a{{{varName}}}
```

This plot shows the distribution of {{{densityVarName}}} in a histogram for all regions. `r ifelse(hasSignificant, 'The bottom panel is restricted to significant regions.', '')`

"
leekgroup/regionReport documentation built on Nov. 23, 2021, 7:32 p.m.