Creating annotated output with \Biocpkg{affycoretools} and ReportingTools

Overview

The \Biocpkg{affycoretools} package is intended to help people easily create useful output from various analyses. While affycoretools was originally intended for those using Affymetrix microarrays, this is no longer the case. While some functions remain Affy-centric, most are now much more general, and can be used for any microarray or RNA-Seq experiment.

BiocStyle::markdown()
options(bitmapType = "cairo")

Introduction

This package has evolved from my work as a service core biostatistician. I routinely analyze very similar experiments, and wanted to create a way to minimize all the cutting and pasting of code that I found myself doing. In addition, I wanted to come up with a good way to make an analysis reproducible, in the sense that I (or somebody else) could easily re-create the results.

In the past this package relied on the \Biocpkg{annaffy} package, and was intended to be used in concert with a 'Sweave' document that contained both the code that was used to analyze the data, as well as explanatory text that would end up in a pdf (or HTML page) that could be given to a client. In the intervening period, people have developed other, better packages such as \CRANpkg{knitr} and \CRANpkg{rmarkdown} \Biocpkg{ReportingTools} that make it much easier to create the sort of output I like to present to my clients.

This document was generated using an Rmarkdown document (it's the file called RefactoredAffycoretools.Rmd that you can find in your R library directory in the affycoretools/doc folder). Part of using Rmarkdown is generating .Rmd files that contain R code, and for each code block there are some arguments that define how the results for that code block are returned. A good resource for the code options is here.

Using affycoretools

For this section we will be using the \Robject{sample.ExpressionSet} data set that comes with the \Biocpkg{Biobase} package. Remember that you can always run this code at home by doing this:

library(knitr)
purl(system.file("doc/RefactoredAffycoretools.Rmd", package="affycoretools"))

And then you will have a file called RefactoredAffycoretools.R in your working directory that you can either \Rfunction{source} or open with \software{RStudio} or \software{Emacs/ESS}, and run by chunk or line by line.

We first load and rename the data:

suppressMessages(library(affycoretools))
data(sample.ExpressionSet)
eset <- sample.ExpressionSet
eset
featureNames(eset) <- gsub("/", "_", featureNames(eset))

This \Rclass{ExpressionSet} object is a truncated data set, based on an Affymetrix HG-U95av2 array. There are 26 samples and 500 probesets. We will use the \Rclass{phenoData} to fit a linear model using \Biocpkg{limma}. \bioccomment{We will not cover any aspects of fitting a linear model here; the limma User's Guide covers this topic in depth. In addition, this analysis isn't meant to be correct in any sense; we are just doing this to get some data to annotate and output.}

Note that the limma package will retain annotation data that is in the \Rclass{ExpressionSet}, so we add those data in using \Rcode{annotateEset}

suppressMessages(library(hgu95av2.db))
eset <- annotateEset(eset, hgu95av2.db)
suppressMessages(library(limma))
pd <- pData(phenoData(eset))
design <- model.matrix(~0+type+sex, pd)
colnames(design) <- gsub("type|sex", "", colnames(design))
contrast <- matrix(c(1,-1,0))
colnames(contrast) <- "Case vs control"
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit)
topTable(fit2, 1)[,1:4]

After adding the annotation data to the \Robject{ExpressionSet} object, the \Rfunction{topTable} output now contains the appropriate annotation data for each probeset. At this point we can output an HTML table that contains these data.

suppressMessages(library(ReportingTools))
htab <- HTMLReport("afile", "My cool results")
publish(topTable(fit2, 1), htab)
finish(htab)

If you run the code yourself, you will have an HTML file 'afile.html' in the working directory, that contains the data for our top 10 genes. This table is not particularly interesting, and the \Biocpkg{ReportingTools} package already has functionality to just do something like

htab <- HTMLReport("afile2", "My cool results, ReportingTools style")
publish(fit2, htab, eset, factor = pd$type, coef = 1, n = 10)
finish(htab)

and it will automatically generate another HTML file, 'afile2.html', with some extra plots that show the different groups, and we didn't even have to use \Rfunction{topTable} directly. However, the default plots in the HTML table are a combination of dotplot and boxplot, which I find weird. Since \Biocpkg{ReportingTools} is easily extensible, we can make changes that are more pleasing.

d.f <- topTable(fit2, 2)
out <- makeImages(df = d.f, eset = eset, grp.factor = pd$type, design = design,
                  contrast = contrast, colind = 1, repdir = ".")
htab <- HTMLReport("afile3", "My cool results, affycoretools style")
publish(out$df, htab, .mofifyDF = list(entrezLinks, affyLinks))
finish(htab)

Note that there are two differences in the way we did things. First, we create a \Robject{data.frame}, and then decorate it with the plots using the \Rfunction{makeImages} function. This will by default create dotplots (or you can specify boxplots). For the plots to fit in an HTML table, there are no axis labels. However, each plot is also a link, and if you click on it, a larger plot with axis labels will be presented. If you are running the code yourself, see 'afile3.html'.

All the little files that get created can get pretty messy, so the default is to put everything into a 'reports' subdirectory, so your working directory stays clean. For this example we over-ride the defaults so we do not have to go searching in subdirectories for our tables.

For HTML output the better idea is to generate a table using e.g., the xtable package, with links to the tables that we have generated.

An alternative parameterization that probably makes more sense is to fit coefficients for each sex/treatment combination.

grps <- factor(apply(pd[,1:2], 1, paste, collapse = "_"))
design <- model.matrix(~0+grps)
colnames(design) <- gsub("grps", "", colnames(design))
contrast <- matrix(c(1,-1,0,0,
                     0,0,1,-1,
                     1,-1,-1,1),
                   ncol = 3)
colnames(contrast) <- c("Female_Case vs Female_Control",
                        "Male_Case vs Male_Control",
                        "Interaction")
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)

With this parameterization we can look at intra-sex differences, as well as the interaction (looking for sex-specific changes). This now means that we have a total of three HTML tables to output, which makes things a bit more complex to present. Luckily, this is pretty simple to accomplish. For this step we will now use the default 'reports' subdirectory to keep everything straight. In addition, we will trim down the output a bit.

## get a list containing the output for each comparison
out <- lapply(1:3, function(x) topTable(fit2, x))
## process the output to add images
htab <- lapply(1:3, function(x){
    tmp <- HTMLReport(gsub("_", " ", colnames(contrast)[x]), colnames(contrast)[x], "./reports")
    tmp2 <- makeImages(out[[x]], eset, grps, design, contrast, x)
    publish(tmp2$df, tmp, .modifyDF = list(affyLinks, entrezLinks))
    finish(tmp)
    return(tmp)
})

A reasonable thing to do would be to add a table to this document that lists all the different comparisons we made, with a count of the genes and links to the HTML tables we have generated. To do this we use the xtable package (you could also use \Rfunction{kable} as well), and add results = "asis" to the R code chunk in our Rmarkdown document.

d.f <- data.frame(Comparison =  sapply(htab, function(x) XML::saveXML(Link(x))),
                  "Number significant" = sapply(out, nrow), check.names = FALSE)
kable(d.f, caption = "Number of significant genes.",
      format = "html", row.names = FALSE)

We are often asked to create a Venn diagram showing overlap between groups. This is pretty simple to do, but it would be nicer to have an HTML version with clickable links, so your PI or end user can see what genes are in each cell of the Venn diagram. As an example, we can generate a Venn diagram comparing overlapping genes between the male and female comparisons.

collist <- list(1:2)
venn <- makeVenn(fit2, contrast, design, collist = collist, adj.meth = "none")
## generate a list of captions for each Venn.
## we only have one, so it's a list of length 1.
## we are using the BiocStyle package to control formatting,
## so the first sentence will be bolded and in blue.
cap <- list(paste("A Venn diagram. Note that the first sentence is bolded",
                  "and blue, whereas the rest is normal type and black. Usually",
                  "one would use a more useful caption than this one."))
vennInLine(venn, cap)

There is similar functionality for presenting the results of a GO hypergeometric analysis (\Rfunction{makeGoTable}), and GSEA analysis, based on the \Rfunction{romer} function in \Biocpkg{limma} (\Rfunction{runRomer} and \Rfunction{outputRomer}).

Session information

The version of R and packages loaded when creating this vignette were:

sessionInfo()


Try the affycoretools package in your browser

Any scripts or data that you put into this service are public.

affycoretools documentation built on Nov. 8, 2020, 6 p.m.