In jdreyf/ezlimma: Streamlines and extends limma package

Introduction

Welcome to ezlimma. This package is a wrapper for the commonly used limma package, which makes limma easier to use and offers some functionality that adds value to limma's functions, such as mediation analysis and allowing for writing pathway results to Excel workbooks that link to the feature-level (e.g. gene-level) results. It exposes some of limma's parameters, such as those in lmFit; when the parameters are not exposed, limma's default is used.

Install & load package

See README.md.

library(limma)
library(ezlimma)

Simulate data

To demonstrate the package on a dataset, I'll simulate log2 expression data with 100 genes and 6 samples in two groups with a continuous covariate. The covariate and the first gene will be associated with group. This is similar to the example data from limma contrasts.fit.

We treat this data as being already processed. For many datasets, there are zeroes or missing values that might need to be imputed; samples need to be normalized to make them comparable and amenable to statistical tests; absent genes need to be removed; sample outliers need to be assessed to examine whether some experimental variables or batch effects need to be accounted for, or the samples need to be removed or down-weighted; and trends between a gene's mean expression and its variance should be accounted for, especially in RNA-seq data, with limma's voom function.

set.seed(42)
M <- matrix(rnorm(100*6, sd=0.3), nrow=100, ncol=6)
dimnames(M) <- list(paste0("gene", 1:nrow(M)), paste0("sample", 1:ncol(M)))
M[1,1:3] <- M[1,1:3] + 2

pheno <- data.frame(grp=rep(c("First3", "Last3"), each=3), yy=rnorm(ncol(M)), stringsAsFactors = FALSE)
pheno$yy[1:3] <- pheno$yy[1:3]-2

Gene level analysis

Assume we want to test if genes are changed in First3, in Last3, or in Last3 - First3. We can do this in two lines of code,

contr.v <- c(First3="First3", Last3="Last3", Last3vsFirst3="Last3-First3")
res0 <- limma_contrasts(M, grp=pheno$grp, contrast.v = contr.v)

First3 tests if the mean expression of genes in these samples differs from zero, and Last3vsFirst3 tests difference between groups.

This returns a data frame that looks like:

knitr::kable(signif(res0[1:10,], 3))

We can test association to the continuous covariate with:

res1 <- ezcor(M, pheno$yy, method="pearson")

where method can be any of "pearson", "kendall", or "spearman". This returns a data frame like,

knitr::kable(signif(res1[1:10,], 3))

Similarly, we can test association using linear regression with limma, where the model is gene expression = a + bx + error, and we test coefficient 2, i.e. the null hypothesis that b=0. So it returns slopes rather than correlation coefficients. The code is,

res2 <- limma_cor(M, phenotype = pheno$yy)

The output is,

knitr::kable(signif(res2[1:10,], 3))

Because this is a regression, we can test for association to yy while adjusting for grp. We do this by creating our own design matrix instead of letting limma_cor create one from phenotype.

design <- model.matrix(~1+yy+grp, data=pheno)
res3 <- limma_cor(M, design = design)

which gives

knitr::kable(signif(res3[1:10,], 3))

We can also test multiple phenotypes together. Say we had a matrix of phenotypes as

pheno.mat <- cbind(pheno1=pheno$yy, pheno2=rnorm(ncol(M)))

then we could test both phenotypes together in one line, as

res4 <- multi_cor(object=M, pheno.tab = pheno.mat, reorder.rows = TRUE)

knitr::kable(signif(res4[1:10,], 3))

where the result is ordered by the combined p-value of the two associations per gene. The method options for the association test are: "pearson" (the default), "spearman", "kendall", as before, or "limma", where it will use limma_cor. This function only allows one method for all phenotypes.

Gene set level analysis

At a higher level than features (e.g. genes) lie feature sets, such as gene sets or pathways. These can be tested with limma's roast functions, which are implemented for multiple sets at a time with the functions mroast or its faster approximation fry. We have written wrappers around these to allow for easier testing, and to make an Excel file where each gene set links to a CSV containing the statistics on its genes, made using the feature-level functions.

Let's make the gene set object,

G <- list(list(name="pwy1", description=NA, genes=paste0("gene", 1:10)),
          list(name="pwy2", description=NA, genes=paste0("gene", 11:20)),
          list(name="pwy3", description=NA, genes=paste0("gene", 21:30)))

This object would normally be read in using read_gmt. Then we can test each of these gene sets.

res5 <- roast_contrasts(object=M, G=G, feat.tab=res0, grp=pheno$grp, contrast.v = contr.v, fun="fry", name=NA)

If we set name to a value other than NA, this would create a folder with one CSV per pathway, and an Excel file that looks like

knitr::kable(res5)

except the first column would have links to the CSVs in the folder. Similarly, we can use roast_cor. We can read these Excel files back into R with read_linked_xl.