Welcome to ezlimma
. This package is a wrapper for the commonly used limma
package, which makes limma
easier to use and offers some functionality that adds value to limma
's functions, such as mediation analysis and allowing for writing pathway results to Excel workbooks that link to the feature-level (e.g. gene-level) results. It exposes some of limma
's parameters, such as those in lmFit
; when the parameters are not exposed, limma
's default is used.
See README.md.
library(limma) library(ezlimma)
To demonstrate the package on a dataset, I'll simulate log2 expression data with 100 genes and 6 samples in two groups with a continuous covariate. The covariate and the first gene will be associated with group. This is similar to the example data from limma
contrasts.fit
.
We treat this data as being already processed. For many datasets, there are zeroes or missing values that might need to be imputed; samples need to be normalized to make them comparable and amenable to statistical tests; absent genes need to be removed; sample outliers need to be assessed to examine whether some experimental variables or batch effects need to be accounted for, or the samples need to be removed or down-weighted; and trends between a gene's mean expression and its variance should be accounted for, especially in RNA-seq data, with limma
's voom
function.
set.seed(42) M <- matrix(rnorm(100*6, sd=0.3), nrow=100, ncol=6) dimnames(M) <- list(paste0("gene", 1:nrow(M)), paste0("sample", 1:ncol(M))) M[1,1:3] <- M[1,1:3] + 2 pheno <- data.frame(grp=rep(c("First3", "Last3"), each=3), yy=rnorm(ncol(M)), stringsAsFactors = FALSE) pheno$yy[1:3] <- pheno$yy[1:3]-2
Assume we want to test if genes are changed in First3
, in Last3
, or in Last3 - First3
. We can do this in two lines of code,
contr.v <- c(First3="First3", Last3="Last3", Last3vsFirst3="Last3-First3") res0 <- limma_contrasts(M, grp=pheno$grp, contrast.v = contr.v)
First3
tests if the mean expression of genes in these samples differs from zero, and Last3vsFirst3
tests difference between groups.
This returns a data frame that looks like:
knitr::kable(signif(res0[1:10,], 3))
We can test association to the continuous covariate with:
res1 <- ezcor(M, pheno$yy, method="pearson")
where method
can be any of "pearson"
, "kendall"
, or "spearman"
. This returns a data frame like,
knitr::kable(signif(res1[1:10,], 3))
Similarly, we can test association using linear regression with limma
, where the model is gene expression = a + bx + error, and we test coefficient 2, i.e. the null hypothesis that b=0. So it returns slopes rather than correlation coefficients. The code is,
res2 <- limma_cor(M, phenotype = pheno$yy)
The output is,
knitr::kable(signif(res2[1:10,], 3))
Because this is a regression, we can test for association to yy
while adjusting for grp
. We do this by creating our own design matrix instead of letting limma_cor
create one from phenotype
.
design <- model.matrix(~1+yy+grp, data=pheno) res3 <- limma_cor(M, design = design)
which gives
knitr::kable(signif(res3[1:10,], 3))
We can also test multiple phenotypes together. Say we had a matrix of phenotypes as
pheno.mat <- cbind(pheno1=pheno$yy, pheno2=rnorm(ncol(M)))
then we could test both phenotypes together in one line, as
res4 <- multi_cor(object=M, pheno.tab = pheno.mat, reorder.rows = TRUE)
knitr::kable(signif(res4[1:10,], 3))
where the result is ordered by the combined p-value of the two associations per gene. The method
options for the association test are: "pearson"
(the default), "spearman"
, "kendall"
, as before, or "limma"
, where it will use limma_cor
. This function only allows one method for all phenotypes.
At a higher level than features (e.g. genes) lie feature sets, such as gene sets or pathways. These can be tested with limma
's roast
functions, which are implemented for multiple sets at a time with the functions mroast
or its faster approximation fry
. We have written wrappers around these to allow for easier testing, and to make an Excel file where each gene set links to a CSV containing the statistics on its genes, made using the feature-level functions.
Let's make the gene set object,
G <- list(list(name="pwy1", description=NA, genes=paste0("gene", 1:10)), list(name="pwy2", description=NA, genes=paste0("gene", 11:20)), list(name="pwy3", description=NA, genes=paste0("gene", 21:30)))
This object would normally be read in using read_gmt
. Then we can test each of these gene sets.
res5 <- roast_contrasts(object=M, G=G, feat.tab=res0, grp=pheno$grp, contrast.v = contr.v, fun="fry", name=NA)
If we set name
to a value other than NA
, this would create a folder with one CSV per pathway, and an Excel file that looks like
knitr::kable(res5)
except the first column would have links to the CSVs in the folder. Similarly, we can use roast_cor
. We can read these Excel files back into R with read_linked_xl
.
utils::sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.