Introduction

Welcome to ezlimmaplot. This package is for plotting bioinformatics results, especially those from the ezlimma package. Here, I'll load both packages, create some output with ezlimma, and then plot it with ezlimmaplot.

Install & load package

See README.md.

library(limma)
library(ezlimma)
library(ezlimmaplot)

Simulate data

To demonstrate the package on a dataset, I'll simulate log2 expression data with 100 genes and 6 samples in two groups, with the first gene up-regulated in the first group. This is the example data from limma contrasts.fit. I add some phenotype variables, including the non-standard column name covar num.

For statistical analysis, we treat this data as being already processed. For many datasets, there are zeroes or missing values that might need to be imputed; samples need to be normalized to make them comparable and amenable to statistical tests; absent genes need to be removed; sample outliers need to be assessed to examine whether some experimental variables or batch effects need to be accounted for, or the samples need to be removed or down-weighted; and trends between a gene's mean expression and its variance should be accounted for, especially in RNA-seq data, with limma's voom function.

set.seed(42)
M <- matrix(rnorm(100*6, sd=0.3), nrow=100, ncol=6)
dimnames(M) <- list(paste0("gene", 1:nrow(M)), paste0("sample", 1:ncol(M)))
grp <- rep(c("First3", "Last3"), each=3)
tissue <- rep(c("muscle", "liver"), times=3)
covar_num <- rnorm(n = ncol(M))
pheno <- data.frame(row.names = colnames(M), sample=colnames(M), grp, tissue, 
                    "covar num"=covar_num, covar2=covar_num,
                    stringsAsFactors = FALSE, check.names = FALSE)
M[1,1:3] <- M[1,1:3] + 2

Principal component analysis (PCA)

We can plot the samples by their first two principal components using the ggplot2 package.

ezpca(M, pheno, color="grp", name=NA)

We set name=NA to suppress plotting to a PNG file. We pass color="grp" via ... to ggplot2 aes_string, where "grp" can be any of colnames(pheno)}. We could similarly have used color="tissue". Passing color="covar num" is special, because ggplot2 expects standard variable names or expressions of them whereas "covar num" has a space, but we can do it with:

ezpca(M, pheno, color="`covar num`", name=NA)

We can also facet by a phenotype column using the facet argument, which takes a formula.

ezpca(M, pheno, color="grp", facet = ". ~ tissue", name=NA)

We can also plot covariates as colors with shape="grp" using multi_covar_pca to PDF. multi_covar_pca only accepts covariates whose name corresponds to a column in pheno. The PDF is named "covar_pca.pdf" by default.

multi_covar_pca(M, pheno, grp.var="grp", covars=c("tissue", "covar2"))

Gene level differential analysis

Assume we want to test if genes are changed in First3, in Last3, or in Last3 - First3. We can do this in two lines of code,

contr.v <- c(First3="First3", Last3="Last3", Last3vsFirst3="Last3-First3")
res <- limma_contrasts(M, grp=grp, contrast.v = contr.v)

We add a column of gene symbols to the top genes.

res.df <- data.frame(signif(res, 3), Gene.Symbol=NA)
res.df$Gene.Symbol[1:10] <- LETTERS[1:10]

This returns a data frame that looks like:

knitr::kable(res.df[1:10,])

Significance histogram

We can plot a histogram of the p-values and FDRs for all comparisons with:

signif_hist(res, name=NA) #not run

signif_hist plots four histograms per page, and this line will plot two histograms per comparison x 3 comparisons = 6 histograms. Thus, this takes more than one page, so it is not shown here.

Venn diagram

We may want to see which genes are up or down in each comparison, and how the significant genes (say p-value < 0.05) behave in different comparisons using a Venn diagram.

ezvenn(tab=res, p.cutoff = 0.05)

Dotplot and boxplot

We can plot the top gene by group in a dotplot or boxplot using ggplot2.

plot_by_grp(M[1,], pheno$grp, name=NA, type="dot")
plot_by_grp(M[1,], pheno$grp, name=NA, type="box")

We can plot multiple genes, such as the first 5 genes where each gene will be on its own page in a PDF, by passing object=M[1:5,].

Heatmap

We plot a heatmap of the expression of the first five genes across all samples using the pheatmap package. We assume their gene symbols are "A", "B", "C", "D", "E". ezheat requires a data frame, so we make a phenotype data frame with only one column for the group, and with rownames(pheno)==colnames(M), which pheatmap checks for.

sym <- c("A", "B", "C", "D", "E")
pheno.df <- pheno[,"grp",drop=F]
topgenes <- rownames(res.df)[1:5]
ezheat(M[topgenes,], labrows = res.df[topgenes, "Gene.Symbol"], pheno.df=pheno.df, name=NA)

We see a color bar for the phenotype groups and the strongest change in metabolite "A".

Volcano plots

We can create volcano plots, which have logFC on the x-axis and log(p-value) on the y-axis, for a comparison using ggplot2.

ezvolcano(tab=res.df, comparison = "First3", name=NA, lab.col="Gene.Symbol", ntop.sig = 1)

We can automatically create a PDF with volcano plots for all comparisons with:

multi_volcano(tab=res.df, lab.col="Gene.Symbol", ntop.sig = 1)
# remove files that were made during knitting
unlink("covar_pca.pdf")

System info

Here is the information on the system on which this document was compiled:

sessionInfo()


jdreyf/ezlimmaplot documentation built on Feb. 8, 2025, 2:25 a.m.