knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This 1st version of the DEswan vignette briefly describes how to use this package to analyze the links between a quantitative trait (qt) and high dimensional datasets such as -omics data. Here we apply DEswan to the data generated by Lehallier et al. (https://www.biorxiv.org/content/10.1101/751115v1).
# library(devtools) # devtools::install_github("lehallib/DEswan",build_vignettes = T)
This aging proteomic dataset was generated using SOMAscan. Lehallier et al. measured 1305 plasma proteins from 171 individuals ranged from young adults to centenarians from 4 different cohorts.
# devtools::install("../DEswan") library("DEswan") head(agingplasmaproteome[,1:5])
The first three column are demographics and samples' characteristics. Columns 4:1308 are levels of plasma proteins in a log10 scale.
Age is the qt of interest and Sex and Cohort are covariates we want to include in the analysis
# distribution of qt hist(agingplasmaproteome[,1],main="qt distribution",xlab="Age (years)") # covariates table(agingplasmaproteome[,2]) table(agingplasmaproteome[,3])
To run DEswan we call the function DEswan(). DEswan() takes 5 arguments (3 are optional):
⋅⋅* data.df: a data Frame - columns are variables and rows are samples (same order as qt and covariates)
⋅⋅* qt: a Vector - quantitative trait for DE-SWAN (same order as data.df and covariates)
⋅⋅* window.center: an optional Vector - window centers. Default is quantile(qt, probs = seq(.1,.9,.1))
⋅⋅* buckets.size: an optional Numeric - size of each bucket. Default is set to (max(range(qt))-min(range(qt)))/2
⋅⋅* covariates: a data Frame. Columns are variables and rows are samples (same order as qt and covariates)
x=cor(agingplasmaproteome[,1],agingplasmaproteome[,-c(1:3)]) head(agingplasmaproteome[,1:5]) start_time <- Sys.time() # res.DEswan=DEswan(data.df = agingplasmaproteome[,-c(1:3)], res.DEswan=DEswan(data.df = agingplasmaproteome[,3+c(which(colnames(x) %in% colnames(x)[abs(x)>.5]))], qt = agingplasmaproteome[,1], window.center = seq(35,95,10), buckets.size = 20, covariates = agingplasmaproteome[,c(2:3)]) end_time <- Sys.time() end_time-start_time
The output of is a list of 2 data.frames: p = pvalues of qT and covariates for each variable and each window.center coefficients = coefficients of qT and covariates for each variable and each window.center
head(res.DEswan$p) head(res.DEswan$coeff)
Increasing the resolution of DEswan (more values in "window.center" vector) will provide more detailed results but will require additional computational time. Computational time also depends on the number of features included in the analysis.
Next releases of DEswan will focus on decreasing computational time.
DEswan outputs are in long format. Reshaping the results to wide format helps the exploration of the statistics produces by DEswan
res.DEswan.wide.p=reshape.DEswan(res.DEswan,parameter = 1,factor = "qt") head(res.DEswan.wide.p[,1:5])
Methods for dealing with multiple testing frequently call for adjusting α in some way. The Bonferroni correction is one simple way to take this into account; adjusting the false discovery rate using the Benjamini-Hochberg procedure is a more powerful method.
res.DEswan.wide.q=q.DEswan(res.DEswan.wide.p,method="BH") head(res.DEswan.wide.q[,1:5])
One simple way to visualize DEswan results is to count the number of significant variables for each window.center tested.
# head(res.DEswan.wide.p[,1:5]) res.DEswan.wide.p.signif=nsignif.DEswan(res.DEswan.wide.p) toPlot=res.DEswan.wide.p.signif[1:3,] x=as.numeric(gsub("X","",colnames(toPlot))) plot(1, type = "n", xlim=c(min(x,na.rm=T),max(x,na.rm=T)),ylim=c(0,max(toPlot,na.rm=T)),ylab="# significant",xlab="qt") for(i in 1:nrow(toPlot)){ lines(x, toPlot[i,],type='l',lwd=i) } legend("topleft",legend = paste("p<",rownames(toPlot),sep=""),lwd=c(1,2,3)) # head(res.DEswan.wide.q[,1:5]) res.DEswan.wide.q.signif=nsignif.DEswan(res.DEswan.wide.q) toPlot=res.DEswan.wide.q.signif[1:3,] x=as.numeric(gsub("X","",colnames(toPlot))) plot(1, type = "n", xlim=c(min(x,na.rm=T),max(x,na.rm=T)),ylim=c(0,max(toPlot,na.rm=T)),ylab="# significant",xlab="qt") for(i in 1:nrow(toPlot)){ lines(x, toPlot[i,],type='l',lwd=i) } legend("topleft",legend = paste("q<",rownames(toPlot),sep=""),lwd=c(1,2,3))
The best way to identify when and how the features are changing in the qt space is to generate a heatmap of the signed effects.
res.DEswan.wide.coeff=reshape.DEswan(res.DEswan,parameter = 2,factor = "qt") toHeatmap=sign(res.DEswan.wide.coeff[,-1])*-log10(res.DEswan.wide.p[,-1]) rownames(toHeatmap)<-res.DEswan.wide.coeff$variable pairs.breaks <- seq(-3, 3, by=0.01) mycol <- gplots::colorpanel(n=length(pairs.breaks)-1,low="cyan",mid="black",high="yellow") # display the colorbar # image(z = matrix(1:100, ncol = 1),col = mycol,xaxt = "n",yaxt = "n") gplots::heatmap.2(as.matrix(toHeatmap), cexRow=.1,cexCol=.7, trace="none", dendrogram="row", breaks=pairs.breaks, col=mycol, Rowv=T,key=F, Colv=F, lhei=c(0.2,10), lwid=c(.2,3) )
Similar analysis can be generated for each covariates. Here we show how sex affect the plasma proteome during aging
res.DEswan.wide.p.covar1=reshape.DEswan(res.DEswan,parameter = 1,factor = "Sex") res.DEswan.wide.coeff.covar1=reshape.DEswan(res.DEswan,parameter = 2,factor = "SexMale") toHeatmap=sign(res.DEswan.wide.coeff[,-1])*-log10(res.DEswan.wide.p[,-1]) rownames(toHeatmap)<-res.DEswan.wide.coeff$variable pairs.breaks <- seq(-3, 3, by=0.01) mycol <- gplots::colorpanel(n=length(pairs.breaks)-1,low="cyan",mid="black",high="yellow") # display the colorbar # image(z = matrix(1:100, ncol = 1),col = mycol,xaxt = "n",yaxt = "n") gplots::heatmap.2(as.matrix(toHeatmap), cexRow=.1,cexCol=.7, trace="none", dendrogram="both", breaks=pairs.breaks, col=mycol, Rowv=T,key=F, Colv=F, lhei=c(0.2,10), lwid=c(.2,3) )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.