knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
HOmics is an R package to integrate omics data by capturing hierarchical relations between omics in their association to phenotype. It contains two main functions, the generic HOmics() function and HOmics.meth(), to analyze specifically methylation of genes affecting a phenotype. Details on how to apply this latter function are available in the HOmics methylation gene integration vignette.
HOmics fits Bayesian models through the package r CRANpkg("rjags")
, which needs the JAGS (Just Another Gibbs Sampler) environment to be installed. JAGS is a program for the analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation.
You can install JAGS easily through this link: http://mcmc-jags.sourceforge.net/
Once JAGS is installed you only need to install the HOmics package:
# library(devtools) # install_github("lnonell/HOmics") library(HOmics)
HOmics depends on the following R packages:
HOmics fits a Bayesian hierarchical model for each group of features specified. The steps to fit each model are the following:
r CRANpkg("rjags")
is used to fit the hierarchical model with the MCMC approach. These are the steps internally taken to fit the model and get the results:r CRANpkg("MCMCvis")
is used to obtain summariesConvergence has been tested in simulated data with these parameters and the three Markov chains were considerably stable.
As previously stated, HOmics fits a Bayesian hierarchical model for a group of features. For that the agg.matrix is needed.
Input data parameters for main function HOmics are the following:
Figure 1 summarizes the relationship among the different needed objects in the model hierarchy.
knitr::include_graphics("Fig1.jpg")
cond is the response variable and should be a vector with the same length and order as samples in data.matrix (specified as columns). Response variable can be continuous or categorical but depending on its nature it will be transformed to fit the hiearachical model.
HOmics will fit a hierarchical logistic regression model when the response is binary or a hierarchical linear regression model if the response is continuous.
JAGS needs a vector of 0s and 1s, so vector cond will be converted if it is not such. The rules to convert it are the following:
if cond is a factor, it will be converted to a vector of 0's and 1's. Remember when converting to factor in R, it assigns the levels in alphabetical order by default. So for instance, in the following code, for vector a, 2 will be taken as the level of reference. In this case, we would compare 4 vs 2. For vector b, HOmics will use "healthy" as the reference level and the comparison performed will be "tumor" vs "healthy". You can use function relevel() to select a different reference level.
if cond is a character variable, it will be converted to a numerical vector by converting it first to a factor.
This is the internal code used to transform the variable:
a <- factor(c(2,4,4,2,4)) levels(a) cond.a <- as.numeric(a)-1 cond.a b <- c("tumor","healthy","tumor","tumor","healthy","healthy") levels (b) cond.b <- as.numeric(b)-1 cond.b
When the response variable is continuous, a hierarchical linear model will be fitted.
The model accepts one or more covariates in parameter covar.matrix. covar.matrix can be a vector or a matrix but in both cases the values must be in the same sample order as cond and as the columns of data.matrix.
Parallelization is enabled with the use of packages r CRANpkg("doParallel")
and r CRANpkg("foreach")
, which should work in any operative system.
Parameter cores will enable parallelization in the functions. Use function detectCores() in r CRANpkg("parallel")
to retrieve the number of cores to work.
library(parallel) detectCores()
The output given by HOmics function is an object of class Homics. It contains a list of results for each analysis performed on each group of features.
For each element in the results object, a tibble, with the following columns is specified:
HOmics objects can be subsequently filtered with function get.signif() to get significative features or plotted with function plot.res().
Let's create some random but realistic data.
s.n <- 10 f.n <- 50 z.n <- 10 g.n <- 8 #data.matrix, features with random normal distribution data.matrix<-matrix(data = rnorm(500,5,3),nrow=f.n) rownames(data.matrix) <- paste0("f",1:f.n) colnames(data.matrix) <- paste0("s",1:s.n) #aggregation of features g a agg.matrix <- matrix(data = sample(rep(0:1,1000),400),nrow=g.n) rownames(agg.matrix) <- paste0("g",1:g.n) colnames(agg.matrix) <- paste0("f",1:f.n) #prior matrix of features belonging to z's z.matrix<-matrix(data = sample(rep(0:1,1000),500),nrow=f.n) rownames(z.matrix) <- paste0("f",1:f.n) colnames(z.matrix) <- paste0("z",1:z.n) #response y <- rep(c(0,1),5) cond <- as.factor(y)
Let's apply the function HOmics, which will fit a hierarchical logistic model. As there are 8 groups specified in the agg.matrix, HOmics will fit 8 hierarchical models and return a list with 8 elements with the results.
library(HOmics) res<-HOmics(data.matrix = data.matrix, agg.matrix = agg.matrix, cond = cond, z.matrix =z.matrix, covar.matrix = NULL, cores = 4) class(res)
Object res is an object of class HOmics, which contains a list of results for each group of features.
res[[1]]
We filter the results of those features with high probability of positive betas and also of negative betas.
res.f.pos <- get.signif(res, param = "p.pos", threshold = 0.95, as.data.frame = T) res.f.pos res.f.neg <- get.signif(res, param = "p.neg", threshold = 0.95, as.data.frame = T) res.f.neg
We finally plot 95% probability for the first model.
plot.res(res, element = "g1")
cond.cont <- rnorm(10,1,7) res2<-HOmics(data.matrix = data.matrix, agg.matrix = agg.matrix, cond = cond.cont, z.matrix = z.matrix, covar.matrix = NULL, cores = 4)
covar<- rep(c("A","B"),5) res3<-HOmics(data.matrix = data.matrix, agg.matrix = agg.matrix, cond = cond, z.matrix = z.matrix, covar.matrix = covar, cores = 4)
covariate <- rnorm(10,3,4) res4<-HOmics(data.matrix = data.matrix, agg.matrix = agg.matrix, cond = cond, z.matrix = z.matrix, covar.matrix = covariate, cores = 4)
co.matrix <- matrix(rnorm(20,3,4),nrow=10) res4<-HOmics(data.matrix = data.matrix, agg.matrix = agg.matrix, cond = cond, z.matrix = z.matrix, covar.matrix = co.matrix, cores = 4)
For this example we will work with data from three different sources:
r Biocpkg("curatedovarianData")
r CRANpkg("msigdbr")
With all these information we will perform a functional approach by analyzing each gene set in mSigDB related to ovarian cancer. For all genes in a gene set, we will fit a hierarchical model where prior information known about genes are their predicted miRNAs.
# BiocManager::install("curatedOvarianData") library(curatedOvarianData) data(TCGA_eset) TCGA_eset
Notice that TCGA_eset is an object of class ExpressionSet. We will compare ovarian cancer tumor samples in stage 4 with recurrence versus healthy samples. For that, we will subset the original ExpressionSet with the required constraints. Once subsetted the original TCGA_eset object, we will extract the expression data (with the function exprs()) and the phenotype variables (with the function pData()).
pheno <- pData(TCGA_eset) table(pheno$sample_type) TCGA_subset <- TCGA_eset[,pheno$sample_type=="healthy" | (pheno$sample_type=="tumor" & pheno$tumorstage==4 & pheno$recurrence_status=="recurrence")] expr.m <- exprs(TCGA_subset) dim(expr.m) pheno <- pData(TCGA_subset)
Let's obtain the condition vector.
cond <- pheno$sample_type cond
Notice that this is a character categorical vector.
The aggregated matrix must contain the groups that will be analyzed, in this case each functional gene set is a group and will be specified in a row. Information is obtained from the MSigDB through r CRANpkg("msigdbr")
and then the agg.matrix is constructed.
To get pathways in collection curated gene sets (C2) with keyword ovarian follow these steps:
# install.packages(msigdbr) library(msigdbr) c2 = msigdbr(species = "Homo sapiens", category = "C2") c2.ov <- c2[grep("OVARIAN", c2$gs_name),] c2.ov
And, by manipulating the results, we obtain the aggregated matrix.
agg.table <- table(c2.ov$gs_name, c2.ov$human_gene_symbol) agg.matrix <- as.matrix(as.data.frame.matrix(agg.table)) dim(agg.matrix) table(agg.matrix) # 0s and 1s only
14 gene sets where selected.
The prior matrix with miRNAs that target our set of genes is obtained from the downloaded information obtained at TargetScan.
# install.packages("readr") data("targets.hs.6.2", package="HOmics") targets
Let's filter the targets to get only those with probability of conserved targeting (PCT) > 0.5 and generate the prior information matrix, i.e. the z.matrix.
targets.f <- targets %>% mutate(PCT= as.numeric(PCT)) %>% filter(PCT >0.5) z.table <- table(targets.f$`Gene Symbol`, targets.f$`miR Family`) z.matrix <- as.matrix(as.data.frame.matrix(z.table)) z.matrix[z.matrix>1]<-1 table(z.matrix) # 0s and 1s only
To make sure that features (in this case genes) in the three matrices are the same, we do:
common.genes <- intersect(intersect(rownames(expr.m), colnames(agg.matrix)), rownames(z.matrix)) length(common.genes) agg.matrix <- agg.matrix[,common.genes] expr.m <- expr.m[common.genes,] z.matrix <- z.matrix[common.genes,]
And finally, we just need to decide the number of cores and apply function HOmics.
library(HOmics) res6 <- HOmics(data.matrix = expr.m, agg.matrix = agg.matrix, cond = cond, z.matrix = z.matrix, covar.matrix = NULL, cores = 6) class(res6) length(res6)
Note that 6 cores were used in this case.
Let's filter the results to obtain high probabiliy results (p > 0.95).
res6.f.pos <- get.signif(res6, param = "p.pos",threshold = 0.95, as.data.frame = T) res6.f.pos res6.f.neg <- get.signif(res6, param = "p.neg",threshold = 0.95, as.data.frame = T) res6.f.neg
And finally let's plot results for pathway BONOME_OVARIAN_CANCER_POOR_SURVIVAL_DN
plot.res(res6, element = "BONOME_OVARIAN_CANCER_POOR_SURVIVAL_DN")
HOmics contains a specific function to analyze methylation array data. Check vignette.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.