library(knitr) library(rmarkdown) knitr::opts_chunk$set(echo = TRUE)
When we use regression models for Gaussian multivariate data curated from high-throughput assays("omic" data), the response variables normally have their latent structure in the variance-covariance matrix and its inverse. The conventional approach gives assumptions of these structures in regression. For example, in repeated measure analysis of variance, the covariance matrix is given a fixed form of the structure. However, these assumptions are not known to be true unless there are prior information available. Accurate information of the variance covariance matrix and the precision matrix will achieve better estimation in the regression coefficients. In cases of response variables with only a small numbers of non-zero covariance terms in the variance-covariance matrix, they are equivalent to a sparse undirected network graph of which nodes represent variables and edges represent non-zero links between these variables.
Our method is to utilize the Gaussian Graphical Model (GGM) in estimating the structure of a sparse variance-covariance matrix and its inverse (precision matrix), in order to improve the estimation of the multivariate generalized least square regression.
The method firstly uses lasso penalization method "glasso", neighbour selection method or "enet" method to estimate the precision matrix of these response variables; and Secondly selects the covariance terms from the sample variance-covariance matrix based on an estimated graph structure of the precision matrix and a fine-tuning parameter. The fine tuning parameter is selected according to the primitive graph theory which bases on power of the adjacent matrix to infer the final structure of the variance-covariance matrix.
The sparsenetgls is a multivariate regression model given an estimated
precision matrix and variance-covariance matrix from the response variables.
It uses the sandwich estimator of the variance-covariance matrix of the
regression coefficients.
In the estimation for the precision matrix, there are four options provided in
the package sparsenetgls and they are "glasso", "lasso", "mb", and "elastic".
"glasso" used the graphical lasso method proposed by Yuan and Lin (2007),
and the block-wise coordinate descend algorithm by Friedman, Hastie, Hofling
and Tibshirani (2007).
"lasso" and "mb" use the linear regression with lasso penalization to provide
an asymmetric approximation which is proposed by Meinshausen and
Buhlmann (2006).
"enet" is also an asymmetric approximation using linear regressions but using
the combination of lasso penalization (l1) and ridge regularization(l2).
In the generalized least square regression, regression coefficients are
estimated from the multilevel formatted multivariate model. Response variables
are stacked into one univariate regression variable with an indicator to
identify their sampling units. The sandwich estimator of the variance is
calculated based on the given precision, and variance-covariance matrix.
There are different options for the selection of the regression coefficients
from the penalized path. One is to base on the minimal variance, the other
options are the information critiria (AIC and BIC). The variances of the
selected regression coefficients are calculated from its sandwich estimator
given the selected structure of the precision matrix and the selected
covariance terms based on the fine-tuning parameter.
The main functions included in the package "sparsenetgls" are:
path_result_for_roc(): the path_result_for_roc function is designed to produce the diagnosis index for the prediction accuracy of a Gaussian Graphical model (GGM) when comparing it to the true graph structure. The GGM must use a L-p norm regularizations (p=1,2) with the series of solutions conditional on theregularization parameter. The returning list of assessment results for a series of precision matrices includes sensitiviy/specificity/Nagative predicitedvalue /Positive predicted value.
plot_roc(): it is designed to produce the Receiver Operative Characteristics (ROC) Curve to visualize the prediction accuracy of a Gaussian Graphical model (GGM) to the true graph structure. The GGM must use a L-p norm regularizations (p=1,2) with the series of solutions conditional on the regularization parameter.
sparsenetgls(): it is designed for multivariate regression given a penalized and/or regularised precision and covariance Matrix of the multivariate Gaussian distributed responses. Generalized least squared regression is used to derive the sandwich variances-covariance matrix for the regression coefficients of the explanatory variables, conditional on the solutions of the precision and covariance matrix.
plotsngls(): It is designed to provide the line plots of penalized parameter lambda and variance of regression coefficients in gls regression. It also provides the graph structures of the solutions to the precision matrix in the penalized path.
There are two options for the installation:
install.packages("BiocManager") BiocManaged::install("sparsenetgls") library(sparsenetgls)
devtools::install_github("superOmics/sparsenetgls") library(sparsenetgls)
The following section provides examples using a known precision matrix stored in R datafile: bandprec.rdata. The examples include:
1) To assess the results from Gaussian graphical model given by sparsenetgls;
2) Use sparsenetgls to estimate the regression coefficients from a multivariate regression and use the minimal variance , information criteria to select the regression coefficients;
3) Visualize the results from sparsenetgls;
4) Use different options: "lasso", "mb", and "elastic" for the GGM in gls.
Example 1 includes codes for assessing the results of a GGM from the sparsenetgls function. The first 12 lines is to simulate a dataset of response variables given the known precision matrix and a set of explanatory variables. The returning list of the sparsenetgls function includes the series of precision matrix produced from one of the GGM option (specified by method). In plot_roc , both of the ngroup and group option indicate if the assessed results are for a group of GGM or for only one series of GGM.
library(MASS) library(Matrix) library(sparsenetgls) #simulate the dataset data(bandprec, package="sparsenetgls") varKnown <- solve(as.matrix(bandprec)) prec <- as.matrix(bandprec) Y0 <- mvrnorm(n=100, mu=rep(0,50), Sigma=varKnown) nlambda=10 #u-beta u <- rep(1,8) X_1 <- mvrnorm(n=100, mu=rep(0,8), Sigma=Diagonal(8,rep(1,8))) Y_1 <- Y0+as.vector(X_1%*%as.matrix(u)) databand_X=X_1 databand_Y=Y_1 #produce the precision matrices omega <- sparsenetgls(responsedata=databand_Y, predictdata=databand_X, nlambda=10, ndist=1, method="glasso")$PREC_seq omega_est <- array(dim=c(50,50,10)) for (i in seq_len(10)) omega_est[,,i] <- as.matrix(omega[[i]]) roc_path_result <- path_result_for_roc(PREC_for_graph=prec, OMEGA_path=omega_est , pathnumber=10) plot_roc(result_assessment=roc_path_result, group=FALSE, ngroup=0, est_names="glasso estimation")
Example 2 provides codes for deriving the regression coefficients from sparsenetgls using "glasso" method to estimate the precision matrix of GGM. The function convertbeta() is to convert the regression coefficients from the standardized scale to the original scale. The following codes after the convertion is to select betas with the minimal variance, aic or bic.
fitgls <- sparsenetgls(databand_Y, databand_X, nlambda=10, ndist=5, method="glasso") #Convert the regression coefficients to its original scale q <- dim(databand_X)[2] nlambda=10 betagls <- matrix(nrow=nlambda, ncol=q+1) for (i in seq_len(nlambda)) betagls[i,] <- convertbeta(Y=databand_Y, X=databand_X, q=q+1, beta0=fitgls$beta[,i])$betaconv #Beta selection #select lamda and dist value based on the minimal variance of beta ndist <- max(fitgls$power)-1 tr_gamma <- matrix(nrow=10, ncol=ndist-1) for (j in seq_len(ndist-1)) for (i in seq_len(nlambda)) tr_gamma[i,j] <- (sum(diag(fitgls$covBeta[,,j,i]))) select.lambda.dist <- which(tr_gamma==min(tr_gamma), arr.ind=TRUE) select.lambda.dist betagls_select <- betagls[select.lambda.dist[1],] #row is lambda and column is dist varbeta <- diag(fitgls$covBeta[,,ndist,select.lambda.dist[1]]) ##select lamda and dist value based on the AIC and BIC select.lambda.dist2 <- which(fitgls$bic==min(fitgls$bic,na.rm=TRUE), arr.ind=TRUE) select.lambda.dist3 <- which(fitgls$aic==min(fitgls$aic,na.rm=TRUE), arr.ind=TRUE) varbeta_bic <- diag(fitgls$covBeta[,,ndist,select.lambda.dist2[1]]) varbeta_aic <- diag(fitgls$covBeta[,,ndist,select.lambda.dist3[1]])
Example 3 is to visualize the results by line-plots and structure-plots for the derived precision matrix from the sparsenet object.
plotsngls(fitgls,ith_lambda=5)
Example 4 provides examples for using different options in estimating GGM in sparsenetgls.
#Use the glasso method to estimate the precision matrix fitgls_g <- sparsenetgls(databand_Y, databand_X, nlambda=10, ndist=5, method="elastic") #Uset the lasso method to approximate the precision matrix #fitgls_l <- sparsenetgls(databand_Y, databand_X, nlambda=10, ndist=5, #method="lasso") #use the Meinshausen B?hlmann method to approximate the precision matrix #fitgls_m <- sparsenetgls(databand_Y, databand_X, nlambda=10, ndist=5, #method="mb")
All of the outputs in this vignette are produced under the following conditions:
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.