```r
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```
Pando provides various regression models and modeling options that can be used for GRN inference. This vignette gives an overview of the available options. First, let's load the object.
```r
library(Pando)
library(tidyverse)
library(doParallel)
registerDoParallel(4)
muo_data <- read_rds('muo_data.rds')
```
The default option when running `infer_grn()` is a generalized linear model (GLM) with Gaussian noise. Using the `family` argument, one can choose other noise models, e.g. to fit directly on counts instead of on log-normalized data.
```r
muo_data <- infer_grn(
    muo_data,
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
The coefficients of the models are tested using a t-test, and modules are extracted by applying a significance threshold to the p-value.
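This extraction step could look as follows. The `find_modules()` call and its `p_thresh` argument are shown as an assumption here; check the package documentation for the exact interface.

```r
# Sketch (assumed interface): keep only connections with a
# coefficient p-value below the chosen significance threshold.
muo_data <- find_modules(muo_data, p_thresh = 0.05)
NetworkModules(muo_data)
```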
In regularized linear models, the coefficients are penalized so that they are pushed towards 0. In this way, only 'strong' connections are retained. Here we use the `glmnet` implementation:
```r
muo_data <- infer_grn(
    muo_data,
    method = 'cv.glmnet',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
You might notice that this time there are no p-values, but many of the coefficients (`estimate`) are 0. In this case, modules are extracted not by p-value but by selecting non-zero coefficients. The `alpha` argument can be used to adjust the elastic-net mixing parameter: 1 amounts to the lasso penalty and 0 to the ridge penalty. Lasso models are sparser and will push more coefficients to zero.
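For example, a pure lasso fit could be requested like this, assuming `alpha` is passed through to `glmnet` as in the standard `glmnet` interface:

```r
# Sketch: alpha = 1 gives the lasso penalty (sparser models),
# alpha = 0 the ridge penalty; values in between mix the two.
muo_data <- infer_grn(
    muo_data,
    method = 'cv.glmnet',
    alpha = 1,
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
```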
CellOracle, another method for GRN inference, uses bagging ridge and Bayesian ridge regression models from scikit-learn (Python). We have used `reticulate` to interface with Python and make these models available here as well. You do have to install scikit-learn in Python for this, though.
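One way to install scikit-learn into the Python environment that `reticulate` uses is shown below; adapt this to your own Python setup (e.g. conda or virtualenv).

```r
# Install scikit-learn into the Python environment used by reticulate
reticulate::py_install('scikit-learn')
```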
```r
muo_data <- infer_grn(
    muo_data,
    method = 'bagging_ridge',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
As with the regular `glm`, modules will be extracted based on the p-value.
XGBoost is yet another popular method, used for instance by SCENIC. It is not based on linear regression but uses gradient-boosted regression trees to model non-linear relationships.
```r
muo_data <- infer_grn(
    muo_data,
    method = 'xgb',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
Here we get neither a 'normal' coefficient nor a p-value, but instead three different importance values: `gain`, `cover`, and `frequency`. These indicate the importance of the variable to the regressor. To extract modules, one can select the top target genes for each TF based on the `gain` value. Alternatively, one can select the top TFs for each target gene.
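Selecting the top targets per TF from the coefficient table could be sketched with dplyr as below; the `tf`, `target`, and `gain` column names are assumptions about the output of `coef()`, so inspect the table first.

```r
library(dplyr)

# Sketch: rank targets within each TF by gain and keep the top 10
coef(muo_data) %>%
    group_by(tf) %>%
    slice_max(gain, n = 10) %>%
    ungroup()
```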
Finally, we implemented the option to use Bayesian regression models with brms and Stan.
```r
muo_data <- infer_grn(
    muo_data,
    method = 'brms',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
```
However, these usually have very long runtimes and are only feasible on very small GRNs.