```r
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```
Pando provides various regression models and modeling options that can be used for GRN inference. This vignette gives an overview of the available options. First, let's load the object.
```r
library(Pando)
library(tidyverse)
library(doParallel)
registerDoParallel(4)
muo_data <- read_rds('muo_data.rds')
```
The default option when running `infer_grn()` is a generalized linear model (GLM) with Gaussian noise. Using the `family` argument, one can choose other noise models, e.g. to fit directly on counts instead of on log-normalized data.
```r
muo_data <- infer_grn(
    muo_data,
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
The coefficients of the models are tested using a t-test, and modules are extracted by applying a significance threshold to the p-value.
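This extraction step could look as follows. The `find_modules()` call and its `p_thresh` argument are shown as an assumption here; check the package documentation for the exact interface.

```r
# Sketch (assumed interface): keep only connections with a
# coefficient p-value below the chosen significance threshold.
muo_data <- find_modules(muo_data, p_thresh = 0.05)
NetworkModules(muo_data)
```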
In regularized linear models, the coefficients are penalized so that they are pushed towards 0. In this way, only 'strong' connections are retained. Here we use the `glmnet` implementation:
```r
muo_data <- infer_grn(
    muo_data,
    method = 'cv.glmnet',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
You might notice that this time there are no p-values, but many of the coefficients (`estimate`) are 0. In this case, modules are extracted not by p-value but by selecting non-zero coefficients. The `alpha` argument can be used to adjust the elastic-net mixing parameter: 1 amounts to the lasso penalty and 0 to the ridge penalty. Lasso models are sparser and will push more coefficients to zero.
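For example, a pure lasso fit could be requested like this, assuming `alpha` is passed through to `glmnet` as in the standard `glmnet` interface:

```r
# Sketch: alpha = 1 gives the lasso penalty (sparser models),
# alpha = 0 the ridge penalty; values in between mix the two.
muo_data <- infer_grn(
    muo_data,
    method = 'cv.glmnet',
    alpha = 1,
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
```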
CellOracle, another method for GRN inference, uses bagging ridge and Bayesian ridge regression models from scikit-learn (Python). We have used `reticulate` to interface with Python and make these models available here as well. You do have to install scikit-learn in Python for this, though.
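One way to install scikit-learn into the Python environment that `reticulate` uses is shown below; adapt this to your own Python setup (e.g. conda or virtualenv).

```r
# Install scikit-learn into the Python environment used by reticulate
reticulate::py_install('scikit-learn')
```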
```r
muo_data <- infer_grn(
    muo_data,
    method = 'bagging_ridge',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
As with the regular `glm`, modules will be extracted based on the p-value.
XGBoost is yet another popular method, used for instance by SCENIC. It is not based on linear regression but uses gradient-boosted regression trees to model non-linear relationships.
```r
muo_data <- infer_grn(
    muo_data,
    method = 'xgb',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
coef(muo_data)
```
Here we get neither a 'normal' coefficient nor a p-value, but instead three different importance values: `gain`, `cover`, and `frequency`. These indicate the importance of the variable to the regressor. To extract modules, one can select the top target genes for each TF based on the `gain` value. Alternatively, one can select the top TFs for each target gene.
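Selecting the top targets per TF from the coefficient table could be sketched with dplyr as below; the `tf`, `target`, and `gain` column names are assumptions about the output of `coef()`, so inspect the table first.

```r
library(dplyr)

# Sketch: rank targets within each TF by gain and keep the top 10
coef(muo_data) %>%
    group_by(tf) %>%
    slice_max(gain, n = 10) %>%
    ungroup()
```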
Finally, we implemented the option to use Bayesian regression models with brms and Stan.
```r
muo_data <- infer_grn(
    muo_data,
    method = 'brms',
    parallel = TRUE,
    genes = c('OTX2', 'SFRP2')
)
```
However, these usually have very long runtimes and are only feasible on very small GRNs.