Causal network inference and discovery with Structural Equation Modeling Network Analysis and Causal Learning with Structural Equation Modeling
SEMgraph
Estimate networks and causal relations in complex systems through Structural Equation Modeling (SEM). SEMgraph comes with the following functionalities:
- Interchangeable model representation as either an igraph object or the corresponding SEM in lavaan syntax. Model management functions include graph-to-SEM conversion, automated covariance matrix regularization, graph conversion to DAG, and tree (arborescence) extraction from correlation matrices.
- Heuristic filtering, node and edge weighting, resampling, and parallelization settings for fast fitting of very large models.
- Automated data-driven model building and improvement, through causal structure learning, bow-free interaction search, and latent variable confounding adjustment.
- Perturbed path finding, community searching, and sample scoring, together with graph plotting utilities for tracing model architecture modifications and perturbation (i.e., activation or repression) routes.
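To make the idea of an interchangeable graph/SEM representation concrete, here is a minimal base-R sketch that turns a directed edge list into lavaan-style regression syntax. It is only an illustration of the concept: the toy edge list and the helper name `edges_to_lavaan` are hypothetical, and this is not how SEMgraph implements its own graph-to-SEM conversion.

```r
# Hypothetical toy edge list (node names are illustrative only).
edges <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group parents by child node and emit one
# lavaan-style regression per child, e.g. "C ~ A + B".
edges_to_lavaan <- function(edges) {
  parents <- split(edges$from, edges$to)
  eqs <- vapply(
    names(parents),
    function(child) {
      paste(child, "~", paste(parents[[child]], collapse = " + "))
    },
    character(1)
  )
  paste(unname(eqs), collapse = "\n")
}

model <- edges_to_lavaan(edges)
cat(model, "\n")
```

Each directed edge becomes a predictor in the regression of its target node, which is the sense in which a DAG and a set of structural equations describe the same model.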
The latest stable version can be installed from CRAN:
install.packages("SEMgraph")
The latest development version can be installed from GitHub:
devtools::install_github("fernandoPalluzzi/SEMgraph")
Do not forget to install the SEMdata package too! It contains useful high-throughput sequencing data, reference networks, and pathways for SEMgraph training:
devtools::install_github("fernandoPalluzzi/SEMdata")
A gentle introduction to SEMgraph functionalities is available at our DOCs page.
The full list of SEMgraph functions with examples is available at our website HERE.
SEMgraph and SEMdata reference datasets are frozen to benchmarked versions. If you would like to get the latest version of your favourite database, you can use either the R package graphite (graphite tutorial) or our simple wrapper function, contained in the R script loadPathwayData.R. The script comes with descriptions and examples.
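As a rough sketch of the graphite route, the snippet below downloads the current human KEGG pathways and converts one of them to an igraph object. It assumes that the Bioconductor packages graphite and graph, the human annotation package needed for identifier conversion, and igraph are installed, and that a network connection is available. It does not reproduce what loadPathwayData.R does; the object names are illustrative.

```r
# Assumption: graphite, graph, and igraph are installed and the machine
# is online. Object names below are illustrative.
library(graphite)

# Download the current KEGG pathway collection for Homo sapiens.
kegg_now <- pathways("hsapiens", "kegg")

# Take the first pathway, map its node identifiers to Entrez IDs,
# and convert it to a graphNEL object.
pw <- convertIdentifiers(kegg_now[[1]], "ENTREZID")
g_nel <- pathwayGraph(pw)

# Convert to an igraph object.
g <- igraph::graph_from_graphnel(g_nel)

# Strip the "ENTREZID:" prefix from node names if it is present
# (a no-op otherwise).
igraph::V(g)$name <- sub("^ENTREZID:", "", igraph::V(g)$name)

igraph::vcount(g)
igraph::ecount(g)
```

Because the result tracks the live database, node and edge counts will generally differ from the benchmarked versions shipped with SEMgraph and SEMdata.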
SEMgraph offers several verified datasets to work with, for both training and research. They include (entries marked [**] are available with the SEMdata expansion):
- KEGG directed reference network of 5934 nodes and 77158 edges, derived from the KEGG database.
- KEGG pathways. A comprehensive list of 227 KEGG pathways (last update: February 2024).
- Reactome directed reference network of 9762 nodes and 416128 edges, derived from Reactome DB. [**]
- Reactome pathways. A comprehensive list of 1641 pathways (last update: April 2020). [**]
- STRING interactome (version 10.5) of 9725 nodes and 170987 edges. [**]
- Amyotrophic Lateral Sclerosis (ALS) RNA-seq dataset of 139 cases and 21 healthy controls, from Tam O.H. et al., 2019 (GEO accession: GSE124439). [**]
- Frontotemporal Dementia (FTD) DNA methylation dataset of 150 cases and 150 healthy controls, from Li Y. et al., 2014 (GEO accession: GSE53740). [**]
- COVID-19 RNA-seq dataset of 46 critical and 23 non-critical COVID-19 cases in young patients, from Carapito R. et al., 2022 (GEO accession: GSE172114). [**]
- Flow cytometry data and causal model from Sachs et al., 2005.
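A quick way to check which version of the KEGG data you have installed is to inspect the bundled objects. The sketch below assumes that the reference network is exposed as an igraph object named `kegg` and the pathway list as `kegg.pathways` once SEMgraph is attached; consult the package documentation if the names differ in your installed version.

```r
# Assumption: SEMgraph exposes the KEGG reference network as 'kegg'
# (an igraph object) and the pathway collection as 'kegg.pathways'.
library(SEMgraph)
library(igraph)

# Size of the reference network.
c(nodes = vcount(kegg), edges = ecount(kegg))

# Number of pathways and a few of their names.
length(kegg.pathways)
head(names(kegg.pathways))
```

Counts may differ from those listed above if the installed package was built against a different database snapshot.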
Grassi M, Palluzzi F, Tarantino B. SEMgraph: an R package for causal network inference of high-throughput data with structural equation models. Bioinformatics, 2022 Aug 30; 38(20):btac567. https://doi.org/10.1093/bioinformatics/btac567
Grassi M, Tarantino B. SEMgsa: topology-based pathway enrichment analysis with structural equation models. BMC Bioinformatics, 2022 Aug 17; 23(1):344. https://doi.org/10.1186/s12859-022-04884-8
Grassi M, Tarantino B. SEMtree: tree-based structure learning methods with structural equation models. Bioinformatics, 2023 Jun 09; 39(6):btad377. https://doi.org/10.1093/bioinformatics/btad377
Grassi M, Tarantino B. SEMbap: Bow-free covariance search and data de-correlation. PLoS Comput Biol, 2024 Sep 11; 20(9):e1012448. https://doi.org/10.1371/journal.pcbi.1012448