knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "vignettes/figures/README-",
  out.width = "100%"
)
options(tibble.print_min = 5, tibble.print_max = 5)

SmCCNet: A Comprehensive Tool for Multi-Omics Network Inference

Note: if you use SmCCNet in published research, please cite:

Liu, W., Vu, T., Konigsberg, I. R., Pratte, K. A., Zhuang, Y., & Kechris, K. J. (2023). SmCCNet 2.0: an Upgraded R package for Multi-omics Network Inference. bioRxiv, 2023-11.

Shi, W. J., Zhuang, Y., Russell, P. H., Hobbs, B. D., Parker, M. M., Castaldi, P. J., ... & Kechris, K. (2019). Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics, 35(21), 4336-4343.

Overview

SmCCNet is a framework for integrating one or more types of omics data with a quantitative or binary phenotype. It is based on sparse multiple canonical correlation analysis (SmCCA) and sparse partial least squares discriminant analysis (SPLSDA), and aims to find relationships between omics data and a specific phenotype. The framework uses LASSO (Least Absolute Shrinkage and Selection Operator) sparsity constraints, which drive the weights of uninformative features to zero so that only the most relevant features are retained.
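To make the effect of a LASSO-type sparsity constraint concrete, the following minimal sketch applies soft-thresholding, the operator commonly associated with an L1 penalty, to a vector of weights. It is a generic illustration and not SmCCNet's internal code; the function `soft_threshold`, the weight values, and the penalty `lambda = 0.1` are all hypothetical.

```r
# Soft-thresholding: shrink every weight toward zero by `lambda`;
# weights with absolute value <= lambda become exactly zero.
# (Hypothetical helper for illustration, not part of SmCCNet.)
soft_threshold <- function(w, lambda) {
  sign(w) * pmax(abs(w) - lambda, 0)
}

# Hypothetical feature weights before the sparsity constraint
w <- c(0.9, -0.4, 0.05, -0.02, 0.3)

# Apply the penalty; small weights are removed entirely
w_sparse <- soft_threshold(w, lambda = 0.1)

# Indices of the features that survive (non-zero weight)
selected <- which(w_sparse != 0)
print(w_sparse)
print(selected)
```

A larger penalty removes more features, which is why the choice of penalty matters for the workflow described in this README.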

The algorithm has two modes: weighted and unweighted. In the weighted mode, it uses different scaling factors for each data type, while in the unweighted mode, all scaling factors are equal. The choice of mode affects how the data is analyzed and interpreted.
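The difference between the two modes can be sketched as a weighted sum of pairwise correlations between projected datasets. The example below is only an illustration under stated assumptions: the data are simulated, the weight vectors `w_a` and `w_b` are fixed rather than optimized, and the scaling factor values are hypothetical. It does not call any SmCCNet function.

```r
set.seed(1)
n <- 50

# Simulated stand-ins for two omics datasets and a phenotype (hypothetical)
omics_a <- matrix(rnorm(n * 4), nrow = n)
omics_b <- matrix(rnorm(n * 3), nrow = n)
pheno   <- rnorm(n)

# Fixed unit-norm weight vectors; a real analysis would optimize these
w_a <- rep(1, 4) / sqrt(4)
w_b <- rep(1, 3) / sqrt(3)

# Project each omics dataset onto its weight vector
u_a <- as.numeric(omics_a %*% w_a)
u_b <- as.numeric(omics_b %*% w_b)

# Pairwise correlations: omics-omics and omics-phenotype
pair_cor <- c(
  a_b     = cor(u_a, u_b),
  a_pheno = cor(u_a, pheno),
  b_pheno = cor(u_b, pheno)
)

# Unweighted mode: every pair contributes equally
scaling_unweighted <- c(a_b = 1, a_pheno = 1, b_pheno = 1)

# Weighted mode: hypothetical factors that emphasize omics-phenotype pairs
scaling_weighted <- c(a_b = 1, a_pheno = 5, b_pheno = 5)

objective_unweighted <- sum(scaling_unweighted * pair_cor)
objective_weighted   <- sum(scaling_weighted * pair_cor)
print(c(unweighted = objective_unweighted, weighted = objective_weighted))
```

In this sketch, raising a scaling factor makes the corresponding pair count more in the objective, which is the intuition behind giving omics-phenotype relationships more emphasis in weighted mode.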

SmCCNet's workflow consists of four main steps:

Determine Sparsity Penalties: The user selects sparsity penalties for omics feature selection based on study needs or prior knowledge, or through a K-fold cross-validation procedure. Cross-validation helps ensure that the selected features generalize and reduces the risk of overfitting.

Subsample and Apply SmCCA: Omics features are randomly subsampled and analyzed using SmCCA with the chosen penalties. This process is repeated multiple times, and the resulting feature relationship matrices are averaged to form a similarity matrix.

Identify Multi-Omics Networks: The similarity matrix is analyzed using hierarchical tree cutting to identify multiple subnetworks that are relevant to the phenotype.

Prune and Summarize Networks: Finally, the identified networks are pruned and summarized using a network pruning algorithm, refining the results to highlight the most significant findings.
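The subsampling, averaging, and tree-cutting steps above can be sketched in base R. This is a simplified illustration, not the package's implementation: the data are simulated, and a soft-thresholded marginal correlation is used as a stand-in for the SmCCA weights. All variable names and parameter values (`n_subsamples`, `subsample_prop`, `penalty`) are hypothetical; only the cut height mirrors the `CutHeight` value used in the usage examples of this README.

```r
set.seed(42)
n <- 60   # subjects
p <- 12   # features

# Simulated omics matrix; features f1-f3 drive the phenotype
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", 1:p)))
y <- X[, 1] + X[, 2] + X[, 3] + rnorm(n, sd = 0.5)

n_subsamples   <- 100   # number of feature subsamples
subsample_prop <- 0.7   # proportion of features drawn each time
penalty        <- 0.2   # sparsity penalty (hypothetical value)

sim_sum <- matrix(0, p, p, dimnames = list(colnames(X), colnames(X)))
for (b in seq_len(n_subsamples)) {
  # Randomly subsample features
  idx <- sort(sample(p, size = floor(subsample_prop * p)))

  # STAND-IN for SmCCA: soft-thresholded correlation with the phenotype
  w <- as.numeric(cor(X[, idx], y))
  w <- sign(w) * pmax(abs(w) - penalty, 0)

  # Place weights back into a full-length vector (zero if not sampled)
  full_w <- numeric(p)
  full_w[idx] <- abs(w)

  # Feature relationship matrix for this subsample
  sim_sum <- sim_sum + outer(full_w, full_w)
}

# Average over subsamples and rescale to [0, 1]
similarity <- sim_sum / n_subsamples
diag(similarity) <- 0
similarity <- similarity / max(similarity)

# Hierarchical clustering on dissimilarity, then cut the tree
tree    <- hclust(as.dist(1 - similarity), method = "complete")
modules <- cutree(tree, h = 1 - 0.1^10)

# Features grouped by module
print(split(names(modules), modules))
```

Features that are never selected together have similarity 0 and therefore dissimilarity 1, so a cut height just below 1 separates them from the modules of co-selected features.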

SmCCNet Key Features

Three major computational algorithms are used, depending on the number of datasets and the phenotype modality:

Unlock the Power of SmCCNet with These Key Features:

SmCCNet Network Visualization

The final network generated from SmCCNet can be visualized in two ways:

SmCCNet Workflow

General Workflow

knitr::include_graphics("vignettes/figures/smccnetworkflow.jpg")

Multi-Omics SmCCNet with Quantitative Phenotype

knitr::include_graphics("vignettes/figures/SmCCNet-Quant.jpg")

Multi-Omics SmCCNet with Binary Phenotype

knitr::include_graphics("vignettes/figures/SmCCNet-Binary.jpg")

Single-Omics SmCCNet

knitr::include_graphics("vignettes/figures/single-omics-smccnet.jpg")

SmCCNet Example Output Product

knitr::include_graphics("vignettes/figures/example_network_continuous.jpg")

Package Functions

The older version of the SmCCNet package includes four (external) functions:

In the updated package, all of these functions except getAbar have been retired. Additional functions have been added to perform single- and multi-omics SmCCNet with a quantitative or binary phenotype, and their use is illustrated in this vignette:

Installation

# Install package
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("KechrisLab/SmCCNet")
# Load package
library(SmCCNet)

Usage

We present below examples of how to run Automated SmCCNet on a simulated dataset. The example data consist of three datasets: two omics datasets and one phenotype. We cover four cases in total, combining single- or multi-omics data with either a quantitative or a binary phenotype. To run the pipeline step by step or to learn more about the algorithm, please refer to the SmCCNet single-omics and multi-omics vignettes.

library(SmCCNet)
set.seed(123)
data("ExampleData")
Y_binary <- ifelse(Y > quantile(Y, 0.5), 1, 0)
# single-omics with binary phenotype
result <- fastAutoSmCCNet(X = list(X1), Y = as.factor(Y_binary), 
                          Kfold = 3, 
                          subSampNum = 100, DataType = c('Gene'),
                          saving_dir = getwd(), EvalMethod = 'auc', 
                          summarization = 'NetSHy', 
                          CutHeight = 1 - 0.1^10, ncomp_pls = 5)
# single-omics with quantitative phenotype
result <- fastAutoSmCCNet(X = list(X1), Y = Y, Kfold = 3, 
                          preprocess = FALSE,
                          subSampNum = 50, DataType = c('Gene'),
                          saving_dir = getwd(), summarization = 'NetSHy',
                          CutHeight = 1 - 0.1^10)
# multi-omics with binary phenotype
result <- fastAutoSmCCNet(X = list(X1,X2), Y = as.factor(Y_binary), 
                          Kfold = 3, subSampNum = 50, 
                          DataType = c('Gene', 'miRNA'), 
                          CutHeight = 1 - 0.1^10,
                          saving_dir = getwd(), 
                          EvalMethod = 'auc', 
                          summarization = 'NetSHy',
                          BetweenShrinkage = 5, 
                          ncomp_pls = 3)
# multi-omics with quantitative phenotype
result <- fastAutoSmCCNet(X = list(X1,X2), Y = Y, 
                          Kfold = 3, subSampNum = 50, 
                          DataType = c('Gene', 'miRNA'), 
                          CutHeight = 1 - 0.1^10,
                          saving_dir = getwd(),  
                          summarization = 'NetSHy',
                          BetweenShrinkage = 5)

Global network information is stored in the object 'result', and subnetwork information is saved to the directory the user provides through saving_dir. For more information about using Cytoscape to visualize the subnetworks, please refer to section 3.1 of the multi-omics vignette.

Getting help

If you encounter a bug, please file an issue with a reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.


This package is developed by KechrisLab. For further questions about the package, please contact Dr. Katerina Kechris or Weixuan Liu.



KechrisLab/SmCCNet documentation built on April 18, 2024, 9:46 p.m.