knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figure/"
)

INDEED

Overview

This R package implements INDEED algorithm from Zuo et. al.'s Methods paper, INDEED: Integrated differential expression and differential network analysis of omic data for biomarker discovery (PMID: 27592383).

This R package will generate a list of dataframes containing information such as p-value, node degree and activity score for each biomolecule. A higher activity score indicates that the corresponding biomolecule has more neighbors connected in the differential network and their p-values are more statistically significant. It will also generate a network display to aid users' biomarker selection.

Installation

You can install INDEED from github with:

# install.packages("devtools")
devtools::install_github("ressomlab/INDEED")

Load package

Load the package.

# load INDEED
library(INDEED)

Examples

A testing dataset has been provided to the users to get familiar with INDEED R package. It contains the expression levels of 39 metabolites from 120 subjects (CIRR: 60; HCC: 60) with CIRR group named as group 0 and HCC group named as group 1.

# Data matrix contains the expression levels of 39 metabolites from 120 subjects 
# (6 metabolites and 10 subjects are shown)
head(Met_GU[, 1:10])
# Group label for each subject (40 subjects are shown)
Met_Group_GU[1:40]
# Metabolite KEGG IDs (10 metabolites are shown)
Met_name_GU[1:10]

An example to obtain the differential network using partial correlation analysis.

# set seed to avoid randomness
set.seed(100)
# Compute rho values to run graphical lasso
pre_data <- select_rho_partial(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, error_curve = TRUE)

From the error curve figure, users can choose the rho value based on the minimum rule (red vertical line), the one standard error rule (blue horizontal line) or their preferred value. INDEED provides users the option to adjust multiple testing effect in edge detection (fdr = TRUE). This will lead to a more sparse network in general. In this example, the network is too sparse. We decide to set fdr = FALSE for demonstration. It's a good idea to start by setting fdr = TRUE and later relax it to fdr = FALSE if the network is too sparse when working on a new dataset.

# Choose optimal rho values to compute activity scores and build the differential network
result <- partial_cor(data_list = pre_data, rho_group1 = 'min', rho_group2 = "min", p_val = pvalue_M_GU, permutation = 1000, permutation_thres = 0.05, fdr = FALSE)

Show the network display and users can interact with it.

# Show result 
head(result$activity_score)
head(result$diff_network)
# Show network
network_display(result = result, nodesize= 'Node_Degree', nodecolor= 'Activity_Score', edgewidth= FALSE, layout= 'nice')

An example to obtain the differential network using correlation analysis. When the partial correlation analysis returns a too sparse network even when the multiple testing correction is turned off (fdr = FALSE). It's better to try correlation analysis.

# set seed to avoid randomness
set.seed(100)
# Compute rho values to run graphical lasso
result <- non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, method = "pearson", p_val = pvalue_M_GU, permutation = 1000, permutation_thres = 0.05, fdr = FALSE)

Show the network display and users can interact with it. Here, edgewidth is assigned to the significance level of the differential connection (z-score of edge connection with different colors for positive or negative changes).

# Show result 
head(result$activity_score)
head(result$diff_network)
# Show network
network_display(result = result, nodesize= 'Node_Degree', nodecolor= 'Activity_Score', edgewidth= TRUE, layout= 'nice')



ressomlab/INDEED documentation built on Aug. 3, 2022, 4:45 p.m.