knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Differential expression (DE) analysis is commonly used to identify biomarker candidates whose expression levels change significantly between distinct biological groups. One drawback of DE analysis is that it considers changes only at the level of individual biomolecules. In differential network (DN) analysis, a network is typically built from correlations, and biomarker candidates are selected by examining the network topology. However, correlation tends to produce overly complicated networks, and selecting biomarker candidates purely from network topology ignores changes at the level of individual biomolecules. We therefore proposed INDEED, a method that integrates DE and DN analysis to account for changes at both the single-biomolecule and network levels. INDEED was published in the journal Methods (PMID: 27592383), and this R package implements the algorithm.

The package returns a list of data frames containing information such as the p-value, node degree, and activity score of each biomolecule. A higher activity score indicates that the biomolecule has more neighbors in the differential network and that their p-values are more statistically significant. The package also provides an interactive network display to aid biomarker selection.
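To make the idea of an activity score concrete, here is a minimal sketch in base R. It assumes, for illustration only, that each p-value is converted to a z-score and that a biomolecule's score adds its own z-score to those of its neighbors in the differential network. The function activity_score_sketch() and the toy network are hypothetical, and INDEED's exact formula may differ.

```r
# Hypothetical illustration, NOT the INDEED implementation.
# Assumption: p-values are turned into z-scores, and a node's score is its own
# z-score plus the z-scores of its neighbours in the differential network.
activity_score_sketch <- function(p_values, adjacency) {
  z <- qnorm(1 - p_values / 2)           # two-sided p-value -> |z|
  a <- (adjacency != 0) * 1              # 0/1 adjacency matrix
  degree <- rowSums(a)                   # number of neighbours per node
  score <- z + as.vector(a %*% z)        # own z-score + sum of neighbours' z
  data.frame(node = names(p_values), degree = degree, score = score)
}

# Toy network: A is connected to B and C; B and C are not connected
p <- c(A = 0.001, B = 0.04, C = 0.5)
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0), nrow = 3, byrow = TRUE,
              dimnames = list(names(p), names(p)))
res <- activity_score_sketch(p, adj)
res[order(-res$score), ]  # A ranks first: most neighbours, smallest p-value
```

Under this assumption, a node scores highly either because it is itself significant or because it sits among many significant neighbors, which matches the interpretation given above.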

Installation

You can install INDEED from GitHub with:

# install.packages("devtools")
devtools::install_github("ressomlab/INDEED")

Load package

Load the package.

# load INDEED
library(INDEED)

Testing dataset

A testing dataset is provided to help users get familiar with the INDEED R package. It contains the expression levels of 39 metabolites from 120 subjects (cirrhosis, CIRR: 60; hepatocellular carcinoma, HCC: 60), with the CIRR group labeled as group 0 and the HCC group labeled as group 1.

# Data matrix contains the expression levels of 39 metabolites from 120 subjects 
# (6 metabolites and 10 subjects are shown)
head(Met_GU[, 1:10])
# Group label for each subject (40 subjects are shown)
Met_Group_GU[1:40]
# Metabolite KEGG IDs (10 metabolites are shown)
Met_name_GU[1:10]
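To run INDEED on your own data, the inputs should mirror the testing dataset. The following sketch builds simulated stand-ins in base R, assuming (as the comments above indicate) that the data matrix has biomolecules in rows and subjects in columns, that the class labels are 0/1 in the same order as the columns, and that there is one identifier per row. The names toy_data, toy_group, and toy_id are hypothetical.

```r
# Hypothetical simulated inputs shaped like the INDEED testing dataset
set.seed(1)
n_met <- 39   # biomolecules (rows)
n_sub <- 120  # subjects (columns)
toy_data <- matrix(rnorm(n_met * n_sub), nrow = n_met,
                   dimnames = list(paste0("M", seq_len(n_met)),
                                   paste0("S", seq_len(n_sub))))
toy_group <- rep(c(0, 1), each = n_sub / 2)  # 0/1 label per subject (column)
toy_id <- rownames(toy_data)                 # one identifier per biomolecule

# Consistency checks before calling an INDEED function
stopifnot(ncol(toy_data) == length(toy_group),
          nrow(toy_data) == length(toy_id),
          all(toy_group %in% c(0, 1)))
table(toy_group)
```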

Non-partial correlation data analysis function non_partial_cor()

For the non-partial correlation method, users only need to run the non_partial_cor() function. The result is saved in a list of two data frames, activity_score and diff_network. The activity_score data frame contains the biomolecules ranked by an activity score calculated from p-values and node degree. The diff_network data frame contains the binary and weight connections used for network display.

The following example demonstrates how to use the non_partial_cor() function:

# Pearson-based differential network: 1000 permutations, 0.05 threshold, FDR correction
result <- non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, method = "pearson",
                          p_val = pvalue_M_GU, permutation = 1000, permutation_thres = 0.05, fdr = TRUE)
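non_partial_cor() uses permutations (permutation = 1000, permutation_thres = 0.05) and an optional multiple testing correction (fdr = TRUE) to decide which connections belong in the differential network. The sketch below illustrates one common way to run such a test in base R: shuffle the group labels, recompute the between-group difference in correlation, and count how often the shuffled difference is at least as large as the observed one. The function diff_cor_perm() and the toy data are hypothetical, INDEED's actual test statistic may differ, and Benjamini-Hochberg via p.adjust() is used here only as an example of FDR correction.

```r
# Hypothetical sketch, not INDEED's implementation: permutation test for a
# difference in correlation between two groups, for every pair of features.
diff_cor_perm <- function(x, label, n_perm = 1000, method = "pearson") {
  # x: features in rows, subjects in columns; label: 0/1 group per column
  group_diff <- function(lab) {
    cor(t(x[, lab == 1]), method = method) - cor(t(x[, lab == 0]), method = method)
  }
  obs <- group_diff(label)
  exceed <- matrix(0, nrow(obs), ncol(obs))
  for (b in seq_len(n_perm)) {
    # shuffle group labels to simulate "no difference between groups"
    perm_diff <- group_diff(sample(label))
    exceed <- exceed + (abs(perm_diff) >= abs(obs))
  }
  p_mat <- (exceed + 1) / (n_perm + 1)  # add-one to avoid p = 0
  diag(p_mat) <- 1
  p_mat
}

set.seed(42)
label <- rep(c(0, 1), each = 30)
x <- matrix(rnorm(4 * 60), nrow = 4)
# features 1 and 2 are strongly correlated in group 1 only
x[2, label == 1] <- x[1, label == 1] + rnorm(30, sd = 0.1)

p_mat <- diff_cor_perm(x, label, n_perm = 1000)
p_mat[1, 2]  # small: the 1-2 correlation differs between groups
# Benjamini-Hochberg FDR adjustment over the unique feature pairs
q_adj <- p.adjust(p_mat[upper.tri(p_mat)], method = "BH")
```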

Partial correlation data preprocessing function select_rho_partial()

Partial correlation data analysis function partial_cor()

For the partial correlation method, users first preprocess the data with the select_rho_partial() function and then apply the partial_cor() function to complete the analysis. Users can supply a table of p-values from their own DE analysis to partial_cor(). As with non_partial_cor(), the result is saved in a list of two data frames: activity_score, which contains the biomolecules ranked by an activity score calculated from p-values and node degree, and diff_network, which contains the binary and weight connections used for network display.
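Partial correlation measures the association between two biomolecules after removing the linear effect of all the others. It can be read off the inverse covariance (precision) matrix Θ as ρ_ij = -Θ_ij / sqrt(Θ_ii Θ_jj). The base R sketch below illustrates this relationship on a simulated chain z1 → z2 → z3. It is a simplification: it uses a plain matrix inverse with no sparsity penalty and a subjects-in-rows layout, whereas select_rho_partial() tunes rho, which we assume here controls the sparsity of the estimated network. The function partial_cor_sketch() is hypothetical.

```r
# Hypothetical sketch: partial correlations from the inverse covariance matrix.
# No sparsity penalty is applied (unlike a graphical-lasso estimate).
partial_cor_sketch <- function(x) {
  # x: subjects in rows, features in columns (base R cov() convention)
  theta <- solve(cov(x))                 # precision matrix
  d <- sqrt(diag(theta))
  pc <- -theta / outer(d, d)             # -theta_ij / sqrt(theta_ii * theta_jj)
  diag(pc) <- 1
  pc
}

# Simulated chain z1 -> z2 -> z3: z1 and z3 are related only through z2
set.seed(7)
n <- 500
z1 <- rnorm(n)
z2 <- z1 + rnorm(n)
z3 <- z2 + rnorm(n)
x <- cbind(z1 = z1, z2 = z2, z3 = z3)

round(cor(x)["z1", "z3"], 2)  # marginal correlation, about 0.58 in theory
pc <- partial_cor_sketch(x)
round(pc["z1", "z3"], 2)      # close to zero once z2 is accounted for
```

This is why a partial correlation network tends to be sparser than a correlation network: indirect associations such as z1-z3 drop out.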

The following example demonstrates how to use the select_rho_partial() and partial_cor() functions:

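# Step 1: preprocess the data, select rho for each group, and display the error curve
pre_data <- select_rho_partial(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU,
                               error_curve = TRUE)
# Step 2: build the partial correlation differential network, using the minimum rule for rho
result <- partial_cor(data_list = pre_data, rho_group1 = "min", rho_group2 = "min", p_val = pvalue_M_GU,
                      permutation = 1000, permutation_thres = 0.05, fdr = TRUE)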

In this example, the sparse differential network is based on partial correlation, the p-value of each biomolecule is provided by the user (pvalue_M_GU), rho is selected for each group by the minimum rule, the number of permutations is set to 1000, the permutation threshold is 0.05, and multiple testing correction is applied (fdr = TRUE).
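The minimum rule can be illustrated on an error curve like the one displayed with error_curve = TRUE. In this hypothetical sketch, the minimum rule picks the rho with the lowest mean error. For comparison it also shows the one-standard-error rule, a common alternative in cross-validation that picks the largest rho (assumed here to give the sparsest network) whose error is within one standard error of the minimum. The function select_rho_sketch() and the error values are made up for illustration and are not INDEED output.

```r
# Hypothetical sketch: choosing a tuning parameter from an error curve.
select_rho_sketch <- function(rho, cv_mean, cv_se, rule = c("min", "1se")) {
  rule <- match.arg(rule)
  best <- which.min(cv_mean)
  if (rule == "min") {
    return(rho[best])                    # minimum rule: lowest mean error
  }
  # one-standard-error rule: largest rho whose error is within one SE of the minimum
  within <- cv_mean <= cv_mean[best] + cv_se[best]
  max(rho[within])
}

# Made-up error curve, not INDEED output
rho     <- c(0.05, 0.10, 0.20, 0.30, 0.40)
cv_mean <- c(1.30, 1.15, 1.05, 1.08, 1.40)
cv_se   <- rep(0.05, 5)

select_rho_sketch(rho, cv_mean, cv_se, rule = "min")  # returns 0.2
select_rho_sketch(rho, cv_mean, cv_se, rule = "1se")  # returns 0.3
```

The minimum rule favors the best-fitting model; the one-standard-error rule trades a little error for a sparser network.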

Interactive network visualization function network_display()

This interactive function helps visualize the results of the INDEED functions non_partial_cor() and partial_cor(). Users can choose whether the size and the color of each node represent Node_Degree, Activity_Score, Z_Score, or P_Value. Edge color is based on the binary value: 1 (a positive correlation) is shown in green and -1 (a negative correlation) in red. Optionally, the width of each edge can be made proportional to its weight. The network layout can be customized with the options 'nice', 'sphere', 'grid', 'star', and 'circle'. Nodes can be moved, and the view can be zoomed. Clicking a node or an edge displays extra information, and clicking a node also highlights its secondary interactions.

The following example demonstrates how to use the network_display() function:

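# Build the network with non_partial_cor() (here without FDR correction)
result <- non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, method = "pearson",
                          p_val = pvalue_M_GU, permutation = 1000, permutation_thres = 0.05, fdr = FALSE)
# Node size shows node degree, node color shows activity score; edge width is not scaled by weight
network_display(result = result, nodesize = 'Node_Degree', nodecolor = 'Activity_Score',
                edgewidth = FALSE, layout = 'nice')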
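The edge mapping described for network_display() can be mimicked in a few lines of base R, for example when preparing a figure in another tool. The sketch below assumes a hypothetical edge table with columns from, to, binary, and weight (these are not necessarily the column names of diff_network). It colors edges green for 1 and red for -1, scales width by weight in the spirit of the edgewidth option, and counts node degree from the edge list.

```r
# Hypothetical edge table; column names are assumptions, not INDEED's diff_network
edges <- data.frame(from   = c("A", "A", "B"),
                    to     = c("B", "C", "D"),
                    binary = c(1, -1, 1),
                    weight = c(0.8, 0.3, 0.5),
                    stringsAsFactors = FALSE)

# Edge color from the binary value: 1 -> green, -1 -> red
edges$color <- ifelse(edges$binary == 1, "green", "red")
# Edge width proportional to weight (widest edge = 5)
edges$width <- 5 * edges$weight / max(edges$weight)
# Node degree counted from the edge list, usable for node size
degree <- table(c(edges$from, edges$to))

edges
degree
```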


ressomlab/INDEED documentation built on Aug. 3, 2022, 4:45 p.m.