calcGCD: Graphlet Correlation Distance (GCD)

calcGCD R Documentation

Graphlet Correlation Distance (GCD)

Description

Computes the Graphlet Correlation Distance (GCD), a graphlet-based distance measure, between two networks.

Following Yaveroglu et al. (2014), the GCD is defined as the Euclidean distance between the upper-triangle values of the Graphlet Correlation Matrices (GCMs) of two networks. Each network is defined by its adjacency matrix. The GCM of a network contains the Spearman correlations between the network's node orbit counts (Hocevar and Demsar, 2016).
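As a rough illustration of this definition, the following base R sketch computes a GCD from two orbit count matrices. The helper names gcm_from_counts and gcd_from_gcms and the toy counts are assumptions made for illustration; this is not the internal code of calcGCD.

```r
# Hypothetical helper: Spearman correlations between orbit columns
gcm_from_counts <- function(ocount) {
  cor(ocount, method = "spearman")
}

# Hypothetical helper: Euclidean distance between upper-triangle entries
gcd_from_gcms <- function(gcm1, gcm2) {
  ut <- upper.tri(gcm1)
  sqrt(sum((gcm1[ut] - gcm2[ut])^2))
}

# Toy orbit counts (rows = nodes, columns = orbits); made-up numbers
ocount_a <- cbind(O0 = c(1, 2, 3, 4, 5),
                  O1 = c(2, 1, 4, 3, 5),
                  O2 = c(5, 3, 4, 1, 2))
ocount_b <- cbind(O0 = c(1, 2, 3, 4, 5),
                  O1 = c(1, 2, 3, 5, 4),
                  O2 = c(2, 1, 3, 4, 5))

gcm_a <- gcm_from_counts(ocount_a)
gcm_b <- gcm_from_counts(ocount_b)

# Distance between the two toy networks
gcd_ab <- gcd_from_gcms(gcm_a, gcm_b)
gcd_ab
```

Because a GCM is symmetric with ones on the diagonal, the upper triangle carries all of its information, and two identical GCMs give a distance of 0.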

The function considers only orbits for graphlets with up to four nodes. Orbit counts are determined using the function count4 from the orca package.
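The sketch below shows one way to pass a network to count4, under two assumptions: that count4 expects an integer two-column edge matrix with one row per undirected edge, and that edge weights are ignored. The toy adjacency matrix is made up, and the conversion that calcGCD itself uses is not shown here. The orca call is guarded, so the snippet also runs when orca is not installed.

```r
# Toy undirected network: path 1 - 2 - 3 - 4 (made-up edge weights)
adja <- matrix(0, nrow = 4, ncol = 4)
adja[1, 2] <- adja[2, 1] <- 0.8
adja[2, 3] <- adja[3, 2] <- 0.6
adja[3, 4] <- adja[4, 3] <- 0.7

# Keep each undirected edge once (upper triangle), ignoring edge weights
idx <- which(adja != 0 & upper.tri(adja), arr.ind = TRUE)
edges <- matrix(as.integer(idx), ncol = 2)  # integer node indices

# Orbit counts via orca, only if the package is installed
if (requireNamespace("orca", quietly = TRUE)) {
  ocount <- orca::count4(edges)  # assumed: one row per node, one column per orbit
  print(ocount)
}
```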

Unobserved orbits would lead to NAs in the correlation matrix. To avoid this, a row of pseudo counts of 1 is added to each orbit count matrix (ocount1 and ocount2).
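The effect of the pseudo counts can be reproduced with a small base R sketch. The toy counts are made up, and appending the row with rbind is an assumption about how the row is added.

```r
# Toy orbit counts; the third orbit is never observed (made-up numbers)
ocount <- cbind(O0 = c(1, 2, 3, 2),
                O1 = c(0, 1, 2, 1),
                O2 = c(0, 0, 0, 0))

# A constant column has zero variance, so its correlations are NA
gcm_raw <- suppressWarnings(cor(ocount, method = "spearman"))

# Append one row of pseudo counts of 1 so that no column is constant
ocount_pseudo <- rbind(ocount, 1)
gcm_pseudo <- cor(ocount_pseudo, method = "spearman")

anyNA(gcm_raw)     # TRUE
anyNA(gcm_pseudo)  # FALSE
```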

The function is based on R code provided by Theresa Ullmann (https://orcid.org/0000-0003-1215-8561).

Usage

calcGCD(adja1, adja2, orbits = c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1))

Arguments

adja1, adja2

adjacency matrices (numeric) defining the two networks between which the GCD is calculated.

orbits

numeric vector of integers from 0 to 14 defining the graphlet orbits to use for the GCD calculation. Minimum length is 2. Defaults to c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1), which excludes redundant orbits such as orbit o3. See Details.

Details

By default, only the 11 non-redundant orbits are used. These are grouped according to their role: orbit 0 represents the degree; orbits 2, 5, and 7 represent nodes within a chain; orbits 8, 10, and 11 represent nodes in a cycle; and orbits 6, 9, 4, and 1 represent terminal nodes.
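The grouping can be written down as a named list and used to select columns from an orbit count matrix. In this sketch, the list orbit_roles, the placeholder counts, and the column names O0 to O14 are assumptions made for illustration. It also assumes that the columns are ordered by orbit number.

```r
# Default orbits, grouped by the roles described above
orbit_roles <- list(degree   = 0,
                    chain    = c(2, 5, 7),
                    cycle    = c(8, 10, 11),
                    terminal = c(6, 9, 4, 1))
orbits <- unlist(orbit_roles, use.names = FALSE)

# Placeholder orbit count matrix: 6 nodes x 15 orbits, made-up values
set.seed(1)
ocount <- matrix(rpois(6 * 15, lambda = 3), nrow = 6,
                 dimnames = list(NULL, paste0("O", 0:14)))

# Orbits are numbered from 0, R columns from 1, hence the + 1
ocount_sel <- ocount[, orbits + 1]
colnames(ocount_sel)
```

Keeping the orbits in role order means that a heatmap of the resulting GCM shows the four role groups as contiguous blocks.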

Value

An object of class gcd containing the following elements:

gcd Graphlet Correlation Distance between the two networks
ocount1, ocount2 Orbit counts
gcm1, gcm2 Graphlet Correlation Matrices

References

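Hocevar T, Demsar J (2016). Computation of graphlet orbits for nodes and edges in sparse graphs. Journal of Statistical Software, 71(10).

Yaveroglu ON, et al. (2014). Revealing the hidden language of complex networks. Scientific Reports, 4, 4547.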

See Also

calcGCM, testGCM

Examples

library(NetCoMi)
library(phyloseq)

# Load data sets from American Gut Project (from SpiecEasi package)
data("amgut2.filt.phy")

# Split data into two groups: with and without seasonal allergies
amgut_season_yes <- phyloseq::subset_samples(amgut2.filt.phy, 
                                      SEASONAL_ALLERGIES == "yes")
amgut_season_no <- phyloseq::subset_samples(amgut2.filt.phy, 
                                     SEASONAL_ALLERGIES == "no")

# Make sample sizes equal to ensure comparability:
# keep only the first n_yes samples of the "no" group
n_yes <- phyloseq::nsamples(amgut_season_yes)
ids_no <- phyloseq::get_variable(amgut_season_no, "X.SampleID")[1:n_yes]

amgut_season_no <- phyloseq::subset_samples(amgut_season_no, X.SampleID %in% ids_no)

# Network construction
net <- netConstruct(amgut_season_yes,
                    amgut_season_no, 
                    filtTax = "highestFreq",
                    filtTaxPar = list(highestFreq = 50),
                    measure = "pearson",
                    normMethod = "clr",
                    zeroMethod = "pseudoZO",
                    sparsMethod = "thresh",
                    thresh = 0.5)

# Get adjacency matrices
adja1 <- net$adjaMat1
adja2 <- net$adjaMat2

# Network visualization
props <- netAnalyze(net)
plot(props, rmSingles = TRUE, cexLabels = 1.7)

# Calculate the GCD
gcd <- calcGCD(adja1, adja2)

gcd

# Orbit counts
head(gcd$ocount1)
head(gcd$ocount2)

# GCMs
gcd$gcm1
gcd$gcm2

# Test Graphlet Correlations for significant differences
gcmtest <- testGCM(gcd)

### Plot heatmaps
# GCM 1 (with significance code in the lower triangle)
plotHeat(gcmtest$gcm1, 
         pmat = gcmtest$pAdjust1,
         type = "mixed")

# GCM 2 (with significance code in the lower triangle)
plotHeat(gcmtest$gcm2, 
         pmat = gcmtest$pAdjust2,
         type = "mixed")

# Difference GCM1-GCM2 (with p-values in the lower triangle)
plotHeat(gcmtest$diff, 
         pmat = gcmtest$pAdjustDiff,
         type = "mixed",
         textLow = "pmat")
  

stefpeschel/NetCoMi documentation built on Feb. 4, 2024, 8:20 a.m.