# Set global chunk options knitr::opts_chunk$set( echo = TRUE, message = FALSE, warning = FALSE, fig.width = 6, fig.height = 4 ) # Load libraries library(CCMnet) library(dplyr) library(tidyr) library(ggplot2) set.seed(1)
The CCMnet package is designed for generating network ensembles that reflect structural uncertainty in network properties. While traditional statistical models often place probability distributions directly on graphs, Congruence Class Models (CCMs) define distributions over specific network statistics—such as edge counts, degree sequences, or mixing patterns.
Using CCMnet for network analysis typically follows a four-step process:
Model Specification: Defining the network properties of interest and their corresponding probability distributions.
MCMC Sampling: Generate a representative ensemble of networks, but only return the network statistics (not entire networks).
Verification and Diagnostics: Assessing the sampler's performance and validating that the generated network statistics align with the specified theoretical targets.
Create Network Ensemble for Analysis : Once the sampler is validated, one can generate an emsemble of networks for analysis.
This section verifies that the CCM sampler correctly reproduces networks according to specified edge-count distributions. In a CCM, the probability of sampling a network with $k$ edges is defined by the probability assigned to the congruence class consisting of all networks with exactly $k$ edges. The sampler then assigns equal probability to each individual network within that class.
We fit a CCM where the number of edges follows a Poisson distribution with (\lambda = 350). The probability of sampling a network with k edges is given by the Poisson mass function:
[ P(K=k) = \frac{\lambda^k e^{-\lambda}}{k!} ]
ccm_sample <- sample_ccm( network_stats = c("edges"), prob_distr = c("poisson"), prob_distr_params = list(list(350)), population = 50 ) summary(ccm_sample) print(ccm_sample) # Compare MCMC samples to theoretical Poisson distribution ccm_sample<- sample_theoretical(ccm_sample, n_sim = 1000) # Plot density with theoretical distribution plot(ccm_sample, stats = "edges", type = "hist", include_theoretical = TRUE)
Explanation: The density plot shows the CCM MCMC samples (blue) and the theoretical Poisson distribution (red). Close overlap verifies correct sampling.
We fit a CCM where the number of edges follows a uniform distribution.
ccm_sample <- sample_ccm( network_stats = c("edges"), prob_distr = c("uniform"), prob_distr_params = list(list(0)), population = 20, sample_size = 20000L, burnin = 100000L, interval = 1000L, ) ccm_sample<- sample_theoretical(ccm_sample, n_sim = 10000) plot(ccm_sample, stats = "edges", type = "hist", include_theoretical = TRUE)
Explanation: The density plot verifies that CCM samples uniformly reproduce all possible edge counts.
Arbitrary edge count distributions can be specified using the non-parametric ("NP") option. This direct control over congruence class probabilities allows for stable simulation even under non-standard structural priors.
n_max <- choose(50, 2) alpha <- dpois(0:n_max, lambda = 50) + dpois(0:n_max, lambda = 100) prob_distr_params <- alpha / sum(alpha) ccm_sample <- sample_ccm( network_stats = c("edges"), prob_distr = c("np"), prob_distr_params = list(list(prob_distr_params)), population = 50L, sample_size = 10000L, burnin = 100000L ) ccm_sample<- sample_theoretical(ccm_sample, n_sim = 50000) plot(ccm_sample, stats = "edges", type = "hist", include_theoretical = TRUE)
Explanation: The density plot verifies that CCM samples uniformly reproduce all possible edge counts.
Here, the network property of interest is the frequency of each degree within the network. This distribution is modeled using a Dirichlet-Multinomial specification, which allows for flexible modeling of degree frequencies while accounting for overdispersion. The verification step ensures that the empirical densities of node degrees (e.g., the number of nodes with degrees 0 through 3) closely match the theoretical target distribution.
ccm_sample<- sample_ccm(network_stats=c('degreedist'), prob_distr=c('dirmult'), prob_distr_params=list(list(c(2,21,15,12))), population = 100L, sample_size = 10000L, burnin=100000L, interval=1000L) ccm_sample <- sample_theoretical(ccm_sample, n_sim = 1000) plot(ccm_sample, stats = paste0("deg", 0:3), type = "hist", include_theoretical = TRUE )
Explanation: The density plot verifies that CCM samples reproduce the correct degre distribution.
This section illustrates the sampler for models involving higher-order dependencies. In this example, a multivariate normal constraint is placed on the degree mixing matrix and a univariate normal constraint is placed on the triangle count. Because the degree mixing matrix effectively determines the number of connected triples, the simultaneous constraint on triangle counts implicitly influences the network’s global clustering coefficient.
mean_vec = c(23, 66, 44, 20, 120, 80) var_mat = matrix(data = c(22, -3, -2, -5, -6, -4, -3, 58, -7, -14, -18, -12, -2, -7, 41, -9, -12, -8, -5, -14, -9, 75, -25, -17, -6, -18, -12, -25, 89, -22, -4, -12, -8, -17, -22, 68), ncol = 6) prob_distr_params = list(list(mean_vec, var_mat), list(10,3)) ccm_sample <- sample_ccm(network_stats=c('degmixing', 'triangles'), prob_distr=c('mvn', 'normal'), prob_distr_params=prob_distr_params, sample_size = 10000L, burnin=100000L, interval=1000L, population=500L) ccm_sample <- sample_theoretical(ccm_sample, n_sim = 1000) plot(ccm_sample, stats = c("DM11", "DM12", "DM13", "DM22", "DM23", "DM33", "triangles"), type = "hist", include_theoretical = TRUE) plot(ccm_sample, stats = c("triangles"), type = "trace", include_theoretical = TRUE)
Explanation: The density plots show sampled degrees (blue) versus theoretical probabilities (red). Close match verifies correct MCMC sampling.
A primary use case for CCMnet is generating network ensembles that reflect the uncertainty in properties estimated from empirical data. These examples demonstrate how uncertainty induced by different sampling designs is explicitly propagated into the predictive network ensembles.
Posterior Predictive for a New School: This illustrates whole-network sampling, where uncertainty arises from heterogeneity across multiple distinct populations (e.g., different high schools).
Posterior Predictive for a Sampled School: This demonstrates individual-level sampling, where uncertainty arises from the partial observation of nodes and edges within a single population.
By comparing CCMnet to standard benchmarks like ERGMs, we can evaluate how well the model captures both population-level heterogeneity and sampling-induced uncertainty.
utils::data("faux.mesa.high", package = "ergm", envir = environment()) utils::data("faux.desert.high", package = "ergm", envir = environment()) utils::data("faux.dixon.high", package = "ergm", envir = environment()) utils::data("faux.magnolia.high", package = "ergm", envir = environment()) mesa_net = intergraph::asIgraph(faux.mesa.high) desert_net = intergraph::asIgraph(faux.desert.high) dixon_net = intergraph::asIgraph(faux.dixon.high) magnolia_net = intergraph::asIgraph(faux.magnolia.high) # Create summary table hs_summary <- data.frame( High_School = c("Mesa High", "Desert High", "Dixon High", "Magnolia High"), Nodes = c( igraph::vcount(mesa_net), igraph::vcount(desert_net), igraph::vcount(dixon_net), igraph::vcount(magnolia_net) ), Edges = c( igraph::gsize(mesa_net), igraph::gsize(desert_net), igraph::gsize(dixon_net), igraph::gsize(magnolia_net) ) ) # Render table kableExtra::kable( hs_summary, caption = "Summary of high school friendship networks from the ERGM package" )
This section illustrates whole-network sampling, where uncertainty arises from heterogeneity across multiple distinct populations (e.g., different high schools).
#Normal densities <- hs_summary$Edges / choose(hs_summary$Nodes, 2) sigma <- sd(densities) prior <- RBesT::mixnorm(c(1, 0.5, 1), sigma = sigma) post_normal <- RBesT::postmix(prior, m = mean(densities), n = length(densities))
population = 100 n_samples = 10000L #CCM ccm_sample <- sample_ccm( network_stats = c("density"), prob_distr = c("normal"), prob_distr_params = list(list(post_normal[2], (post_normal[3])^2)), population = population, sample_size = n_samples, burnin = 100000L, interval = 1000L, ) #ERGM net <- network::network(population, directed = FALSE) ergm_sample <- simulate(net ~ edges, coef = log(post_normal[2] / (1 - post_normal[2])), nsim = n_samples, output = "stats") / choose(population, 2) #G(n,m) m_edges = round(post_normal[2]*choose(population, 2)) er_gnm_list <- replicate(n_samples, igraph::sample_gnm(n = population, m = m_edges, directed = FALSE), simplify = FALSE) er_gnm_stats <- sapply(er_gnm_list, igraph::ecount) er_gnm_sample <- er_gnm_stats / choose(population, 2)
Network_samples <- data.frame( value = c(ccm_sample$mcmc_stats$density, ergm_sample, er_gnm_sample), model = c(rep("CCMnet", n_samples), rep("ERGM", n_samples), rep("ER", n_samples)) ) ggplot2::ggplot( Network_samples %>% filter(model != "ER"), aes(x = value, fill = model) ) + geom_density(alpha = 0.25, linewidth = .1) + geom_vline( aes(xintercept = er_gnm_sample[1], colour = "ER"), linewidth = 1, linetype = "solid" ) + scale_fill_manual( values = c(CCMnet = "red", ERGM = "green") ) + scale_colour_manual( values = c(ER = "blue") ) + labs(x = "Density", y = "Probability density", fill = "Model", colour = "Model") + theme_bw() + theme(legend.position = "bottom", legend.title = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), plot.title = element_text(size = 16, face = "bold"), axis.title.x = element_text(size = 12), axis.title.y = element_text(size = 12), axis.text.x = element_text(size = 10), strip.text.x = element_text(size = 14), strip.text.y = element_text(size = 14), strip.background = element_rect(fill = "lightgray", color = "black"))
res = data.frame(model = NULL, value = NULL) dixon_deg = igraph::degree(dixon_net) prior.unif <- RBesT::mixbeta(c(1, 1, 1)) N = igraph::vcount(dixon_net) n_samples <- 1000L num_sample_nodes_seq = seq(40,240,40) for (num_sample_nodes in num_sample_nodes_seq) { dixon_deg_sample = sample(x = dixon_deg, size = num_sample_nodes, replace = FALSE) r=sum(dixon_deg_sample) n=sum(num_sample_nodes*(N-1)) posterior.sum_beta <- RBesT::postmix(prior.unif, n=n, r=r) alpha_post <- posterior.sum_beta[2] beta_post <- posterior.sum_beta[3] # Infinite-population variance var_inf <- (alpha_post * beta_post) / ((alpha_post + beta_post)^2 * (alpha_post + beta_post + 1)) # Finite population correction fpc <- (N - num_sample_nodes) / (N - 1) var_fpc <- var_inf * fpc # Moment-matched Beta parameters mu <- alpha_post / (alpha_post + beta_post) S <- mu * (1 - mu) / var_fpc - 1 posterior.sum_beta[2] <- mu * S posterior.sum_beta[3] <- (1 - mu) * S ccm_sample <- sample_ccm( network_stats = c("density"), prob_distr = c("beta"), prob_distr_params = list(list(posterior.sum_beta[2], posterior.sum_beta[3])), population = N, sample_size = n_samples, burnin = 100000L ) res = bind_rows(res, data.frame( model = rep("CCMnet", length(ccm_sample$mcmc_stats$density)), value = ccm_sample$mcmc_stats$density, sample = paste("Samples:",num_sample_nodes))) net <- network::network(N, directed = FALSE) p = posterior.sum_beta[2]/(posterior.sum_beta[2] + posterior.sum_beta[3]) theta = log(p / (1-p)) ERGM <- simulate( net ~ edges, coef = theta, nsim = n_samples, output = "stats" ) / choose(N, 2) ER = rep(posterior.sum_beta[2]/(posterior.sum_beta[2] + posterior.sum_beta[3]), n_samples) res <- bind_rows(res, data.frame( value = c(ERGM, ER), model = c(rep("ERGM", n_samples), rep("ER", n_samples)), sample = c(rep(paste("Samples:",num_sample_nodes), n_samples), rep(paste("Samples:",num_sample_nodes), n_samples)) )) }
res$sample <- factor( res$sample, levels = paste("Samples:", num_sample_nodes_seq) ) # ER vertical lines ER_lines <- res %>% filter(model == "ER") %>% group_by(sample, model) %>% summarise(xintercept = unique(value), .groups = "drop") ggplot2::ggplot( res %>% filter(model != "ER"), aes(x = value, fill = model) ) + geom_density(alpha = 0.25, linewidth = .1) + # ER vertical lines geom_vline( data = ER_lines, aes(xintercept = xintercept, colour = model), linetype = "solid", linewidth = 1 ) + scale_fill_manual( values = c( CCMnet = "red", ERGM = "green" ) ) + scale_colour_manual( values = c( ER = "blue" ) ) + labs( x = "Density", y = "Probability density", fill = "Model", colour = NULL ) + guides( colour = guide_legend(override.aes = list(fill = NA)), fill = guide_legend(override.aes = list(colour = NA)) ) + theme_bw() + theme(legend.position = "bottom", legend.title = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), plot.title = element_text(size = 16, face = "bold"), axis.title.x = element_text(size = 12), axis.title.y = element_text(size = 12), axis.text.x = element_text(size = 10), strip.text.x = element_text(size = 14), strip.text.y = element_text(size = 14), strip.background = element_rect(fill = "lightgray", color = "black")) + facet_wrap(~sample, scales = "free")
ccm_sample <- sample_ccm( network_stats = list("density"), prob_distr = list("normal"), prob_distr_params = list(list(post_normal[2], (post_normal[3])^2)), population = 100L, sample_size = 10000L, burnin = 100000L ) school_ensemble <- sample_ccm( network_stats = list("density"), prob_distr = list("normal"), prob_distr_params = list(list(post_normal[2], (post_normal[3])^2)), population = 100L, sample_size = 10L, burnin = 1L, interval = 1000L, initial_g = ccm_sample$g[[1]], use_initial_g = TRUE, stats_only = FALSE) class(school_ensemble$g) length(school_ensemble$g) class(school_ensemble$g[[1]])
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.