knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(bnRep)
The R package bnRep includes the largest repository of Bayesian networks, all collected from recent academic literature in a variety of fields. If you use any Bayesian network from bnRep, please cite:
Leonelli, M (2024). "bnRep: A repository of Bayesian networks from the academic literature." arXiv preprint arXiv:2409.19158.
@Article{,
  title = {bn{R}ep: A repository of {B}ayesian networks from the academic literature},
  author = {Manuele Leonelli},
  journal = {arXiv preprint arXiv:2409.19158},
  year = {2024}
}
Go to (link here) to explore the repository online!
If you are interested in having your Bayesian network included in bnRep, you must prepare three objects:
the Bayesian network as a bn.fit object (if it was not created with bnlearn, you can use import functions such as read.bif());
an R file with the same name as the bn.fit object, containing the documentation of the Bayesian network;
a vector or Excel file with the details required for the bnRep_summary object.
You can submit the required objects directly via GitHub (e.g., with a fork and pull request) or via email.
If you struggle with any of these steps, please get in touch and I will try to help!
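As a rough illustration of the first two objects, the sketch below builds a hypothetical two-node network called mynetwork with bnlearn, checks that it is a bn.fit object, and saves it. The network, its probabilities, the .rda file name, and the documentation fields in the commented skeleton are all assumptions made for the example, not the package's required template.

```r
library(bnlearn)

# Hypothetical two-node network A -> B; replace with the network from your paper.
dag <- model2network("[A][B|A]")
cpt_A <- matrix(c(0.4, 0.6), ncol = 2,
                dimnames = list(NULL, c("yes", "no")))
cpt_B <- matrix(c(0.8, 0.2, 0.3, 0.7), ncol = 2,
                dimnames = list(B = c("yes", "no"), A = c("yes", "no")))
mynetwork <- custom.fit(dag, dist = list(A = cpt_A, B = cpt_B))

# The repository stores bn.fit objects, so check the class before submitting.
stopifnot(inherits(mynetwork, "bn.fit"))

# Save the object under its own name (the file name and location are assumptions).
rda_path <- file.path(tempdir(), "mynetwork.rda")
save(mynetwork, file = rda_path)

# Roxygen-style skeleton for the accompanying mynetwork.R file
# (the fields shown are an assumption, not a required template):
#
# #' mynetwork Bayesian network
# #'
# #' One-sentence description of the network and the paper it comes from.
# #'
# #' @format A discrete Bayesian network with 2 nodes and 1 arc.
# #' @references Full reference of the source publication.
# "mynetwork"
```

A network defined in another tool would instead be imported first (for instance with read.bif()) and then checked and saved in the same way.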
bnRep includes over 200 Bayesian networks from more than 150 academic publications. It includes discrete, Gaussian, and conditional linear Gaussian Bayesian networks, all stored as appropriate bn.fit objects from bnlearn. They can be exported for use in other software (e.g., Python libraries) using functions from bnlearn such as write.bif(). Recall that in order to plot the associated DAG, one must first convert the bn.fit object to a graph (bn) object with bn.net() from the bnlearn package.
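The export and conversion steps can be sketched as follows. To keep the example independent of any particular repository entry, it fits a small discrete network on the learning.test data set that ships with bnlearn as a stand-in for a bnRep network; the network structure and the temporary file path are assumptions made for illustration. Note that the BIF format covers discrete networks only.

```r
library(bnlearn)

# Stand-in for a discrete network from the repository: a known structure
# fitted on bnlearn's example data set.
data(learning.test)
dag <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")
fitted_net <- bn.fit(dag, learning.test)

# Export the fitted network to a BIF file that other software can read.
bif_path <- tempfile(fileext = ".bif")
write.bif(bif_path, fitted_net)

# Read it back and extract the DAG (a bn object) from the bn.fit object.
reloaded <- read.bif(bif_path)
reloaded_dag <- bn.net(reloaded)

# Inspect the arcs of the recovered structure.
arcs(reloaded_dag)
```

The same two calls, write.bif() and bn.net(), apply to any discrete bn.fit object loaded from the package.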
# Install the stable version from CRAN:
install.packages("bnRep")
# Or the development version from GitHub:
remotes::install_github("manueleleonelli/bnRep")
We will use the lawschool Bayesian network as an example. To load it into the environment, simply call data(lawschool). You can then plot it, for instance with qgraph() from the qgraph package as done here, or with graphviz.plot() from the bnlearn package:
library(bnRep)
library(bnlearn)
data("lawschool")
qgraph::qgraph(bn.net(lawschool))
Notice that the bn.net() function must be used in order to plot the network.
bnRep includes two features to explore the Bayesian networks in the repository:
bnRep_summary: a data frame with important details about each network in the repository.
bnRep_app: a Shiny app to interactively explore bnRep_summary and filter the networks according to various criteria. The app is also available online at (link here).
Here are the columns of bnRep_summary:
colnames(bnRep_summary)
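Because bnRep_summary is an ordinary data frame, it can also be queried directly without the app. The following is a minimal sketch using base R; it relies only on the columns Type, Nodes, Arcs, and Year, which are the ones used by the plotting code in this vignette, and the 10-node threshold is an arbitrary choice for illustration.

```r
library(bnRep)

# Number of networks of each type.
table(bnRep_summary$Type)

# Networks with at most 10 nodes, ordered by their number of arcs.
small <- subset(bnRep_summary, Nodes <= 10)
small <- small[order(small$Arcs), ]
head(small[, c("Type", "Nodes", "Arcs", "Year")])
```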
The following plots show some summary statistics of the repository.
library(ggplot2)
library(dplyr)
library(stringr)
library(scales)
library(RColorBrewer)

# Assuming bnRep_summary is your data frame and Type is a factor column
# Create the bar plot with percentages on the y-axis and labels on the bars
bnRep_summary %>%
  count(Type) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ggplot(aes(x = Type, y = perc, fill = Type)) +
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = paste0(round(perc, 1), "%")), vjust = -0.5, size = 5, color = "black") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 100)) +
  labs(title = "Bayesian networks by type", x = "Type", y = "Percentage") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_brewer(palette = "Pastel1")
# Share of networks by how the structure was defined
bnRep_summary %>%
  count(Structure) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ggplot(aes(x = Structure, y = perc, fill = Structure)) +
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = paste0(round(perc, 1), "%")), vjust = -0.5, size = 5, color = "black") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 50)) +
  labs(title = "Bayesian networks by structure definition", x = "Structure", y = "Percentage") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_brewer(palette = "Pastel1")
# Share of networks by how the probabilities were defined
bnRep_summary %>%
  count(Probabilities) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ggplot(aes(x = Probabilities, y = perc, fill = Probabilities)) +
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = paste0(round(perc, 1), "%")), vjust = -0.5, size = 5, color = "black") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 60)) +
  labs(title = "Bayesian networks by probabilities definition", x = "Probabilities", y = "Percentage") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_brewer(palette = "Pastel1")
# Share of networks by graph type
bnRep_summary %>%
  count(Graph) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ggplot(aes(x = Graph, y = perc, fill = Graph)) +
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = paste0(round(perc, 1), "%")), vjust = -0.5, size = 5, color = "black") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 70)) +
  labs(title = "Bayesian networks by graph type", x = "Graph", y = "Percentage") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_brewer(palette = "Pastel1")
# Pastel1 has only 9 colors, so interpolate one color per academic area
custom_colors <- colorRampPalette(RColorBrewer::brewer.pal(9, "Pastel1"))(length(unique(bnRep_summary$Area)))

bnRep_summary %>%
  count(Area) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ggplot(aes(x = Area, y = perc, fill = Area)) +
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = paste0(round(perc, 1), "%")), vjust = -0.5, size = 3.5, color = "black") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 25)) +
  labs(title = "Bayesian networks by academic area", x = "Area", y = "Percentage") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 60, hjust = 1),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_manual(values = custom_colors)
# Group all years up to 2019 into a single category
summary_by_year <- bnRep_summary %>%
  mutate(Year = ifelse(Year <= 2019, "2019 and earlier", as.character(Year)))

# Convert Year into a factor for plotting, with the grouped category first
summary_by_year$Year <- factor(
  summary_by_year$Year,
  levels = c(
    "2019 and earlier",
    sort(unique(summary_by_year$Year[summary_by_year$Year != "2019 and earlier"]))
  )
)

summary_by_year %>%
  count(Year) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ggplot(aes(x = Year, y = perc, fill = Year)) +
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = paste0(round(perc, 1), "%")), vjust = -0.5, size = 5, color = "black") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 35)) +
  labs(title = "Bayesian networks by year", x = "Year", y = "Percentage") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_brewer(palette = "Pastel1")
# Keep one row per publication, then only journals with at least 3 publications
unique_journals_df <- bnRep_summary %>%
  distinct(Reference, .keep_all = TRUE) %>%
  group_by(Journal) %>%
  filter(n() >= 3) %>%
  ungroup()

# Break journal names over several lines so the axis labels fit
unique_journals_df <- unique_journals_df %>%
  mutate(Journal = stringr::str_replace_all(Journal, "\\s", "\n"))

# Create a barplot with counts instead of percentages
unique_journals_df %>%
  count(Journal) %>%
  ggplot(aes(x = Journal, y = n, fill = Journal)) +  # y = n for counts
  geom_bar(stat = "identity", width = 0.7, color = "black", show.legend = FALSE) +
  geom_text(aes(label = n), vjust = -0.5, size = 5, color = "black") +  # Show counts on top of bars
  labs(title = "Bayesian Networks by Journal (having at least 3)", x = "Journal", y = "Count") +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5),  # No rotation, centered
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  ) +
  scale_fill_brewer(palette = "Pastel1") +
  scale_y_continuous(limits = c(0, 7))
# Create the histogram with log10 scale on the x-axis
# (binwidth is expressed in log10 units because of scale_x_log10())
ggplot(bnRep_summary, aes(x = Nodes)) +
  geom_histogram(binwidth = 0.1, fill = "skyblue", color = "black", alpha = 0.7) +  # Customize the appearance
  scale_x_log10() +  # Set x-axis to log10 scale
  labs(title = "Distribution of Nodes (Log10 Scale)", x = "Number of Nodes (log10 scale)", y = "Count") +
  theme_minimal(base_size = 15) +  # Use a clean theme with larger base text size
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),  # Center the title
    axis.title.x = element_text(face = "bold"),  # Make x-axis title bold
    axis.title.y = element_text(face = "bold"),  # Make y-axis title bold
    panel.grid.major = element_line(color = "gray80"),  # Customize grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines for a cleaner look
    axis.text.x = element_text(angle = 45, hjust = 1)  # Angle the x-axis labels for better readability
  )
# Same histogram for the number of arcs
ggplot(bnRep_summary, aes(x = Arcs)) +
  geom_histogram(binwidth = 0.1, fill = "skyblue", color = "black", alpha = 0.7) +  # Customize the appearance
  scale_x_log10() +  # Set x-axis to log10 scale
  labs(title = "Distribution of Arcs (Log10 Scale)", x = "Number of Arcs (log10 scale)", y = "Count") +
  theme_minimal(base_size = 15) +  # Use a clean theme with larger base text size
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),  # Center the title
    axis.title.x = element_text(face = "bold"),  # Make x-axis title bold
    axis.title.y = element_text(face = "bold"),  # Make y-axis title bold
    panel.grid.major = element_line(color = "gray80"),  # Customize grid lines
    panel.grid.minor = element_blank(),  # Remove minor grid lines for a cleaner look
    axis.text.x = element_text(angle = 45, hjust = 1)  # Angle the x-axis labels for better readability
  )