library(abn)
In this vignette, we will simulate data from an additive Bayesian network and compare it to the original data.
First, we will fit a model to the original data that we will use to simulate new data from.
We will use the ex1.dag.data
data set and fit a model to it.
# Load example data mydat <- ex1.dag.data # Set the distribution of each node mydists <- list(b1="binomial", p1="poisson", g1="gaussian", b2="binomial", p2="poisson", b3="binomial", g2="gaussian", b4="binomial", b5="binomial", g3="gaussian") # Build the score cache mycache <- buildScoreCache(data.df = mydat, data.dists = mydists, method = "bayes", max.parents = 4) # Structure learning mp.dag <- mostProbable(score.cache = mycache) #> Step1. completed max alpha_i(S) for all i and S #> Total sets g(S) to be evaluated over: 1024 # Estimate the parameters myfit <- fitAbn(object = mp.dag) # Plot the DAG plot(myfit)
Based on the abnFit
object, we can simulate new data.
By default simulateAbn()
synthesizes 1000 new data points.
mydat_sim <- simulateAbn(object = myfit) str(mydat_sim) #> 'data.frame': 1000 obs. of 10 variables: #> $ b1: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ... #> $ b2: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 2 2 ... #> $ b3: Factor w/ 2 levels "0","1": 2 1 2 1 1 1 2 1 2 2 ... #> $ b4: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ... #> $ b5: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 2 2 2 ... #> $ g1: num 0.796 -0.92 0.167 -2.602 -0.432 ... #> $ g2: num 0.112 -0.708 -1.621 0.115 1.504 ... #> $ g3: num 0.703 -0.891 0.206 -0.55 -1.458 ... #> $ p1: num 0 1 1 0 0 0 1 0 1 0 ... #> $ p2: num 17 7 6 16 9 12 7 9 5 9 ...
In the background, the simulateAbn()
function translates the abnFit
object into a BUGS model and calls the rjags
package to simulate new data.
Especially for debugging purposes, it can be usefull to manually inspect the BUGS file that is generated by simulateAbn()
.
This can be done by not running the simulation with run.simulation = FALSE
and print the BUGS file to console with verbose = TRUE
.
# Simulate new data and print the BUGS file to the console simulateAbn(object = myfit, run.simulation = FALSE, verbose = TRUE)
To store the BUGS file for reproducibility or manual inspection, we can set the bugsfile
argument to a file name to save the BUGS file to disk.
We can compare the original and simulated data by plotting the distributions of the variables.
# order the columns of mydat equal to mydat_sim mydat <- mydat[, colnames(mydat_sim)]
library(ggplot2) library(gridExtra) # Create a list of variables variables <- names(mydat) # Initialize an empty list to store plots plots <- list() # For each variable for (i in seq_along(variables)) { # Check if the variable is numeric if (is.numeric(mydat[[variables[i]]])) { # Create a histogram for the variable in mydat p1 <- ggplot(mydat, aes(!!as.name(variables[i]))) + geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") + labs(title = paste("mydat", variables[i]), x = variables[i], y = "Count") + theme_minimal() # Create a histogram for the variable in mydat_sim p2 <- ggplot(mydat_sim, aes(!!as.name(variables[i]))) + geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") + labs(title = paste("mydat_sim", variables[i]), x = variables[i], y = "Count") + theme_minimal() } else { # Create a bar plot for the variable in mydat p1 <- ggplot(mydat, aes(!!as.name(variables[i]))) + geom_bar(fill = "skyblue", color = "black") + labs(title = paste("mydat", variables[i]), x = variables[i], y = "Count") + theme_minimal() # Create a bar plot for the variable in mydat_sim p2 <- ggplot(mydat_sim, aes(!!as.name(variables[i]))) + geom_bar(fill = "skyblue", color = "black") + labs(title = paste("mydat_sim", variables[i]), x = variables[i], y = "Count") + theme_minimal() } # Combine the plots into a grid plots[[i]] <- arrangeGrob(p1, p2, ncol = 2) }
# Print all plots do.call(grid.arrange, c(plots, ncol = 1))
The plots show that the distributions of the original and simulated data are similar.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.