bdgraph: Search algorithm in graphical models
Package BDgraph: Bayesian Structure Learning in Graphical Models using Birth-Death MCMC

Description

This is the main function of the BDgraph package. It provides several MCMC sampling algorithms for Bayesian model determination in undirected graphical models. To speed up computation, the birth-death MCMC sampling algorithms are implemented in C++ and run in parallel using OpenMP.

Usage

    bdgraph( data, n = NULL, method = "ggm", algorithm = "bdmcmc", iter = 5000,
             burnin = iter / 2, not.cont = NULL, g.prior = 0.5, df.prior = 3,
             g.start = "empty", jump = NULL, save = FALSE, cores = NULL,
             threshold = 1e-8 )

Arguments

data: There are two options: (1) an (n \times p) matrix or a data.frame containing the data, or (2) a (p \times p) covariance matrix S = X'X, where X is the data matrix (n is the sample size and p is the number of variables). It can also be an object of class "sim" from the function bdgraph.sim. The type of input is identified automatically by checking whether the matrix is symmetric.

n: The number of observations. It is needed if data is a covariance matrix.

method: A character string with two options, "ggm" (default) and "gcgm". Option "ggm" is for Gaussian graphical models, based on the Gaussianity assumption. Option "gcgm" is for Gaussian copula graphical models, for data that do not satisfy the Gaussianity assumption (e.g. continuous non-Gaussian, count, or mixed data).

algorithm: A character string with two options, "bdmcmc" (default) and "rjmcmc". Option "bdmcmc" uses the birth-death MCMC algorithm; option "rjmcmc" uses the reversible jump MCMC algorithm.

iter: The number of iterations of the sampling algorithm.

burnin: The number of burn-in iterations of the sampling algorithm.

not.cont: For the case method = "gcgm", a vector of binary values in which 1 indicates a non-continuous variable.

g.prior: Determines the prior distribution of each edge in the graph. There are two options: a single value between 0 and 1 (e.g. 0.5 as a noninformative prior) or a (p \times p) matrix with elements between 0 and 1.

df.prior: The degrees of freedom b of the G-Wishart distribution W_G(b, D), which is the prior distribution of the precision matrix.

g.start: The starting graph. It can be a (p \times p) matrix, "empty" (default), or "full". Option "empty" starts from the empty graph and "full" from the full graph. It can also be an object of S3 class "bdgraph" from the R package BDgraph, or an object of class "ssgraph" returned by ssgraph::ssgraph(); this option can be used to restart the sampling algorithm from the last state of a previous run (see Examples).

jump: Only for the BDMCMC algorithm (algorithm = "bdmcmc"). It is used to update multiple links of the graph simultaneously in each step of the BDMCMC algorithm.

save: Logical. If FALSE (default), the adjacency matrices are NOT saved. If TRUE, the adjacency matrices visited after burn-in are saved.

cores: The number of cores to use for parallel execution. Setting cores = "all" uses all available CPU cores.

threshold: The threshold value for the convergence of the algorithm that samples the precision matrix from the G-Wishart distribution.
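The data argument is documented as being identified automatically by a symmetry check: a symmetric (p \times p) matrix is read as S = X'X, anything else as raw data. The base-R sketch below illustrates that rule. The helper as_scatter() is hypothetical and not part of BDgraph, and the package's internal check may differ in detail (for example in its numerical tolerance).

```r
# Hypothetical helper (NOT part of BDgraph): mimics the documented rule that a
# symmetric (p x p) input is treated as S = X'X and anything else as raw data.
as_scatter <- function(data, n = NULL) {
  data <- as.matrix(data)                  # accepts a matrix or a data.frame
  if (isSymmetric(unname(data))) {
    # Covariance input: the sample size must be supplied separately.
    if (is.null(n)) stop("'n' is needed when 'data' is a covariance matrix")
    list(S = data, n = n)
  } else {
    # Raw (n x p) data: form S = X'X and take n from the number of rows.
    list(S = crossprod(data), n = nrow(data))
  }
}

set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)   # n = 20 observations, p = 6

from_data <- as_scatter(X)                  # raw data: S is computed
from_S    <- as_scatter(crossprod(X), n = 20)  # S supplied directly with n
```

Because the decision rests only on symmetry, a square raw-data matrix (n = p) that happened to be symmetric would be read as a covariance matrix; with real data this is very unlikely.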

Value

An object with S3 class "bdgraph" is returned:

p_links: An upper triangular matrix of the estimated posterior probabilities of all possible links.

K_hat: The posterior estimate of the precision matrix.
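In the Example output, each selected_g printed by summary() contains exactly the links whose value in p_links exceeds 0.5. The base-R sketch below reproduces that selection step on the first p_links matrix from the Example output. The cut-off of 0.5 and the helper select_links() are assumptions made for illustration, not BDgraph's own code.

```r
# Posterior link probabilities copied from the first summary() in the
# Example output (upper triangle only, as returned in p_links).
p_links <- matrix(0, nrow = 6, ncol = 6)
p_links[1, 2:6] <- c(1.00, 0.26, 0.17, 0.18, 0.29)
p_links[2, 3:6] <- c(0.55, 0.36, 0.27, 0.46)
p_links[3, 4:6] <- c(0.98, 0.24, 0.98)
p_links[4, 5:6] <- c(0.32, 1.00)
p_links[5, 6]   <- 0.49

# Hypothetical helper: keep a link when its posterior probability exceeds
# the cut-off (0.5 assumed here, often called the median probability model).
select_links <- function(p_links, cut = 0.5) {
  g <- (p_links > cut) * 1
  g[lower.tri(g, diag = TRUE)] <- 0   # keep the result upper triangular
  g
}

selected_g <- select_links(p_links)
which(selected_g == 1, arr.ind = TRUE)   # row/column pairs of selected links
```

The selected links are (1,2), (2,3), (3,4), (3,6) and (4,6), which matches the first selected_g in the Example output; links (2,6) and (5,6), at 0.46 and 0.49, fall just below the cut-off.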

If save = TRUE, the following are also returned:

sample_graphs: A vector of strings, each encoding the adjacency matrix of a graph visited after burn-in.

graph_weights: A vector of the waiting times of the visited graphs after burn-in.

all_graphs: A vector identifying the adjacency matrix (graph) visited at each iteration after burn-in. It is needed for monitoring the convergence of the BDMCMC algorithm.

all_weights: A vector of the waiting times for all iterations after burn-in. It is needed for monitoring the convergence of the BDMCMC algorithm.
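With save = TRUE, the waiting times make it possible to estimate posterior probabilities of graphs and links: in the birth-death MCMC approach, each visited graph is weighted by the time the sampler spends in it. The base-R sketch below uses small hypothetical values (not real BDgraph output). The encoding of each sample_graphs string as the 0/1 upper-triangular entries in the order (1,2), (1,3), (2,3) is an assumption made for illustration.

```r
# Hypothetical saved output for p = 3 (three possible links).
# Each string is ASSUMED to encode links (1,2), (1,3), (2,3) as 0/1 characters.
sample_graphs <- c("110", "100", "111")
all_graphs    <- c(1, 2, 1, 3, 2, 1)              # graph index at each iteration
all_weights   <- c(0.4, 0.1, 0.3, 0.2, 0.5, 0.5)  # waiting time at each iteration

# Total waiting time per visited graph, recovered from all_graphs/all_weights.
graph_weights <- as.vector(tapply(all_weights, all_graphs, sum))

# Posterior probability of each graph: its share of the total waiting time.
post_graph <- graph_weights / sum(graph_weights)

# Posterior probability of each link: total weight of the graphs containing it.
adj    <- do.call(rbind, lapply(strsplit(sample_graphs, ""), as.numeric))
p_link <- colSums(adj * post_graph)   # row i of adj is scaled by post_graph[i]
```

Here the three graphs receive posterior probabilities 0.6, 0.3 and 0.1, and the links (1,2), (1,3) and (2,3) receive 1.0, 0.7 and 0.1.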

Examples

    ## Not run:
    # --- Example 1
    # Generating multivariate normal data from a 'random' graph
    data.sim <- bdgraph.sim( n = 20, p = 6, size = 7, vis = TRUE )

    bdgraph.obj <- bdgraph( data = data.sim, iter = 1000 )

    summary( bdgraph.obj )

    # To compare our result with the true graph
    compare( data.sim, bdgraph.obj, main = c( "Target", "BDgraph" ) )

    # Running the algorithm with the starting point from the previous run
    bdgraph.obj2 <- bdgraph( data = data.sim, g.start = bdgraph.obj )

    compare( data.sim, bdgraph.obj, bdgraph.obj2,
             main = c( "Target", "First run", "Second run" ) )

    # --- Example 2
    # Generating mixed data from a 'scale-free' graph
    data.sim <- bdgraph.sim( n = 50, p = 6, type = "mixed", graph = "scale-free",
                             vis = TRUE )

    bdgraph.obj <- bdgraph( data = data.sim, method = "gcgm", iter = 10000 )

    summary( bdgraph.obj )

    compare( data.sim, bdgraph.obj )
    ## End(Not run)

Example output

    1000 iteration is started.
    Iteration  1000
    $selected_g
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    0    1    0    0    0    0
    [2,]    0    0    1    0    0    0
    [3,]    0    0    0    1    0    1
    [4,]    0    0    0    0    0    1
    [5,]    0    0    0    0    0    0
    [6,]    0    0    0    0    0    0

    $p_links
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    0    1 0.26 0.17 0.18 0.29
    [2,]    0    0 0.55 0.36 0.27 0.46
    [3,]    0    0 0.00 0.98 0.24 0.98
    [4,]    0    0 0.00 0.00 0.32 1.00
    [5,]    0    0 0.00 0.00 0.00 0.49
    [6,]    0    0 0.00 0.00 0.00 0.00

    $K_hat
          [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
    [1,]  1.22  1.51 -0.07  0.01  0.02 -0.04
    [2,]  1.51  5.37  0.50 -0.09  0.00 -0.54
    [3,] -0.07  0.50  1.69 -1.81 -0.08 -1.62
    [4,]  0.01 -0.09 -1.81  4.88  0.15  3.48
    [5,]  0.02  0.00 -0.08  0.15  1.70 -0.28
    [6,] -0.04 -0.54 -1.62  3.48 -0.28  4.90

                   Target BDgraph
    true positive       7   5.000
    true negative       8   8.000
    false positive      0   0.000
    false negative      0   2.000
    F1-score            1   0.833
    specificity         1   1.000
    sensitivity         1   0.714
    MCC                 1   0.756

    5000 iteration is started.
    Iteration 1000
    Iteration 2000
    Iteration 3000
    Iteration 4000
    Iteration 5000

                   Target First run Second run
    true positive       7     5.000      5.000
    true negative       8     8.000      8.000
    false positive      0     0.000      0.000
    false negative      0     2.000      2.000
    F1-score            1     0.833      0.833
    specificity         1     1.000      1.000
    sensitivity         1     0.714      0.714
    MCC                 1     0.756      0.756

    10000 iteration is started.
    Iteration 1000
    Iteration 2000
    Iteration 3000
    Iteration 4000
    Iteration 5000
    Iteration 6000
    Iteration 7000
    Iteration 8000
    Iteration 9000
    Iteration 10000
    $selected_g
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    0    0    1    1    0    0
    [2,]    0    0    1    1    0    0
    [3,]    0    0    0    0    0    0
    [4,]    0    0    0    0    0    0
    [5,]    0    0    0    0    0    0
    [6,]    0    0    0    0    0    0

    $p_links
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    0 0.46 0.94 1.00 0.14 0.14
    [2,]    0 0.00 0.54 0.68 0.20 0.25
    [3,]    0 0.00 0.00 0.34 0.38 0.23
    [4,]    0 0.00 0.00 0.00 0.16 0.23
    [5,]    0 0.00 0.00 0.00 0.00 0.47
    [6,]    0 0.00 0.00 0.00 0.00 0.00

    $K_hat
          [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
    [1,]  4.77 -0.26  1.03 -3.02 -0.01 -0.02
    [2,] -0.26  5.24  0.50 -1.09 -0.03 -0.08
    [3,]  1.03  0.50  1.79 -0.13  0.12  0.06
    [4,] -3.02 -1.09 -0.13  4.89  0.00  0.08
    [5,] -0.01 -0.03  0.12  0.00  1.17 -0.15
    [6,] -0.02 -0.08  0.06  0.08 -0.15  1.16

                   Target estimate1
    true positive       5     2.000
    true negative      10     8.000
    false positive      0     2.000
    false negative      0     3.000
    F1-score            1     0.444
    specificity         1     0.800
    sensitivity         1     0.400
    MCC                 1     0.213
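The rows of the compare() tables are standard confusion-matrix summaries of edge recovery. Assuming the textbook definitions of the F1-score, specificity, sensitivity and the Matthews correlation coefficient (MCC), the base-R sketch below recomputes them from the four counts and reproduces the BDgraph and estimate1 columns above. The helper edge_metrics() is hypothetical and is not the package's compare().

```r
# Hypothetical helper: textbook confusion-matrix metrics for edge recovery.
edge_metrics <- function(tp, tn, fp, fn) {
  f1   <- 2 * tp / (2 * tp + fp + fn)
  spec <- tn / (tn + fp)                 # specificity
  sens <- tp / (tp + fn)                 # sensitivity (recall)
  mcc  <- (tp * tn - fp * fn) /
          sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  round(c(F1 = f1, specificity = spec, sensitivity = sens, MCC = mcc), 3)
}

edge_metrics(tp = 5, tn = 8, fp = 0, fn = 2)   # Example 1, BDgraph column
# F1 = 0.833, specificity = 1, sensitivity = 0.714, MCC = 0.756
edge_metrics(tp = 2, tn = 8, fp = 2, fn = 3)   # Example 2, estimate1 column
# F1 = 0.444, specificity = 0.8, sensitivity = 0.4, MCC = 0.213
```

In both examples the counts add up to 15, the number of possible links among p = 6 variables (6 \times 5 / 2).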

BDgraph documentation built on May 3, 2021, 9:08 a.m.