
bcb

Jireh Huang (jirehhuang@ucla.edu)

Bayesian Causal Bandits with Backdoor Adjustment Prior

Jireh Huang and Qing Zhou

Installation

With R version 3.6.0 or later, first install the phsl package according to the instructions provided by its author. The MCMC implementation additionally requires bidag2, a modified fork of BiDAG. Some functionality of the bcb package is not compatible with Windows because it integrates BIDA and modular-dag-sampling, copies of which are included in the inst folder with permission from the respective authors.

Then, run the following code in R to install this package from GitHub with its dependencies.

devtools::install_github("jirehhuang/bcb", dependencies = TRUE)
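Before installing, it can help to confirm the two prerequisites stated above: the R version and the operating system. The following is a minimal sketch using only base R; the helper name `check_bcb_prereqs` is hypothetical and is not part of the bcb package.

```r
# Hypothetical helper (not part of bcb): checks the prerequisites described above.
check_bcb_prereqs <- function(r_version = getRversion(),
                              os_type = .Platform$OS.type) {
  # Compare as version objects so that "3.10.0" sorts after "3.6.0".
  ok_version <- as.package_version(r_version) >= "3.6.0"
  is_windows <- identical(os_type, "windows")
  if (!ok_version) {
    warning("R >= 3.6.0 is required; found ", as.character(r_version))
  }
  if (is_windows) {
    warning("Some bcb functionality is not compatible with Windows.")
  }
  list(r_version_ok = ok_version, windows = is_windows)
}

check_bcb_prereqs()
```

The function only reports; it does not stop installation, since the parts of bcb that do work on Windows are not specified here.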

Reproducing Numerical Results

This section contains instructions for reproducing the experimental results in the paper. The following instructions assume that the current working directory is writable and contains the scripts in inst/scripts.

setwd("inst/scripts")
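Because every later step assumes a writable working directory containing the scripts, a quick check up front can save a failed run. The sketch below uses the script names mentioned in this README; the helper `check_workdir` itself is hypothetical.

```r
# Script names are taken from this README; the check is an illustrative sketch.
expected_scripts <- c(
  "generate_dkpar.R", "run_dkpar.R", "compile_dkpar.R", "analyze_dkpar.R",
  "run_dkpar2.R", "compile_dkpar2.R", "analyze_dkpar2.R",
  "generate_child.R", "run_child.R", "compile_child.R", "analyze_child.R",
  "test_bda.R", "analyze_bda.R"
)

# Hypothetical helper: lists scripts missing from `dir` and tests writability.
check_workdir <- function(dir = getwd(), scripts = expected_scripts) {
  missing <- scripts[!file.exists(file.path(dir, scripts))]
  # file.access() returns 0 when the requested permission (2 = write) is granted.
  writable <- unname(file.access(dir, mode = 2) == 0)
  list(missing = missing, writable = writable)
}

str(check_workdir())
```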

Main Experiments

The main experiments are described and presented in Section 6 and Appendix C. These results were obtained using a previous version of the package, available at [Google Drive], that was built directly after the following [commit]. After downloading it, run the following code in R to install the package from source.

install.packages("bcb_1.0.tar.gz", repos = NULL, dependencies = TRUE, type = "source")

1. Generating Networks and Data

source("generate_dkpar.R")

generate_dkpar.R creates two folders nested in a folder dkpar_3_6_3, randomly generating 100 discrete Bayesian networks in dkpar_3_6_3-d_0 and 100 Gaussian Bayesian networks in dkpar_3_6_3-g_0, as well as 10 observational datasets for each network. The script ran in about 40 minutes on an i5-9600K using a single CPU core. The generated networks and data are available at [Google Drive].

2. Executing Algorithms

Remove _0 from the generated directories from generate_dkpar.R, resulting in folder names dkpar_3_6_3-d and dkpar_3_6_3-g.
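The renaming can be done by hand or from R. The following sketch uses base R only and assumes the two generated folders sit directly inside dkpar_3_6_3 in the working directory; the helper name `strip_suffix_dirs` is hypothetical.

```r
# Hypothetical helper: strips a trailing suffix (default "_0") from the names
# of directories found directly under `parent`.
strip_suffix_dirs <- function(parent = "dkpar_3_6_3", suffix = "_0") {
  dirs <- list.dirs(parent, full.names = TRUE, recursive = FALSE)
  old <- dirs[endsWith(basename(dirs), suffix)]
  if (length(old) == 0) {
    return(data.frame(from = character(0), to = character(0),
                      renamed = logical(0), stringsAsFactors = FALSE))
  }
  base <- basename(old)
  new <- file.path(dirname(old), substr(base, 1, nchar(base) - nchar(suffix)))
  # Refuse to overwrite a directory that already exists.
  stopifnot(!any(file.exists(new)))
  ok <- file.rename(old, new)
  data.frame(from = old, to = new, renamed = ok, stringsAsFactors = FALSE)
}
```

Calling `strip_suffix_dirs("dkpar_3_6_3")` returns a data frame of the renames performed, which makes it easy to confirm that both folders were found.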

source("run_dkpar.R")

run_dkpar.R executes 44 algorithms on the networks and datasets in dkpar_3_6_3-{d, g}, creating a folder for each algorithm and saving the results of each execution. The script required about 7045 days of single-thread CPU time on the UCLA Hoffman2 computing cluster and created over 400 GB of output. Because the raw output is impractical to store, only the compiled results are provided, as discussed in the following step.
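To put the reported CPU time in perspective, 7045 days is roughly 19 years on a single thread. The arithmetic below is an idealized sketch that assumes perfect parallel scaling with no queueing or I/O overhead; the core counts are illustrative assumptions, not the configuration actually used on Hoffman2.

```r
# Idealized estimate: wall-clock days if the work divides evenly across cores.
wall_clock_days <- function(cpu_days, n_cores) cpu_days / n_cores

cpu_days <- 7045                  # single-thread CPU time reported for run_dkpar.R
n_cores <- c(1, 100, 500, 1000)   # illustrative core counts (assumptions)

data.frame(n_cores = n_cores,
           days = round(wall_clock_days(cpu_days, n_cores), 1))
```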

3. Compiling Results

source("compile_dkpar.R")

compile_dkpar.R creates the concise folder in dkpar_3_6_3-{d, g} and, for each algorithm, compiles and saves the results of all executions. Then, it averages the results of each algorithm, normalizing where appropriate, and combines all results into df_2.rds. The script executed in about 3 hours using 4 CPU cores on the UCLA Hoffman2 computing cluster. The compiled results are available at [Google Drive].

4. Analyzing Results

Append _done to the compiled directories from compile_dkpar.R, resulting in folder names dkpar_3_6_3-d_done and dkpar_3_6_3-g_done. The compiled results for each individual algorithm are not needed for this step; only df_2.rds and method_grid.txt in the concise folder are required.
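Since only two files are needed for the analysis, it is easy to verify a downloaded or compiled folder before running the script. This is a minimal sketch; the helper `check_concise` is hypothetical, and it assumes both files sit directly inside a concise subfolder of the directory passed in.

```r
# Hypothetical helper: confirms the two files needed by analyze_dkpar.R exist.
# Assumes they are located directly inside "<results_dir>/concise".
check_concise <- function(results_dir) {
  needed <- file.path(results_dir, "concise", c("df_2.rds", "method_grid.txt"))
  present <- file.exists(needed)
  if (!all(present)) {
    stop("Missing: ", paste(needed[!present], collapse = ", "))
  }
  invisible(needed)
}
```

For example, `check_concise("dkpar_3_6_3/dkpar_3_6_3-d_done")` would stop with the missing paths listed, or return the two paths invisibly if both are present; the path shown is an assumption about where the folder lives.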

source("analyze_dkpar.R")

analyze_dkpar.R creates the following figures: cumulative regret (cumulative.{eps, png}), cumulative regret with error bars (cumulative_err.{eps, png}), head start for competing algorithms in the discrete setting (hs-d.{eps, png}), and edge support sum of absolute errors (essae.{eps, png}). The analysis results, available at [Google Drive], contain the files necessary to execute analyze_dkpar.R and are the most manageable download in terms of file size.

MCMC Experiments

The experiments using Markov chain Monte Carlo (MCMC) to approximate the structure posterior are presented in Section 6. The procedure for reproducing these results is similar to that of the main experiments. The results are available at [Google Drive].

source("run_dkpar2.R")
source("compile_dkpar2.R")
source("analyze_dkpar2.R")
source("generate_child.R")
source("run_child.R")
source("compile_child.R")
source("analyze_child.R")
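Because each script depends on the output of the one before it, the sequence can be wrapped in a small runner that stops early if a script is missing and records how long each stage takes. This is an illustrative sketch; `run_in_order` is a hypothetical helper, and any directory renaming between stages (as in the main experiments) would still need to be done separately.

```r
# Hypothetical helper: sources scripts in the given order and records the
# elapsed time of each. Scripts are evaluated in the global environment,
# matching a plain source() call.
run_in_order <- function(scripts) {
  stopifnot(all(file.exists(scripts)))
  elapsed <- vapply(scripts, function(s) {
    message("Running ", s)
    unname(system.time(source(s))["elapsed"])
  }, numeric(1))
  data.frame(script = scripts, seconds = elapsed, row.names = NULL,
             stringsAsFactors = FALSE)
}

# Order as listed in this README.
mcmc_scripts <- c("run_dkpar2.R", "compile_dkpar2.R", "analyze_dkpar2.R",
                  "generate_child.R", "run_child.R", "compile_child.R",
                  "analyze_child.R")
# timings <- run_in_order(mcmc_scripts)  # uncomment to run; may take a long time
```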

Additional Experiments

The additional experiments evaluating the proposed backdoor adjustment methodology are provided in Appendix D.

source("test_bda.R")
source("analyze_bda.R")

test_bda.R creates folders test_bda-{d, g} nested in a folder test_bda, investigating 24000 randomly generated scenarios for each distributional setting. In each folder, an RDS file containing the simulation results is compiled. The script required about 167 days of single-thread CPU time on the UCLA Hoffman2 computing cluster. analyze_bda.R creates the backdoor adjustment coverage probability figures for the discrete (bda_coverage-d.{eps, png}) and Gaussian (bda_coverage-g.{eps, png}) simulations. The results are available at [Google Drive].



jirehhuang/bcb documentation built on Feb. 5, 2024, 10:16 p.m.