bnsl: Bayesian network structure learning

Description

Wrapper function for executing general structure learning algorithms, much like rsmax2 from the bnlearn package, with the addition of the partitioned PC (pPC) algorithm, the p-value adjacency thresholding (PATH) algorithm, and the hybrid greedy initialization (HGI) algorithm.

Usage

bnsl(
  x,
  restrict = "ppc",
  maximize = "tabu",
  restrict.args = list(),
  maximize.args = list(),
  undirected = FALSE,
  path = 1,
  min_alpha = 1e-05,
  hgi = FALSE,
  true_bn = NULL,
  whitelist = NULL,
  blacklist = NULL,
  debug = FALSE
)

Arguments

x

a data frame containing the variables in the model. Currently, the implementations of pPC, PATH, and HGI only support discrete data.

restrict

an argument as in rsmax2, with the following additional options.

  • restrict = "ppc" for the pPC algorithm (ppc).

  • restrict = "true" with true_bn supplied, to restrict exactly to the true skeleton.

  • restrict = "cig" with true_bn supplied, to restrict exactly to the conditional independence graph (CIG) of the true network.

  • restrict = "" for no restriction method (i.e., for score-based methods).
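
The "true" and "cig" options differ only by the edges that connect parents sharing a child. The following base-R sketch (not part of phsl; skeleton_and_cig is a hypothetical name) derives both graphs from a DAG adjacency matrix. It assumes amat[i, j] == 1 encodes an edge i -> j and that the distribution is faithful to the DAG, in which case the CIG coincides with the moral graph: the skeleton plus an edge between every pair of co-parents.

```r
# Hypothetical helper (not part of phsl). amat[i, j] == 1 encodes i -> j.
# Assumes faithfulness, so the CIG equals the moral graph of the DAG.
skeleton_and_cig <- function(amat) {
  skel <- (amat + t(amat)) > 0               # drop edge directions
  cig <- skel
  for (j in seq_len(ncol(amat))) {
    pa <- which(amat[, j] > 0)               # parents of node j
    if (length(pa) > 1) cig[pa, pa] <- TRUE  # "marry" co-parents
  }
  diag(cig) <- FALSE
  list(skeleton = skel, cig = cig)
}

# v-structure A -> C <- B: the CIG gains the edge A - B
nm <- c("A", "B", "C")
amat <- matrix(0, 3, 3, dimnames = list(nm, nm))
amat["A", "C"] <- 1
amat["B", "C"] <- 1
g <- skeleton_and_cig(amat)
sum(g$skeleton[upper.tri(g$skeleton)])  # 2 edges
sum(g$cig[upper.tri(g$cig)])            # 3 edges
```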

maximize

an argument as in rsmax2, with the additional option of maximize = "" for no maximization method (i.e., for constraint-based methods).
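
Taken together, the "" options for restrict and maximize determine which broad family of method runs. The mapping can be sketched with the hypothetical helper below (method_class is not a phsl function); treating the case where both arguments are "" as an error is an assumption, since this page does not describe it.

```r
# Hypothetical helper (not part of phsl): classify a bnsl() call by which
# phases are active, following the documented meaning of "".
method_class <- function(restrict, maximize) {
  # Assumption: at least one phase must be active.
  if (identical(restrict, "") && identical(maximize, "")) {
    stop("at least one of 'restrict' or 'maximize' must be non-empty")
  }
  if (identical(restrict, "")) return("score-based")
  if (identical(maximize, "")) return("constraint-based")
  "hybrid"
}

method_class("ppc", "")     # "constraint-based"
method_class("", "tabu")    # "score-based"
method_class("mmpc", "hc")  # "hybrid"
```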

restrict.args

an argument as in rsmax2, with the following additional arguments, applicable only to restrict = "ppc" (ppc).

  • max_groups = 20: a numeric value indicating the maximum number of groups into which the variables are partitioned. Setting max_groups = 1 recovers the PC(-stable) algorithm, which can still be combined with the sort_pval argument.

  • sort_pval = TRUE: a logical value indicating whether to order the conditioning sets under consideration by their current p-values.

  • max_wthn_sx: a numeric value indicating the maximum size of considered conditioning sets when estimating edges within clusters.

  • max_btwn_sx: a numeric value indicating the maximum size of considered conditioning sets when estimating edges between clusters.

  • max_btwn_nbr: a numeric value indicating the maximum neighborhood size when estimating edges between clusters.

  • maxp: a numeric value indicating the maximum number of parents for a node, as in hc and tabu. Useful when hgi = TRUE. Should be less than or equal to maximize.args$maxp.
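
The max_wthn_sx and max_btwn_sx arguments cap the size of the conditioning sets examined for an edge. As a rough illustration only (cond_sets is hypothetical, and pPC's actual search order, including the effect of sort_pval, is not reproduced), the candidate sets drawn from a node's other neighbors up to a given size can be enumerated with combn():

```r
# Hypothetical sketch (not phsl code): all conditioning sets drawn from the
# character vector `nbrs` with size at most `max_sx` (cf. max_wthn_sx).
# `nbrs` should hold node names; combn() treats a single number as 1:n.
cond_sets <- function(nbrs, max_sx) {
  out <- list(character(0))  # size-0 set: the marginal test
  for (k in seq_len(min(max_sx, length(nbrs)))) {
    out <- c(out, combn(nbrs, k, simplify = FALSE))
  }
  out
}

sets <- cond_sets(c("A", "B", "C"), max_sx = 2)
length(sets)  # 1 + 3 + 3 = 7
```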

Additionally, ppc only uses test = "mi" for the clustering step, though subsequent stages of the algorithm can employ different tests.
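
For discrete data, the "mi" test compares the empirical mutual information of two variables against its null distribution. A minimal base-R sketch of the unconditional case follows. It assumes the usual asymptotic G-test form, in which 2 n MI is compared with a chi-squared distribution on (r - 1)(c - 1) degrees of freedom; it shows the statistic only, not bnlearn's implementation or pPC's clustering step, and mi_test is a hypothetical name.

```r
# Hypothetical sketch: empirical mutual information (in nats) between two
# discrete variables and its asymptotic G-test p-value (2 * n * MI).
mi_test <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  pxy <- tab / n
  px <- rowSums(pxy)
  py <- colSums(pxy)
  nz <- pxy > 0  # skip empty cells, since 0 * log(0) is taken as 0
  mi <- sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  stat <- 2 * n * mi
  list(mi = mi, statistic = stat,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Exactly independent pattern: every (a, b) pair occurs 25 times
a <- rep(c(0, 1), times = 50)
b <- rep(c(0, 0, 1, 1), times = 25)
mi_test(a, b)$mi  # 0
```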

maximize.args

an argument as in rsmax2, with the maxp argument likewise used for constraint-based edge orientation, if applicable. Additionally, the score argument determines the score used in PATH (score = "pred-loglik" is not supported), and HGI always uses score = "bic".
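
Because HGI relies on BIC, it may help to see how one node's contribution is computed for discrete data. The sketch below is an illustration under stated assumptions (local_bic is hypothetical): it uses the convention, also followed by bnlearn, of the log-likelihood minus (log n / 2) times the number of free parameters, where a node with r states and q parent configurations has q (r - 1) free parameters. Since BIC is decomposable, a network's score is the sum of these local terms.

```r
# Hypothetical sketch: local BIC of a discrete node given parent columns.
#   sum_jk N_jk * log(N_jk / N_j)  -  (log(n) / 2) * q * (r - 1)
local_bic <- function(data, node, parents = character(0)) {
  n <- nrow(data)
  x <- factor(data[[node]])
  r <- nlevels(x)
  if (length(parents) == 0) {
    pa <- factor(rep("all", n))  # one (empty) parent configuration
    q <- 1
  } else {
    pa <- interaction(data[parents], drop = TRUE)  # observed configurations
    q <- prod(vapply(data[parents], function(p) nlevels(factor(p)), integer(1)))
  }
  counts <- table(pa, x)               # N_jk
  nj <- rowSums(counts)                # N_j
  cond <- sweep(counts, 1, nj, "/")    # N_jk / N_j
  nz <- counts > 0
  loglik <- sum(counts[nz] * log(cond[nz]))
  loglik - (log(n) / 2) * q * (r - 1)
}

d <- data.frame(x = c(0, 0, 1, 1), y = c(0, 0, 1, 1))
local_bic(d, "x")       # -5 * log(2)
local_bic(d, "x", "y")  # -2 * log(2)
```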

undirected

a logical value indicating, for constraint-based algorithms, whether the skeleton should be output without learning edge orientations. Not applicable for hybrid methods, score-based methods, or when path or hgi are activated.

path

a numeric value indicating the number of solutions to be generated by the PATH algorithm. The default, path = 1, leaves PATH deactivated. Only applicable to restrict = "ppc".

min_alpha

a numeric value between 0 and restrict.args$alpha that indicates the minimum threshold value for the PATH algorithm.
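
Conceptually, PATH thresholds p-values at a sequence of levels between restrict.args$alpha and min_alpha, yielding up to path nested skeletons. The sketch below illustrates only that thresholding step, under two assumptions not stated on this page: the levels are log-evenly spaced, and each candidate edge carries the largest p-value among the tests performed for it. Both function names are hypothetical.

```r
# Hypothetical sketch of p-value adjacency thresholding (not phsl code).
# Assumption: thresholds are log-evenly spaced from alpha down to min_alpha.
path_thresholds <- function(alpha, min_alpha, path) {
  stopifnot(path >= 1, min_alpha > 0, min_alpha <= alpha)
  if (path == 1) return(alpha)  # PATH deactivated: a single level
  exp(seq(log(alpha), log(min_alpha), length.out = path))
}

# pvals: symmetric matrix holding, for each pair, the largest p-value among
# the tests performed (assumption). An edge survives at level a when
# independence is rejected, i.e. when its p-value is <= a.
threshold_skeletons <- function(pvals, alphas) {
  lapply(alphas, function(a) {
    adj <- pvals <= a
    diag(adj) <- FALSE
    adj
  })
}

alphas <- path_thresholds(alpha = 1e-3, min_alpha = 1e-5, path = 10)
pvals <- matrix(c(NA,   1e-6, 5e-4,
                  1e-6, NA,   0.2,
                  5e-4, 0.2,  NA), 3, 3, byrow = TRUE)
skels <- threshold_skeletons(pvals, alphas)
vapply(skels, function(s) sum(s[upper.tri(s)]), integer(1))
# 2 2 1 1 1 1 1 1 1 1
```

Lower thresholds can only remove edges, so the skeletons are nested by construction.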

hgi

a logical value activating the HGI algorithm for greedy edge orientation. Not applicable if restrict = "".

true_bn

a bn object with the true underlying structure of x, used to evaluate d-separation instead of conditional independence tests. Only applicable to restrict = "ppc", restrict = "true", or restrict = "cig".
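
When true_bn is supplied, a conditional independence test is answered by a d-separation query on the true DAG. bnlearn provides its own dsep() function for this; the self-contained sketch below instead implements the classic moralized ancestral graph criterion in base R, to show what such an oracle computes. It assumes amat[i, j] == 1 encodes i -> j and that x and y are single nodes not in z; dsep_oracle is a hypothetical name.

```r
# Hypothetical sketch: x and y are d-separated by z in the DAG `amat` iff
# they are disconnected in the moral graph of the ancestral set of
# {x, y} and z, after deleting z.
dsep_oracle <- function(amat, x, y, z = character(0)) {
  stopifnot(!(x %in% z), !(y %in% z))
  nodes <- rownames(amat)
  # 1. ancestral set of x, y and z (including themselves)
  anc <- unique(c(x, y, z))
  repeat {
    pa <- nodes[rowSums(amat[, anc, drop = FALSE]) > 0]
    new <- setdiff(pa, anc)
    if (length(new) == 0) break
    anc <- c(anc, new)
  }
  a <- amat[anc, anc, drop = FALSE]
  # 2. moralize: drop directions and connect co-parents
  und <- (a + t(a)) > 0
  for (child in anc) {
    pars <- anc[a[, child] > 0]
    if (length(pars) > 1) und[pars, pars] <- TRUE
  }
  diag(und) <- FALSE
  # 3. delete z, then search for a path from x to y
  keep <- setdiff(anc, z)
  und <- und[keep, keep, drop = FALSE]
  visited <- x
  frontier <- x
  while (length(frontier) > 0) {
    nbrs <- keep[colSums(und[frontier, , drop = FALSE]) > 0]
    frontier <- setdiff(nbrs, visited)
    visited <- c(visited, frontier)
  }
  !(y %in% visited)
}

nm <- c("A", "B", "C")
chain <- matrix(0, 3, 3, dimnames = list(nm, nm))     # A -> B -> C
chain["A", "B"] <- 1; chain["B", "C"] <- 1
collider <- matrix(0, 3, 3, dimnames = list(nm, nm))  # A -> C <- B
collider["A", "C"] <- 1; collider["B", "C"] <- 1
dsep_oracle(chain, "A", "C", "B")     # TRUE
dsep_oracle(collider, "A", "B", "C")  # FALSE
```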

whitelist, blacklist, debug

arguments as in rsmax2.

Details

None.

Value

A Bayesian network as an object of class bn.

Author(s)

Jireh Huang (jirehhuang@ucla.edu)

References

Huang, J., & Zhou, Q. (2022). Partitioned hybrid learning of Bayesian network structures. Machine Learning. https://doi.org/10.1007/s10994-022-06145-4

See Also

ppc, phgs

Examples

## Read Bayesian network object 
true_bn <- bnrepository("child")

## Generate data and relevel for simplicity
set.seed(1)
x <- bnlearn::rbn(true_bn, n = 1e4)
x <- as.data.frame(sapply(x, function(x) as.factor(as.integer(x) - 1L)),
                   stringsAsFactors = TRUE)

## pPC with PATH
bn1 <- bnsl(x = x, restrict = "ppc", maximize = "",
            restrict.args = list(alpha = 1e-3, max.sx = 3, sort_pval = TRUE),
            maximize.args = list(maxp = 8), path = 10, min_alpha = 1e-5,
            hgi = FALSE, debug = TRUE)

## pHGS (pPC with PATH, HGI, and greedy search)
bn2 <- bnsl(x = x, restrict = "ppc", maximize = "tabu",
            restrict.args = list(alpha = 1e-3, max.sx = 3, sort_pval = TRUE),
            maximize.args = list(maxp = 8, tabu = 10, max.tabu = 10),
            path = 10, min_alpha = 1e-5, hgi = TRUE, debug = TRUE)

## MMHC with HGI
bn3 <- bnsl(x = x, restrict = "mmpc", maximize = "hc",
            restrict.args = list(alpha = 1e-3, max.sx = 3, sort_pval = TRUE),
            maximize.args = list(maxp = 8), hgi = TRUE, debug = TRUE)

jirehhuang/phsl documentation built on May 23, 2022, 4:19 a.m.