preprocessing: Pre-process data to better learn Bayesian networks
In bnlearn: Bayesian Network Structure Learning, Parameter Learning and Inference

Description

Screen and transform the data to make them more suitable for structure and parameter learning.

Usage

  # discretise continuous data into factors.
  discretize(data, method, breaks = 3, ordered = FALSE, ..., debug = FALSE)
  # screen continuous data for highly correlated pairs of variables.
  dedup(data, method, threshold, debug = FALSE)

Arguments

`data`	a data frame containing numeric columns (for `dedup()`) or a combination of numeric and factor columns (for `discretize()`).
`threshold`	a numeric value between zero and one, the absolute correlation used as a threshold in screening highly correlated pairs.
`method`	a character string, the label of the method used to preprocess the data. For `discretize()`, the possible values are `"interval"` for interval discretisation, `"quantile"` for quantile discretisation (the default) or `"hartemink"` for Hartemink's pairwise mutual information method. For `dedup()`, the only possible value is `"cor"` for screening based on linear correlation.
`breaks`	an integer number, the number of levels the variables will be discretised into; or a vector of integer numbers, one for each column of the data set, specifying the number of levels for each variable.
`ordered`	a boolean value. If `TRUE`, the discretised variables are returned as ordered factors instead of unordered ones.
`...`	additional tuning parameters, see below.
`debug`	a boolean value. If `TRUE`, a lot of debugging output is printed. Otherwise, the function is completely silent.

Details

discretize() takes a data frame as its first argument and returns a second data frame of discrete variables, transformed using one of three methods: "interval", "quantile" or "hartemink". Discrete variables are left unchanged.
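
As a rough illustration of how the two marginal methods differ, the following base R sketch bins a single skewed variable both ways. It does not use bnlearn's own code: the variable x, the level count k and the use of cut() and quantile() are assumptions made for the example only.

```r
# Base R sketch of the two marginal discretisation schemes (not bnlearn's code).
set.seed(42)
x <- rexp(200)   # a skewed continuous variable
k <- 3           # number of levels, playing the role of `breaks`

# "interval": k bins of equal width spanning the observed range.
interval_breaks <- seq(min(x), max(x), length.out = k + 1)
x_interval <- cut(x, breaks = interval_breaks, include.lowest = TRUE)

# "quantile": k bins holding roughly the same number of observations.
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
x_quantile <- cut(x, breaks = quantile_breaks, include.lowest = TRUE)

table(x_interval)   # counts per equal-width bin
table(x_quantile)   # counts per quantile bin
```

With skewed data the equal-width bins tend to be very unevenly populated, while the quantile bins hold nearly equal counts.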

The "hartemink" method has two additional tuning parameters:

idisc: the method used for the initial marginal discretisation of the variables, either "interval" or "quantile".
ibreaks: the number of levels the variables are initially discretised into, in the same format as in the breaks argument.
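
A minimal sketch of passing both tuning parameters through the ... argument might look as follows. The values 20 and 3 are arbitrary choices for illustration, and the requireNamespace() guard is only there so the snippet does nothing when bnlearn is not installed.

```r
# Guarded so the snippet is a no-op when bnlearn is not installed.
if (requireNamespace("bnlearn", quietly = TRUE)) {
  data(gaussian.test, package = "bnlearn")
  # Start from 20 quantile-based levels per variable (ibreaks, idisc), then
  # let the "hartemink" method reduce them to 3 levels (breaks).
  d <- bnlearn::discretize(gaussian.test, method = "hartemink",
                           breaks = 3, ibreaks = 20, idisc = "quantile")
  print(sapply(d, nlevels))
}
```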

The "quantile" method sometimes cannot discretise one or more variables without generating zero-length intervals, because the quantiles are not unique. If method = "quantile", discretize() will produce an error. If method = "hartemink" and idisc = "quantile", discretize() will try to lower the number of breaks set by the ibreaks argument until the quantiles are distinct. If this is not possible without making ibreaks smaller than breaks, discretize() will produce an error.
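
The problem with non-unique quantiles can be reproduced in base R. The sketch below is not bnlearn's code: the data vector x, the starting level count k and the loop are assumptions that only mimic the idea of lowering the number of levels until the cutpoints are distinct.

```r
# Base R illustration of non-unique quantiles (not bnlearn's code).
x <- c(rep(0, 40), 1:60)   # 40% of the values are tied at zero
k <- 5                     # requested number of levels
q <- quantile(x, probs = seq(0, 1, length.out = k + 1))
anyDuplicated(q) > 0       # TRUE: at least two cutpoints coincide

# cut() refuses duplicated breaks, which would define zero-length intervals.
res <- tryCatch(cut(x, breaks = q, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
res                        # the error message, as a character string

# Lower the number of levels until the cutpoints are distinct, in the
# spirit of the ibreaks fallback described above.
while (k > 1 &&
       anyDuplicated(quantile(x, probs = seq(0, 1, length.out = k + 1))) > 0) {
  k <- k - 1
}
k                          # 2 for this data
```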

dedup() screens the data for pairs of highly correlated variables, and discards one in each pair.
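
The screening idea can be sketched in base R as below. This is not bnlearn's implementation: screen_cor() is a hypothetical helper, and the rule of always dropping the later column of a correlated pair is an assumption, since the text above does not say which variable of a pair dedup() discards.

```r
# Base R sketch of correlation-based screening (not bnlearn's implementation).
set.seed(1)
n <- 500
a <- rnorm(n)
dat <- data.frame(a = a,
                  b = a + rnorm(n, sd = 0.05),  # nearly a copy of `a`
                  c = rnorm(n))                 # unrelated to `a` and `b`
threshold <- 0.95

# Hypothetical helper: drop the later column of each pair whose absolute
# correlation exceeds the threshold.
screen_cor <- function(data, threshold) {
  r <- abs(cor(data, use = "pairwise.complete.obs"))
  r[lower.tri(r, diag = TRUE)] <- 0      # look at each pair only once
  drop <- apply(r > threshold, 2, any)   # correlated with an earlier column
  data[, !drop, drop = FALSE]
}

names(screen_cor(dat, threshold))   # "a" "c"
```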

Both discretize() and dedup() accept data with missing values.

Value

discretize() returns a data frame with the same structure (number of columns, column names, etc.) as data, containing the discretised variables. The data frame has an attribute "cutpoints" containing the cutpoints used internally by discretize(), which are made available so that discretising another data set yields the same set of factors.
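
Why reusing cutpoints matters can be shown with a small base R sketch. It does not read the "cutpoints" attribute, whose exact layout is not described here. Instead it computes cutpoints by hand, and the choice to open the outer intervals to -Inf and Inf is an assumption made so that new values outside the original range are still assigned a level.

```r
# Base R sketch of reusing cutpoints on new data (not bnlearn's code).
set.seed(7)
train <- rnorm(300)
test  <- rnorm(100)

# Learn cutpoints on the first data set (quantile discretisation, 3 levels).
cutpoints <- quantile(train, probs = seq(0, 1, length.out = 4))

# Assumption: open the outer intervals so that test values outside the
# training range still fall into the first or last level.
ext <- cutpoints
ext[1] <- -Inf
ext[length(ext)] <- Inf

labels <- c("low", "mid", "high")
train_f <- cut(train, breaks = ext, labels = labels)
test_f  <- cut(test,  breaks = ext, labels = labels)

identical(levels(train_f), levels(test_f))   # TRUE: same set of factor levels
```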

dedup() returns a data frame with a subset of the columns of data.

Author(s)

Marco Scutari

References

Hartemink A (2001). Principled Computational Methods for the Validation and Discovery of Genetic Regulatory Networks. Ph.D. thesis, School of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.

Examples

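library(bnlearn)
data(gaussian.test)
# discretise into 4 levels, starting from 10 initial levels per variable.
d = discretize(gaussian.test, method = 'hartemink', breaks = 4, ibreaks = 10)
# learn a network structure from the discretised data and plot it.
plot(hc(d))
# drop one variable from each pair of highly correlated variables.
d2 = dedup(gaussian.test)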

bnlearn documentation built on July 17, 2026, 5:08 p.m.
