Contains all of the data that can be extracted from a given dataset: raw data, imputed data, and raw and imputed data with bootstrap.
an empty BNDataset.
raw data.frame or path/name of the file containing the raw dataset (see 'Details').
a vector of booleans indicating if the variables are discrete or continuous
vector of variable names.
vector of variable cardinalities (for discrete variables) or quantization ranges (for continuous variables).
further arguments for reading a dataset from files (see documentation for
There are two ways to build a BNDataset: using two files containing the header information and the data, respectively, or manually providing the data table and the related header information (variable names, cardinality and discreteness).
The key pieces of information needed are: 1. the data; 2. the state of the variables (discrete or continuous); 3. the names of the variables; 4. the cardinalities of the variables (if discrete), or the number of levels they have to be quantized into (if continuous). Names and cardinalities/levels can be guessed by looking at the data, but it is strongly advised to provide _all_ of this information, in order to avoid problems later on during the execution.
Data can be provided in form of data.frame or matrix. It can contain NAs. By default, NAs are indicated with '?';
to specify a different character for NAs, it is possible to provide also the
The values contained in the data have to be numeric (real for continuous variables, integer for discrete ones).
The default range of values for a discrete variable X is [1, |X|], with |X| being the cardinality of X. The same applies for the levels of quantization for continuous variables.
If the value ranges for the data are different from the expected ones, it is possible to specify a different
starting value (for the whole dataset) with the
starts.from parameter. E.g. by
we assume that the values of the variables in the dataset have range
Please keep in mind that the internal representation of bnstruct starts from 1,
and the original starting values are then lost.
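The re-indexing described above can be pictured with a small base R sketch. The helper `shift_to_one_based` is hypothetical and is not part of bnstruct; it only illustrates, under the assumption of a dataset whose values start from 0, how values end up starting from 1 and why the original starting value cannot be recovered afterwards.

```r
# Hypothetical helper (not a bnstruct function): shift every value so that
# the declared starting value maps to 1. NAs are left untouched.
shift_to_one_based <- function(data, starts.from = 1) {
  data - starts.from + 1
}

# Illustrative raw data whose values start from 0, with one missing value.
raw <- matrix(c(0, 1, 2, 0,
                1, 0, NA, 2), nrow = 4, ncol = 2)

shifted <- shift_to_one_based(raw, starts.from = 0)
print(shifted)
```

After the shift, nothing in `shifted` records that the original values started from 0, which matches the warning that the original starting values are lost.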
It is possible to use two files, one for the data and one for the metadata,
instead of providing all of the information manually.
bnstruct requires the data files to be in the format described below.
The actual data has to be in a text file in tabular format, one tuple per row,
with the values for each variable separated by a space or a tab. Values for each variable have to be
numbers, starting from
1 in case of discrete variables.
Data files can have a first row containing the names of the corresponding variables.
In addition to the data file, a header file containing additional information can also be provided.
A header file has to be composed of three rows of tab-delimited values:
1. a list of names of the variables, in the same order as in the data file;
2. a list of integers representing the cardinality of the variables, in case of discrete variables,
or the number of levels each variable has to be quantized into, in case of continuous variables;
3. a list that indicates, for each variable, if the variable is continuous
C), and thus has to be quantized before learning,
or discrete (
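As a rough illustration of the file layout described above, the following base R sketch writes a small data file and a three-row, tab-delimited header file to temporary locations and reads them back. The variable names, values and temporary paths are invented for the example, and the `"d"` flag for discrete variables is an assumption taken from the usage example in this page.

```r
# Temporary file locations (illustrative only).
data_file   <- tempfile(fileext = ".data")
header_file <- tempfile(fileext = ".header")

# Header: row 1 = variable names, row 2 = cardinalities,
# row 3 = discreteness flags ("d" assumed to mean discrete).
header_rows <- list(
  c("a", "b", "c"),
  c(2, 3, 2),
  c("d", "d", "d")
)
writeLines(vapply(header_rows, paste, character(1), collapse = "\t"),
           header_file)

# Data: one tuple per row, space-separated, discrete values starting from 1.
values <- matrix(c(1, 2, 1,
                   2, 3, 2,
                   1, 1, 2), nrow = 3, byrow = TRUE)
write.table(values, data_file, sep = " ",
            row.names = FALSE, col.names = FALSE)

# Read both files back to inspect the layout.
header_back <- strsplit(readLines(header_file), "\t")
data_back   <- as.matrix(read.table(data_file))
print(header_back)
print(data_back)
```

Files laid out this way would then be passed to the constructor in the same order as in the usage example, data file first and header file second.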
In case of need of more advanced options when reading a dataset from files, please refer to the
documentation of the
read.dataset method. Imputation and bootstrap are also available
as separate routines (
In case of an evolving system to be modeled as a Dynamic Bayesian Network, it is possible to specify
only the description of the variables of a single instant; the information will be replicated for all
num.time.steps instants that compose the dataset, where
num.time.steps needs to be
set as parameter. In this case, it is assumed that the N variables v1, v2, ..., vN of a single instant
appear in the dataset as v1_t1, v2_t1, ..., vN_t1, v1_t2, v2_t2, ..., in this exact order.
The user can however provide information for all the variables in all the instants; if this is not the case,
the names of the variables will be edited to include the instant. In case of an evolving system, the
num.variables slot still refers to the total number of variables observed in all the instants
(the number of columns in the dataset), and not to a single instant.
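The column ordering assumed for dynamic systems can be sketched in base R. The helper `expand_names` is hypothetical and is not how bnstruct necessarily builds the names internally; it only reproduces the `v1_t1, v2_t1, ..., vN_t1, v1_t2, ...` pattern described above.

```r
# Hypothetical helper (not a bnstruct function): replicate the variable
# names of a single instant over all instants, instant by instant.
expand_names <- function(variables, num.time.steps) {
  paste0(rep(variables, times = num.time.steps),
         "_t",
         rep(seq_len(num.time.steps), each = length(variables)))
}

# Three variables observed over two instants.
all_names <- expand_names(c("v1", "v2", "v3"), num.time.steps = 2)
print(all_names)

# The total number of columns is the per-instant count times the instants.
total_variables <- length(all_names)
```

Here `total_variables` corresponds to what the num.variables slot reports: the number of columns in the whole dataset, not the number of variables in a single instant.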
a BNDataset object.
name of the dataset
name and location of the header file
name and location of the data file
names of the variables in the network
cardinality of each variable of the network
number of variables (columns) in the dataset
TRUE if variable is discrete,
FALSE if variable is continuous
list of vectors containing the quantiles, one vector per variable. Each vector is
NULL if the variable is discrete, and contains the quantiles if it is continuous
number of observations (rows) in the dataset
TRUE if the dataset contains data read from a file
TRUE if the dataset contains imputed data (computed from raw data)
matrix containing raw data
matrix containing imputed data
dataset has bootstrap samples
list of bootstrap samples
dataset has imputed bootstrap samples
list of imputed bootstrap samples
number of bootstrap samples
number of instants in which the network is observed (1, unless it is a dynamic system)
read.dataset, impute, bootstrap
## Not run:
# create from files
dataset <- BNDataset("file.data", "file.header")

# other way: create from raw dataset and metadata
data <- matrix(c(1:16), nrow = 4, ncol = 4)
dataset <- BNDataset(data = data,
                     discreteness = rep('d', 4),
                     variables = c("a", "b", "c", "d"),
                     node.sizes = c(4, 8, 12, 16))
## End(Not run)