doBuild: Build or rebuild the Bayes nets for a scoring engine.

doBuild R Documentation

Build or rebuild the Bayes nets for a scoring engine.

Description

This function downloads the table specifications from the internet and rebuilds the Bayesian networks for a particular scoring application. It takes the information from the “tables” subdirectory (under config.dir) and builds the nets in the “nets” subdirectory.

Usage

doBuild(sess, EA.tables, config.dir, override = FALSE)

Arguments

sess

A NeticaSession object used to build the Bayes nets.

EA.tables

A list containing configuration details. See the ‘Configuration’ section below.

config.dir

The pathname of the directory that contains the tables and the nets subdirectories.

override

A logical flag. If true, the code will ignore locks and rebuild the nets anyway.

Details

This program applies the scripts from the Peanut-package to rebuild the nets. It assumes the existence of five tables which describe the scoring model:

Nets.csv

Manifest of all networks. See Warehouse and BNWarehouse.

Nodes.csv

Manifest of all nodes in all networks. See Warehouse and NNWarehouse.

Omega.csv

Description of the competency model. See Omega2Pnet.

Q.csv

Description of the evidence model. See Qmat2Pnet.

Statistics.csv

A description of the statistics being used. See configStats.

These are expected to reside in the “tables” subdirectory of config.dir and to have the names described above (although these details can be overridden by the configuration; see ‘Configuration’ below).

The following steps are followed in the rebuilding.

  1. The tables (CSV files) are downloaded from internet sources (see Downloading Tables below) into the “tables” directory.

  2. The tables are loaded into R and a PnetWarehouse and PnodeWarehouse are built for the models.

  3. The Omega2Pnet script is run to build the proficiency model.

  4. The Qmat2Pnet script is run to build the evidence models.

  5. The nets are written out to the “nets” subdirectory of config.dir. The net manifest is written to that subdirectory in the file “NetManifest.csv” and the statistic list is written in the file “StatisticList.csv”. These values can be overridden in the configuration.
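As an illustration of step 2 only, the following is a minimal sketch of reading the five tables from the “tables” subdirectory using base R. The helper loadSpecTables and the placeholder CSV contents are hypothetical and are not part of EABN; the real function goes on to build the warehouses and run the Peanut scripts, which is not shown here.

```r
## Hypothetical helper (not part of EABN): read the five specification
## tables from <config.dir>/<tabdir> into a named list of data frames.
loadSpecTables <- function(config.dir, tabdir = "tables",
                           stems = c(Nets = "Nets", Nodes = "Nodes",
                                     Omega = "Omega", Q = "Q",
                                     Statistics = "Statistics")) {
  files <- file.path(config.dir, tabdir, paste0(stems, ".csv"))
  absent <- files[!file.exists(files)]
  if (length(absent) > 0L) {
    stop("Missing table file(s): ",
         paste(basename(absent), collapse = ", "))
  }
  tabs <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(tabs) <- names(stems)
  tabs
}

## Demonstration with placeholder one-row tables in a temporary directory.
demo.dir <- tempfile("demoConfig")
dir.create(file.path(demo.dir, "tables"), recursive = TRUE)
for (stem in c("Nets", "Nodes", "Omega", "Q", "Statistics")) {
  write.csv(data.frame(Name = paste0(stem, "Row")),
            file.path(demo.dir, "tables", paste0(stem, ".csv")),
            row.names = FALSE)
}
tabs <- loadSpecTables(demo.dir)
names(tabs)
```

Failing early when a table is absent keeps a partial download from producing a half-built set of nets.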

Value

This function is invoked for its side effects, which are stored in the “nets” subdirectory of the config.dir directory.

Configuration

There are a large number of parameters which can be configured. These are passed in through the EA.tables argument, which is a list of parameters. The intention is that this can be read in from a JSON file (using fromJSON). In the current implementation, the EA.tables parameter set is a sub-object of the larger EA.config parameter set.

The following fields are available:

netdir

This is the name of the subdirectory of config.dir in which the constructed nets will be saved. Default value is “nets”.

tabdir

This is the name of the subdirectory of config.dir in which the network specification tables are found. The default value is “tables”.

TableID

This is a parameter passed to the download script to identify the place from which the tables should be downloaded. The intent is for this to be a Google Sheets ID such as, “16LcEuCspZjiBoZ3-Y1R3jxi1COXmh9vuTa9GwH1A_7Q”.

downloadScript

This is the name of the script which is run to download the tables. The default value is “download.sh”. See the Downloading Tables section below.

NetsName

This is the name (less the .csv extension) of the file containing the network manifest. The default value is “Nets”.

NodesName

This is the name (less the .csv extension) of the file containing the node manifest. The default value is “Nodes”.

OmegaName

This is the name (less the .csv extension) of the file containing the Omega matrix (Proficiency model specification). The default value is “Omega”.

QName

This is the name (less the .csv extension) of the file containing the Q matrix (Evidence model specification). The default value is “Q”.

StatName

This is the name (less the .csv extension) of the file containing the statistic list. The default value is “Statistics”.

profModel

This is the name of the proficiency model. If no value is supplied, the value is inferred from the first non-missing value of the “Hub” column in the network manifest.

manifestFile

The name of the file in which the list of available networks is output. The default value is “PPManifest.csv”.

statFile

The name of the file (in the “nets” directory) in which the statistic list is output. The default value is “StatisticList.csv”.
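The way the defaults combine with a supplied EA.tables list can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: the helpers resolveTables and inferProfModel, the placeholder “<sheet-id>”, the file name “Qmatrix” and the two-row manifest are all hypothetical. The default values are the ones given in the field descriptions above.

```r
## Defaults as documented in the field list (TableID and profModel
## have no fixed default).
EA.defaults <- list(netdir = "nets", tabdir = "tables",
                    downloadScript = "download.sh",
                    NetsName = "Nets", NodesName = "Nodes",
                    OmegaName = "Omega", QName = "Q",
                    StatName = "Statistics",
                    manifestFile = "PPManifest.csv",
                    statFile = "StatisticList.csv")

## Hypothetical helper: supplied fields win, defaults fill the gaps.
resolveTables <- function(EA.tables, defaults = EA.defaults) {
  utils::modifyList(defaults, EA.tables)
}

## Hypothetical helper: first non-missing, non-empty "Hub" entry.
inferProfModel <- function(netman) {
  hubs <- netman$Hub
  hubs <- hubs[!is.na(hubs) & nzchar(hubs)]
  if (length(hubs) == 0L) {
    stop("No Hub value found in the network manifest.")
  }
  hubs[1]
}

## Only two fields are supplied; the rest fall back to the defaults.
cfg <- resolveTables(list(TableID = "<sheet-id>", QName = "Qmatrix"))

## Placeholder manifest: the first row has no hub, the second names "PM".
netman <- data.frame(Name = c("PM", "EM1"), Hub = c(NA, "PM"),
                     stringsAsFactors = FALSE)
if (is.null(cfg$profModel)) cfg$profModel <- inferProfModel(netman)
```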

Downloading Tables

The complete specification is given in five different tables. These can be represented as five different sheets (pages) in a typical spreadsheet program. In various projects it has been useful to create a Google Sheets document with these five pages which can be accessed by the project team. Thus, one team member can make changes and the others can download them. (This would probably work with a different document collaboration system, but this has not been tested.)

Google Sheets are identified by a long string in the URL. This is the “TableID” field in the EA.tables configuration list. (In theory, this could be replaced by an appropriate identifier if something other than Google Sheets were used.) The script “download.sh” (the name can be overridden in the configuration) is called using system2 with the “tables” directory path and the “TableID” as arguments. It then downloads the tables.

The bash implementation for use with Google sheets is to first define a BASEURL variable: BASEURL="https://docs.google.com/spreadsheets/d/$2", and then to call curl to download the sheets, e.g., curl "${BASEURL}/gviz/tq?tqx=out:csv&sheet={Nets}" >Nets.csv.

In theory, the sheets could be downloaded directly from the URLs using read.csv; however, there were issues with that solution. Using a separate script also allows download.sh to take care of any authentication which needs to be done (as the Google APIs here are a moving target).
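To make the calling convention concrete, here is a hedged sketch in R of how the arguments for the download script and the per-sheet URLs could be assembled. The helpers sheetURL and downloadCall are hypothetical, the sheet name is assumed to be passed plainly in the query string, and how the script's location is resolved is left open because it is not specified above. Nothing is downloaded when this code runs.

```r
## Hypothetical helper mirroring the BASEURL pattern quoted above.
sheetURL <- function(tableID, sheet) {
  base <- paste0("https://docs.google.com/spreadsheets/d/", tableID)
  paste0(base, "/gviz/tq?tqx=out:csv&sheet=",
         utils::URLencode(sheet, reserved = TRUE))
}

## Hypothetical helper: assemble (but do not run) the system2 call.
## The two arguments follow the description above: the tables
## directory path, then the TableID.
downloadCall <- function(config.dir, EA.tables) {
  script <- EA.tables$downloadScript
  if (is.null(script)) script <- "download.sh"
  tabdir <- EA.tables$tabdir
  if (is.null(tabdir)) tabdir <- "tables"
  list(command = script,
       args = c(file.path(config.dir, tabdir), EA.tables$TableID))
}

dl <- downloadCall("/path/to/config", list(TableID = "<sheet-id>"))
urls <- vapply(c("Nets", "Nodes", "Omega", "Q", "Statistics"),
               function(s) sheetURL("<sheet-id>", s), character(1))

## Not run here: requires the script to be installed and network access.
## do.call(system2, dl)
```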

Locking

It is probably a bad idea to rebuild the nets while a different incarnation is using the net directory to score. It is almost certainly a bad idea for two different programs to rebuild the nets in the same directory at the same time.

To prevent such clashes, the doRunrun function adds a file with the extension .lock to the directory when it is scoring. The doBuild function adds the file netbuilder.lock while it is rebuilding the nets.

If, when doBuild starts, a .lock file is found in the “nets” directory, it issues a warning and, unless the override parameter is set to TRUE, stops. Use the override only with extreme caution.
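The locking behaviour described above can be sketched in base R as follows. This is an illustration only, not the code used by doBuild: the helper withBuildLock is hypothetical, and removing the lock file when the build finishes is an assumption, since the text above only says that the file is present while rebuilding.

```r
## Hypothetical helper: run `build` while holding netbuilder.lock.
withBuildLock <- function(netpath, build, override = FALSE) {
  ## Any file ending in .lock signals that the directory is in use.
  locks <- list.files(netpath, pattern = "\\.lock$")
  if (length(locks) > 0L) {
    warning("Lock file(s) found in ", netpath, ": ",
            paste(locks, collapse = ", "))
    if (!override) stop("Nets directory is locked; not rebuilding.")
  }
  lockfile <- file.path(netpath, "netbuilder.lock")
  file.create(lockfile)
  ## Assumption: the lock is released even if the build fails.
  on.exit(unlink(lockfile), add = TRUE)
  build()
}

## Demonstration in a fresh temporary directory.
netpath <- tempfile("nets")
dir.create(netpath)
result <- withBuildLock(netpath, function() "built")
```

With a stray lock file present, the same call warns and stops unless override = TRUE is passed.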

Logging

Logging is done through the flog.logger mechanism of the futile.logger package. This allows logs to be saved to a file.

Author(s)

Russell Almond

References

Almond, R. G. (2010). ‘I can name that Bayesian network in two matrixes.’ International Journal of Approximate Reasoning. 51, 167-178.

Almond, R. G. (presented 2017, August). Tabular views of Bayesian networks. In John-Mark Agosta and Tomas Singliar (Chair), Bayesian Modeling Application Workshop 2017. Symposium conducted at the meeting of Association for Uncertainty in Artificial Intelligence, Sydney, Australia. (International) Retrieved from http://bmaw2017.azurewebsites.net/

See Also

doRunrun, configStats

Warehouse, BNWarehouse, NNWarehouse, Omega2Pnet, Qmat2Pnet

Examples

## This example is in:
file.path(help(package="EABN")$path,"conf","EABuild.R")
## Not run: 
## Set up config.dir, logpath and  NeticaLicenseKey
source("/usr/local/share/Proc4/EAini.R")

EA.config <- jsonlite::fromJSON(file.path(config.dir,"config.json"),FALSE)

EA.tables <- EA.config$Tables
EA.tables$netdir <- EA.config$netdir

sess <- RNetica::NeticaSession(LicenseKey=NeticaLicenseKey)
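RNetica::startSession(sess)

## appender.file is namespaced so the example works without attaching futile.logger
futile.logger::flog.appender(futile.logger::appender.file(file.path(logpath,
                            sub("<app>","builder",EA.config$logname))))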
futile.logger::flog.threshold(EA.config$loglevel)

doBuild(sess,EA.tables,config.dir)

## End(Not run)

ralmond/EABN documentation built on Aug. 30, 2023, 12:52 p.m.