doBuild | R Documentation
Description
This function downloads the table specifications from the internet and
rebuilds the Bayesian networks for a particular scoring application. It
takes the information from the “tables” subdirectory (under
config.dir) and builds the nets in the “nets” subdirectory.
Usage
doBuild(sess, EA.tables, config.dir, override = FALSE)
Arguments
sess | A NeticaSession object used to build the nets. |
EA.tables | A list containing configuration details. See the ‘Configuration’ section below. |
config.dir | The pathname of the directory that contains the “tables” and “nets” subdirectories. |
override | A logical flag. If TRUE, the code will ignore locks and rebuild the nets anyway. |
Details
This function applies the scripts from the Peanut-package
to rebuild the nets. It assumes the existence of five tables which
describe the scoring model:
Nets: Manifest of all networks. See Warehouse and BNWarehouse.
Nodes: Manifest of all nodes in all networks. See Warehouse and NNWarehouse.
Omega: Description of the competency model. See Omega2Pnet.
Q: Description of the evidence models. See Qmat2Pnet.
Statistics: A description of the statistics being used. See configStats.
These are expected to reside in the “tables” subdirectory of
config.dir as CSV files with the names given above (although
these details can be overridden by the configuration; see
‘Configuration’ below).
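A minimal sketch of locating and loading the five tables, assuming each one is stored as <name>.csv in the tables subdirectory of config.dir. readSpecTables is a hypothetical helper written for illustration; it is not part of EABN or Peanut, and the real doBuild goes on to build the warehouses from these tables.

```r
## Hypothetical helper (not part of EABN): locate and read the five
## specification tables from the tables subdirectory of config.dir.
readSpecTables <- function(config.dir, tabdir = "tables",
                           tables = c(Nets = "Nets", Nodes = "Nodes",
                                      Omega = "Omega", Q = "Q",
                                      Statistics = "Statistics")) {
  ## Each table is assumed to be stored as <name>.csv under config.dir/tabdir.
  paths <- file.path(config.dir, tabdir, paste0(tables, ".csv"))
  names(paths) <- names(tables)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0L)
    stop("Missing specification tables: ", paste(missing, collapse = ", "))
  ## Keep string columns as character (not factors) and keep the
  ## column names exactly as written in the spreadsheet.
  lapply(paths, utils::read.csv, stringsAsFactors = FALSE,
         check.names = FALSE)
}
```

The result is a named list (Nets, Nodes, Omega, Q, Statistics), so later steps can refer to, for example, tabs$Q.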
The rebuild proceeds in the following steps:
1. The tables (CSV files) are downloaded from internet sources (see ‘Downloading Tables’ below) into the “tables” directory.
2. The tables are loaded into R, and a PnetWarehouse and a PnodeWarehouse are built for the models.
3. The Omega2Pnet script is run to build the proficiency model.
4. The Qmat2Pnet script is run to build the evidence models.
5. The nets are written out to the “nets” subdirectory of config.dir. The net manifest is written to the same subdirectory in the file “NetManifest.csv”, and the statistic list is written to the file “StatisticList.csv”. These file names can be overridden with the configuration.
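Steps 3 and 4 rely on Peanut's Omega2Pnet and Qmat2Pnet and are not reproduced here. The file handling in step 5 can be sketched in base R as follows; writeBuildOutputs is a hypothetical helper, and its default file names simply follow the description in step 5 (the configuration may change them).

```r
## Hypothetical helper (not part of EABN): write the net manifest and the
## statistic list into the nets subdirectory of config.dir.
writeBuildOutputs <- function(config.dir, netManifest, statList,
                              netdir = "nets",
                              manifestFile = "NetManifest.csv",
                              statFile = "StatisticList.csv") {
  outdir <- file.path(config.dir, netdir)
  ## Create the output directory if it does not exist yet.
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  manifestPath <- file.path(outdir, manifestFile)
  statPath <- file.path(outdir, statFile)
  utils::write.csv(netManifest, manifestPath, row.names = FALSE)
  utils::write.csv(statList, statPath, row.names = FALSE)
  ## Return the paths of the written files.
  invisible(c(manifest = manifestPath, statistics = statPath))
}
```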
Value
This function is invoked for its side effects: the rebuilt nets and
their manifests are stored in the “nets” subdirectory of config.dir.
Configuration
Many parameters can be configured. These are passed in through the
EA.tables argument, which is a list of parameters. The intention is
that this list can be read in from a JSON file (using fromJSON). In
the current implementation, the EA.tables parameter set is a
sub-object of the larger EA.config parameter set.
The following fields are available:
netdir: This is the name of the subdirectory of config.dir in which
the constructed nets will be saved. The default value is “nets”.
This is the name of the subdirectory of config.dir in which the
network specification tables are found. The default value is “tables”.
TableID: This is a parameter passed to the download script to identify the place from which the tables should be downloaded. The intent is for this to be a Google Sheets ID such as “16LcEuCspZjiBoZ3-Y1R3jxi1COXmh9vuTa9GwH1A_7Q”.
This is the name of the script which is run to download the tables. The default value is “download.sh”. See the Downloading Tables section below.
This is the name (less the .csv extension) of the file containing the network manifest. The default value is “Nets”.
This is the name (less the .csv extension) of the file containing the node manifest. The default value is “Nodes”.
This is the name (less the .csv extension) of the file containing the Omega matrix (Proficiency model specification). The default value is “Omega”.
This is the name (less the .csv extension) of the file containing the Q matrix (Evidence model specification). The default value is “Q”.
This is the name (less the .csv extension) of the file containing the statistic list. The default value is “Statistics”.
This is the name of the proficiency model. If no value is supplied, the value is inferred from the first non-missing value of the “Hub” column in the network manifest.
The name of the file in which the list of available networks is output. The default value is “PPManifest.csv”.
The name of the file (in the “nets” directory) in which the statistic list is output. The default value is “StatisticList.csv”.
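The fallback to default values, and the rule for inferring the proficiency model from the “Hub” column, can be sketched as below. Apart from netdir and TableID, the field names used here (tabdir, DownloadScript, NetsFile, and so on) are hypothetical placeholders, since this page does not give the actual names; check the package source before relying on them. Treating blank CSV cells as missing is also an assumption.

```r
## Sketch only: apart from netdir and TableID, the field names below are
## hypothetical placeholders, not the documented EABN names.
completeTablesConfig <- function(EA.tables) {
  defaults <- list(
    netdir         = "nets",              # subdirectory for built nets
    tabdir         = "tables",            # hypothetical: table subdirectory
    DownloadScript = "download.sh",       # hypothetical: download script
    NetsFile       = "Nets",              # hypothetical: net manifest table
    NodesFile      = "Nodes",             # hypothetical: node manifest table
    OmegaFile      = "Omega",             # hypothetical: proficiency model
    QFile          = "Q",                 # hypothetical: evidence models
    StatsFile      = "Statistics",        # hypothetical: statistic list
    NetManifest    = "PPManifest.csv",    # hypothetical: output manifest
    StatList       = "StatisticList.csv"  # hypothetical: output stat list
  )
  ## Fields supplied by the caller (including TableID, which has no
  ## default) override or extend the defaults.
  utils::modifyList(defaults, EA.tables)
}

## Infer the proficiency model name from the first non-missing "Hub"
## entry of the net manifest (a data frame read from the Nets table).
inferProfModel <- function(netman) {
  hub <- as.character(netman$Hub)
  ## Assumption: blank CSV cells ("") also count as missing.
  hub <- hub[!is.na(hub) & nzchar(hub)]
  if (length(hub) == 0L)
    stop("No non-missing Hub value found in the net manifest.")
  hub[[1L]]
}
```

For example, a manifest whose first row (the proficiency model itself) has a blank Hub and whose second row has Hub “miniPP” yields “miniPP”.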
Downloading Tables
The complete specification is given in five different tables. These can be represented as five different sheets (pages) in a typical spreadsheet program. In various projects it has been useful to create a Google Sheets document with these five pages which the whole project team can access. Thus, one team member can make changes and the others can download them. (This would probably work with a different document collaboration system, but this has not been tested.)
Google Sheets are identified by a long string in the URL. This is the
“TableID” field in the EA.tables configuration list.
(In theory, this could be replaced by an appropriate identifier if
something other than Google Sheets was used.) The script
“download.sh” (the name can be overridden in the configuration)
is called using system2 with the “tables” directory path and the
“TableID” as arguments. It then downloads the tables.
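A sketch of the system2 call described above. runDownload is a hypothetical helper, and it assumes the script lives in config.dir, which this page does not state. system2 quotes the command itself but passes args to the shell as given, so the arguments are wrapped in shQuote.

```r
## Hypothetical helper (not part of EABN): run the download script with
## the tables directory and the TableID as its two arguments.
runDownload <- function(config.dir, TableID, tabdir = "tables",
                        script = "download.sh") {
  tabledir <- file.path(config.dir, tabdir)
  ## Assumption: the download script is stored in config.dir.
  scriptPath <- file.path(config.dir, script)
  ## $1 = tables directory, $2 = TableID (matching BASEURL=".../d/$2").
  status <- system2(scriptPath, args = c(shQuote(tabledir), shQuote(TableID)))
  if (status != 0)
    stop("Download script exited with status ", status)
  invisible(status)
}
```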
The bash implementation for use with Google Sheets first defines
a BASEURL variable from the second argument (the TableID):
BASEURL="https://docs.google.com/spreadsheets/d/$2"
and then calls curl to download each sheet, e.g.,
curl "${BASEURL}/gviz/tq?tqx=out:csv&sheet=Nets" >Nets.csv
In theory, the sheets could be downloaded directly from the URLs using
read.csv; however, there were issues with that solution. Using a
separate download.sh script also lets that script take care of any
authentication which needs to be done (as the Google APIs here are a
moving target).
Lock Files
It is probably a bad idea to rebuild the nets while a different process is using the nets directory for scoring. It is almost certainly a bad idea for two different programs to rebuild the nets in the same directory at the same time.
To prevent such clashes, the doRunrun function adds a file with the
extension .lock to the directory while it is scoring. The doBuild
function adds the file netbuilder.lock while it is rebuilding the nets.
If, when doBuild starts, a .lock file is found in the “nets”
directory, it issues a warning and, unless the override parameter is
set to TRUE, it stops. Use override only with extreme caution.
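The lock protocol can be sketched as follows. This is a hypothetical reconstruction from the description above, not the EABN source: checkAndLock and buildWithLock are illustrative names. The sketch looks for any file ending in .lock, warns, stops unless override is TRUE, and otherwise creates netbuilder.lock, removing it with on.exit whether the build succeeds or fails.

```r
## Hypothetical sketch (not the EABN source) of the lock protocol.
checkAndLock <- function(netdir, override = FALSE) {
  locks <- list.files(netdir, pattern = "\\.lock$")
  if (length(locks) > 0L) {
    warning("Lock files found in ", netdir, ": ",
            paste(locks, collapse = ", "))
    if (!isTRUE(override))
      stop("Nets directory is in use; set override = TRUE to rebuild anyway.")
  }
  ## Announce that a rebuild is in progress.
  lockfile <- file.path(netdir, "netbuilder.lock")
  file.create(lockfile)
  lockfile
}

## Run a build function while holding the lock; the lock file is
## removed on exit, even if build() signals an error.
buildWithLock <- function(netdir, override = FALSE, build = function() NULL) {
  lockfile <- checkAndLock(netdir, override)
  on.exit(unlink(lockfile), add = TRUE)
  build()
}
```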
Logging
Logging is done through the futile.logger mechanism (see flog.logger).
This allows logs to be saved to a file.
Author
Russell Almond
References
Almond, R. G. (2010). ‘I can name that Bayesian network in two matrixes.’ International Journal of Approximate Reasoning, 51, 167-178.
Almond, R. G. (presented 2017, August). Tabular views of Bayesian networks. In John-Mark Agosta and Tomas Singliar (Chair), Bayesian Modeling Application Workshop 2017. Symposium conducted at the meeting of the Association for Uncertainty in Artificial Intelligence, Sydney, Australia. (International) Retrieved from http://bmaw2017.azurewebsites.net/
See Also
doRunrun, configStats, Warehouse, BNWarehouse, NNWarehouse, Omega2Pnet, Qmat2Pnet
Examples
## This example is in:
file.path(help(package="EABN")$path,"conf","EABuild.R")
## Not run:
## Set up config.dir, logpath and NeticaLicenseKey
source("/usr/local/share/Proc4/EAini.R")
## Read the configuration; its Tables sub-object configures doBuild.
EA.config <- jsonlite::fromJSON(file.path(config.dir,"config.json"),FALSE)
EA.tables <- EA.config$Tables
EA.tables$netdir <- EA.config$netdir
## Start a Netica session for building the nets.
sess <- RNetica::NeticaSession(LicenseKey=NeticaLicenseKey)
RNetica::startSession(sess)
## Send log messages to the builder's log file.
futile.logger::flog.appender(futile.logger::appender.file(file.path(logpath,
    sub("<app>","builder",EA.config$logname))))
futile.logger::flog.threshold(EA.config$loglevel)
EABN::doBuild(sess,EA.tables,config.dir)
## End(Not run)