GEMfit: Fits the parameters of a parameterized network using the GEM...
In ralmond/Peanut: Parameterized Bayesian Networks, Abstract Classes

GEMfit

R Documentation

Fits the parameters of a parameterized network using the GEM algorithm

Description

A Pnet is a description of a parameterized Bayesian network, with each Pnode giving the parameterization for its conditional probability table. This function uses a generalized EM algorithm to find the values of the parameters for each Pnode which maximize the posterior probability of the data in cases.

Usage

GEMfit(net, cases, tol = sqrt(.Machine$double.eps),
       maxit = 100, Estepit = 1, Mstepit = 30,
       trace=FALSE, debugNo=maxit+1)

Arguments

`net`	A `Pnet` object
`cases`	An object representing a set of cases. Note the type of object is implementation dependent. It could be either a data frame providing cases or a filename for a case file.
`tol`	A numeric scalar giving the stopping tolerance for the for the EM algorithm.
`maxit`	An integer scalar giving the maximum number of iterations for the outer EM algorithm.
`Estepit`	An integer scalar giving the number of steps the Bayes net package should take in the internal EM algorithm during the E-step.
`Mstepit`	An integer scalar giving the number of steps that should be taken by `mapDPC` during the M-step.
`trace`	A logical value which indicates whether or not cycle by cycle information should be sent to to the `flog.logger`.
`debugNo`	An integer scalar. When this iteration is reached, then the `flog.threshold(DEBUG)` will be set, so more debugging information will be provided.

Details

The GEMfit algorithm uses a generalized EM algorithm to fit the parameterized network to the given data. This loops over the following steps:

E-step: Run the internal EM algorithm of the Bayes net package to calculate expected tables for all of the tables being learned. The function calcExpTables carries out this step.
M-step: Find a set of table parameters which maximize the fit to the expected counts by calling mapDPC for each table. The function maxAllTableParams does this step.
Update CPTs: Set all the conditional probability tables in the network to the new parameter values. The function BuildAllTables does this.
Convergence Test: Calculate the log likelihood of the cases under the new parameters and stop if no change. The function calcPnetLLike calculates the log likelihood.

Note that although GEMfit is not a generic function, the four main component functions, calcExpTables, maxAllTableParams, BuildAllTables, and calcPnetLLike, are generic functions. In particular, the cases argument is passed to calcExpTables and calcPnetLLike and must be whatever the host Bayes net package regards as a collection of cases. In PNetica-package the cases argument should be a filename of a Netica case file (see write.CaseFile).

The parameter tol controls the convergence checking. In particular, the algorithm stops when the difference in log-likelihood (as computed by calcPnetLLike) between iterations is less than tol in absolute value. If the number of iterations exceeds maxit the algorithm will stop and report lack of convergence.

The E-step and the M-step are also both iterative; the parameters Estepit and Mstepit control the number of iterations taken in each step respectively. As the goal of the E-step is to calculate the expected tables of counts, the default value of 1 should be fine. Although the algorithm should eventually converge for any value of Mstepit, different values may affect the convergence rate, and analysts may need to experiment with application specific values of this parameter.

The arguments trace and debugNo are used to provide extra debugging information. Setting trace to TRUE means that a message is printed after tables are built but before they are updated. Setting debugNo to a certain integer, will begin node-by-node messages for both BuildAllTables and maxAllTableParams. In particular, setting it to 1 is useful for debugging problems that occur at initialization. If the problem turns up at a later cycle, the trace option can be used to figure out when the error occurs.

Value

A list with three elements:

`converged`	A logical flag indicating whether or not the algorithm reached convergence.
`iter`	An integer scalar giving the number of iterations of the outer EM loop taken by the algorithm (plus 1 for the starting point).
`llikes`	A numeric vector of length `iter` giving the values of the log-likelihood after each iteration. (The first value is the initial log likelihood.)

As a side effect the PnodeLnAlphas and PnodeBetas fields of all nodes in PnetPnodes(net)) are updated to better fit the expected tables, and the internal conditional probability tables are updated to match the new parameter values.

Logging and Debug Mode

As of version 0.6-2, the meaning of the trace and debugNo has changed. In the new version, the flog.logger mechanism is used for progress reports, and error reporting.

Setting trace to true causes information about the steps of the algoritm (incluing the log likelihood at each step) to be output to the current appender (see flog.appender) The logging is done at the INFO level. As the default appender is the console, and INFO is the default logging level, the meaning of this parameter hasn't changed much.

The meaning of debugNo has changed, howver. Previously, it would turn on extra debug information when the target iteration was reached. That information is now always logged at the DEBUG level. So now if the current iteration reached debugNo, then GEMfit calls flog.threshold(DEBUG) to provide more information. This allows the more detailed DEBUG-level messages to be turned on when the EM algorithm is closer to convergence.

Note

Note that although this is not a generic function, the four main component functions: calcExpTables, maxAllTableParams, BuildAllTables, and calcPnetLLike. All four must have specific implementations for this function to work. See the PNetica-package for an example.

These functions assume that the host Bayes net implementation (e.g., RNetica-package): (1) net has an EM learning function, (2) the EM learning supports hyper-Dirichlet priors, (3) it is possible to recover the hyper-Dirichlet posteriors after running the internal EM algorithm.

Author(s)

Russell Almond

References

Almond, R. G. (2015) An IRT-based Parameterization for Conditional Probability Tables. Paper presented at the 2015 Bayesian Application Workshop at the Uncertainty in Artificial Intelligence Conference.

Examples


## Not run: 

library(PNetica) ## Need a specific implementation
sess <- NeticaSession()
startSession(sess)

irt10.base <- ReadNetworks(system.file(
                           "testnets","IRT10.2PL.base.dne",
                           package="PNetica"),
                           session=sess)
irt10.base <- as.Pnet(irt10.base)  ## Flag as Pnet, fields already set.
irt10.theta <- PnetFindNode(irt10.base,"theta")
irt10.items <- PnetPnodes(irt10.base)
## Flag items as Pnodes
for (i in 1:length(irt10.items)) {
  irt10.items[[i]] <- as.Pnode(irt10.items[[i]])
  ## Add node to list of observed nodes
  PnodeLabels(irt10.items[[1]]) <-
     union(PnodeLabels(irt10.items[[1]]),"onodes")
}


casepath <- system.file("testdat", "IRT10.2PL.200.items.cas", package="PNetica")


BuildAllTables(irt10.base)
PnetCompile(irt10.base) ## Netica requirement

item1 <- irt10.items[[1]]
priB <- PnodeBetas(item1)
priA <- PnodeAlphas(item1)
priCPT <- PnodeProbs(item1)

gemout <- GEMfit(irt10.base,casepath,trace=TRUE)

postB <- PnodeBetas(item1)
postA <- PnodeAlphas(item1)
postCPT <- PnodeProbs(item1)

## Posterior should be different
stopifnot(
  postB != priB, postA != priA
)


### The network that was used for data generation.
irt10.true <- ReadNetworks(system.file(
                           "testnets", "IRT10.2PL.true.dne",
                           package="PNetica"),
                           session=sess)
irt10.true <- as.Pnet(irt10.true)  ## Flag as Pnet, fields already set.
irt10.ttheta <- PnetFindNode(irt10.true,"theta")
irt10.titems <- PnetPnodes(irt10.true)
## Flag titems as Pnodes
for (i in 1:length(irt10.titems)) {
  irt10.titems[[i]] <- as.Pnode(irt10.titems[[i]])
  ## Add node to list of observed nodes
  PnodeLabels(irt10.titems[[1]]) <-
     union(PnodeLabels(irt10.titems[[1]]),"onodes")
}


BuildAllTables(irt10.true)
PnetCompile(irt10.true) ## Netica requirement

## See how close we are.
for (j in 1:length(irt10.titems)) {
  cat("diff[",j,"] = ",
      sum(abs(PnodeProbs(irt10.items[[j]])-
              PnodeProbs(irt10.titems[[j]])))/
      length(PnodeProbs(irt10.items[[j]])), "\n")
}


DeleteNetwork(irt10.base)
DeleteNetwork(irt10.true)


## End(Not run)

ralmond/Peanut documentation built on Sept. 19, 2023, 8:27 a.m.