maxAllTableParams: Find optimal parameters of Pnet or Pnode to match expected...
In ralmond/Peanut: Parameterized Bayesian Networks, Abstract Classes

maxAllTableParams

R Documentation

Find optimal parameters of Pnet or Pnode to match expected tables

Description

These functions assume that an expected count contingency table can be built from the network. They then try to find the set of parameters maximizes the probability of the expected contingency table with repeated calls to mapDPC. The function maxCPTParam maximizes a single Pnode and the function maxAllTableParams maximizes all Pnodes (i.e., the value of PnetPnodes(net) in a Pnet.

Usage

maxAllTableParams(net, Mstepit = 5, tol = sqrt(.Machine$double.eps), debug=FALSE)
maxCPTParam(node, Mstepit = 5, tol = sqrt(.Machine$double.eps))

Arguments

`net`	A `Pnet` object giving the parameterized network.
`node`	A `Pnode` object giving the parameterized node.
`Mstepit`	A numeric scalar giving the number of maximization steps to take. Note that the maximization does not need to be run to convergence.
`tol`	A numeric scalar giving the stopping tolerance for the maximizer.
`debug`	A logical scalar. If true then `recover` is called after an error, so that the node in question can be inspected.

Details

The GEMfit algorithm uses a generalized EM algorithm to fit the parameterized network to the given data. This loops over the following steps:

E-step: Run the internal EM algorithm of the Bayes net package to calculate expected tables for all of the tables being learned. The function calcExpTables carries out this step.
M-step: Find a set of table parameters which maximize the fit to the expected counts by calling mapDPC for each table. The function maxAllTableParams does this step.
Update CPTs: Set all the conditional probability tables in the network to the new parameter values. The function BuildAllTables does this.
Convergence Test: Calculate the log likelihood of the cases under the new parameters and stop if no change. The function calcPnetLLike calculates the log likelihood.

The function maxAllTableParams performs the M-step of this operation. Under the global parameter independence assumption, the parameters for the conditional probability tables for different nodes are independent given the sufficient statistics; that is, the expected contingency tables. The default method of maxAllTableParams calls maxCPTParam on each node in PnetPnodes(net).

After the hyper-Dirichlet EM algorithm is run by calcExpTables, a hyper-Dirichlet prior should be available for each conditional probability table. As the parameter of the Dirichlet distribution is a vector of pseudo-counts, the output of this algorithm should be a table of pseudo counts. Often this is stored as the updated conditional probability table and a vector of row weights indicating the strength of information for each row. Using the RNetica-package, this is calculated as: sweep(NodeProbs(item1),1, NodeExperience(item1),"*")

The function maxCPTParm is essentially a wrapper which extracts the table of pseudo-counts from the network and then calls mapDPC to maximize the parameters, updating the parameters of node to the result.

The parameters Mstepit and tol are passed to mapDPC to control the gradient descent algorithm used for maximization. Note that for a generalized EM algorithm, the M-step does not need to be run to convergence, a couple of iterations are sufficient. The value of Mstepit may influence the speed of convergence, so the optimal value may vary by application. The tolerance is largely irrelevant (if Mstepit is small) as the outer EM algorithm does the tolerance test.

Value

The expression maxCPTParam(node) returns node invisibly. The expression maxAllTableParams(net) returns net invisibly.

As a side effect the PnodeLnAlphas and PnodeBetas fields of node (or all nodes in PnetPnodes(net)) are updated to better fit the expected tables.

Logging and Debug Mode

As of version 0.6-2, the meaning of the debug argument is changed. In the new version, the flog.logger mechanism is used for progress reports, and error reporting. In particular, setting flog.threshold(DEBUG) (or TRACE) will cause progress reports to be sent to the logging output.

The debug argument has been repurposed. It now call recover when the error occurs, so that the problem can be debugged.

Note

The function maxCPTParam is an abstract generic function, and it needs specific implementations. See the PNetica-package for an example. A default implementation is provides for maxAllTableParams which loops through calls to maxCPTParam for each node in PnetPnodes(net).

This function assumes that the host Bayes net implementation (e.g., RNetica-package): (1) net has an EM learning function, (2) the EM learning supports hyper-Dirichlet priors, (3) it is possible to recover the hyper-Dirichlet posteriors after running the internal EM algorithm.

Author(s)

Russell Almond

References

Almond, R. G. (2015) An IRT-based Parameterization for Conditional Probability Tables. Paper presented at the 2015 Bayesian Application Workshop at the Uncertainty in Artificial Intelligence Conference.

Examples

## Not run: 

library(PNetica) ## Need a specific implementation
sess <- NeticaSession()
startSession(sess)

irt10.base <- ReadNetworks(system.file(
                           "testnets", "IRT10.2PL.base.dne",
                                package="PNetica"),
                           session=sess)
irt10.base <- as.Pnet(irt10.base)  ## Flag as Pnet, fields already set.
irt10.theta <- NetworkFindNode(irt10.base,"theta")
irt10.items <- PnetPnodes(irt10.base)
## Flag items as Pnodes
for (i in 1:length(irt10.items)) {
  irt10.items[[i]] <- as.Pnode(irt10.items[[i]])
  ## Add node to list of observed nodes
  PnodeLabels(irt10.items[[1]]) <-
     union(PnodeLabels(irt10.items[[1]]),"onodes")
}

casepath <- system.file("testdat", "IRT10.2PL.200.items.cas", package="PNetica")


BuildAllTables(irt10.base)
PnetCompile(irt10.base) ## Netica requirement

item1 <- irt10.items[[1]]
priB <- PnodeBetas(item1)
priA <- PnodeAlphas(item1)
priCPT <- PnodeProbs(item1)

gemout <- GEMfit(irt10.base,casepath,trace=TRUE)

calcExpTables(irt10.base,casepath)

maxAllTableParams(irt10.base)

postB <- PnodeBetas(item1)
postA <- PnodeAlphas(item1)
BuildTable(item1)
postCPT <- PnodeProbs(item1)

## Posterior should be different
stopifnot(
  postB != priB, postA != priA
)


DeleteNetwork(irt10.base)
stopSession(sess)


## End(Not run)

ralmond/Peanut documentation built on Sept. 19, 2023, 8:27 a.m.