GEMfit | R Documentation |
A Pnet is a description of a parameterized Bayesian network, with each Pnode giving the parameterization for its conditional probability table. This function uses a generalized EM algorithm to find the values of the parameters for each Pnode which maximize the posterior probability of the data in cases.
GEMfit(net, cases, tol = sqrt(.Machine$double.eps),
       maxit = 100, Estepit = 1, Mstepit = 30,
       trace = FALSE, debugNo = maxit + 1)
net |
A |
cases |
An object representing a set of cases. Note the type of object is implementation dependent. It could be either a data frame providing cases or a filename for a case file. |
tol |
A numeric scalar giving the stopping tolerance for the EM algorithm. |
maxit |
An integer scalar giving the maximum number of iterations for the outer EM algorithm. |
Estepit |
An integer scalar giving the number of steps the Bayes net package should take in the internal EM algorithm during the E-step. |
Mstepit |
An integer scalar giving the number of steps that
should be taken by |
trace |
A logical value which indicates whether or not cycle-by-cycle information should be sent to the
|
debugNo |
An integer scalar. When this iteration is reached,
then the |
The GEMfit algorithm uses a generalized EM algorithm to fit the parameterized network to the given data. This loops over the following steps:
1. Run the internal EM algorithm of the Bayes net package to calculate expected tables for all of the tables being learned. The function calcExpTables carries out this step.
2. Find a set of table parameters which maximize the fit to the expected counts by calling mapDPC for each table. The function maxAllTableParams does this step.
3. Set all the conditional probability tables in the network to the new parameter values. The function BuildAllTables does this.
4. Calculate the log likelihood of the cases under the new parameters and stop if no change. The function calcPnetLLike calculates the log likelihood.
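The four steps above can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the package's code: it estimates a single success probability from data in which some trials are unobserved. The helper names (eStep, mStep, obsLogLik, gemSketch) and the counts are made up for the example. It only mirrors the E-step, M-step, table update, and convergence test, and returns a list that reuses the element names converged, iter, and llikes.

```r
## Toy generalized-EM loop (illustrative only; not the Peanut implementation).
## Made-up data: nSucc successes, nFail failures, nMiss unobserved trials.
nSucc <- 30; nFail <- 10; nMiss <- 60

## E-step: expected counts of successes and failures given the current p.
eStep <- function(p) c(succ = nSucc + p * nMiss, fail = nFail + (1 - p) * nMiss)
## M-step: parameter value that best fits the expected counts.
mStep <- function(expCounts) unname(expCounts["succ"] / sum(expCounts))
## Log-likelihood of the observed cases under p.
obsLogLik <- function(p) nSucc * log(p) + nFail * log(1 - p)

gemSketch <- function(p, tol = sqrt(.Machine$double.eps), maxit = 100) {
  llikes <- obsLogLik(p)                 # log-likelihood at the starting point
  converged <- FALSE
  iter <- 0
  while (!converged && iter < maxit) {
    iter <- iter + 1
    expCounts <- eStep(p)                # step 1: expected tables
    p <- mStep(expCounts)                # steps 2-3: new parameter, new table c(p, 1 - p)
    llikes <- c(llikes, obsLogLik(p))    # step 4: log-likelihood under the new parameter
    converged <- abs(llikes[iter + 1] - llikes[iter]) < tol
  }
  list(converged = converged, iter = iter, llikes = llikes, p = p)
}

fit <- gemSketch(p = 0.5)
fit$converged
fit$p
```

In this toy problem the fixed point of the update is the observed proportion nSucc / (nSucc + nFail), so fit$p should approach 0.75 and the entries of fit$llikes should not decrease.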
Note that although GEMfit is not a generic function, the four main component functions, calcExpTables, maxAllTableParams, BuildAllTables, and calcPnetLLike, are generic functions. In particular, the cases argument is passed to calcExpTables and calcPnetLLike and must be whatever the host Bayes net package regards as a collection of cases. In PNetica-package the cases argument should be a filename of a Netica case file (see write.CaseFile).
The parameter tol controls the convergence checking. In particular, the algorithm stops when the difference in log-likelihood (as computed by calcPnetLLike) between iterations is less than tol in absolute value. If the number of iterations exceeds maxit the algorithm will stop and report lack of convergence.
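As a small illustration of this stopping rule, the sketch below compares two successive log-likelihood values against tol. The helper name llikeConverged and the numeric values are hypothetical and not part of the package.

```r
## Default tolerance from the GEMfit signature.
tol <- sqrt(.Machine$double.eps)

## Hypothetical helper: TRUE when successive log-likelihoods
## differ by less than tol in absolute value.
llikeConverged <- function(llikeOld, llikeNew, tol) {
  abs(llikeNew - llikeOld) < tol
}

llikeConverged(-250.0, -249.5, tol)          # change of 0.5: keep iterating
llikeConverged(-250.0, -250.0 + 1e-10, tol)  # change far below tol: stop
```

Because the test is on the absolute change, a looser tol may be reasonable when log-likelihoods are large in magnitude; that is a judgment call for the analyst rather than something the function decides.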
The E-step and the M-step are also both iterative; the parameters Estepit and Mstepit control the number of iterations taken in each step respectively. As the goal of the E-step is to calculate the expected tables of counts, the default value of 1 should be fine. Although the algorithm should eventually converge for any value of Mstepit, different values may affect the convergence rate, and analysts may need to experiment with application specific values of this parameter.
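To see why a capped M-step can still be useful, the sketch below uses base R's optim as a stand-in optimizer. This is an assumption made for illustration and not a statement about how the package's M-step is implemented; the objective negFit and the starting values are made up. Capping maxit stops the optimizer early, yet the objective still improves on the starting point, which is all a generalized EM step requires.

```r
## Stand-in objective (the Rosenbrock function); lower is better.
negFit <- function(par) 100 * (par[2] - par[1]^2)^2 + (1 - par[1])^2
start <- c(-1.2, 1)

## A small and a larger iteration budget, analogous to small and large Mstepit.
fewSteps  <- optim(start, negFit, method = "BFGS", control = list(maxit = 5))
moreSteps <- optim(start, negFit, method = "BFGS", control = list(maxit = 30))

## Compare the objective at the start and after each budget.
c(start = negFit(start), few = fewSteps$value, more = moreSteps$value)
```

A larger budget does more work per outer cycle but may need fewer outer cycles, which is the trade-off to explore when tuning Mstepit.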
The arguments trace and debugNo are used to provide extra debugging information. Setting trace to TRUE means that a message is printed after tables are built but before they are updated. Setting debugNo to a certain integer will begin node-by-node messages for both BuildAllTables and maxAllTableParams once that iteration is reached. In particular, setting it to 1 is useful for debugging problems that occur at initialization. If the problem turns up at a later cycle, the trace option can be used to figure out when the error occurs.
A list with three elements:
converged |
A logical flag indicating whether or not the algorithm reached convergence. |
iter |
An integer scalar giving the number of iterations of the outer EM loop taken by the algorithm (plus 1 for the starting point). |
llikes |
A numeric vector of length |
As a side effect, the PnodeLnAlphas and PnodeBetas fields of all nodes in PnetPnodes(net) are updated to better fit the expected tables, and the internal conditional probability tables are updated to match the new parameter values.
As of version 0.6-2, the meaning of the trace and debugNo arguments has changed. In the new version, the flog.logger mechanism is used for progress reports and error reporting.
Setting trace to TRUE causes information about the steps of the algorithm (including the log likelihood at each step) to be output to the current appender (see flog.appender). The logging is done at the INFO level. As the default appender is the console, and INFO is the default logging level, the meaning of this parameter hasn't changed much.
The meaning of debugNo has changed, however. Previously, it would turn on extra debug information when the target iteration was reached. That information is now always logged at the DEBUG level. So now, when the current iteration reaches debugNo, GEMfit calls flog.threshold(DEBUG) to provide more information. This allows the more detailed DEBUG-level messages to be turned on when the EM algorithm is closer to convergence.
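The following sketch shows how these futile.logger settings interact, independently of GEMfit. It assumes the futile.logger package is installed; the log file location and the messages are made up for the example. Messages logged at the DEBUG level are dropped while the threshold is INFO and appear once the threshold is lowered to DEBUG.

```r
library(futile.logger)

## Send log output to a temporary file instead of the console.
logFile <- tempfile(fileext = ".log")
flog.appender(appender.file(logFile))

flog.threshold(INFO)
flog.info("Iteration %d: log likelihood = %f", 1L, -123.4)  # written
flog.debug("node-level detail")                              # dropped at INFO

flog.threshold(DEBUG)            # the call GEMfit makes on reaching debugNo
flog.debug("node-level detail")  # now written

logged <- readLines(logFile)
logged

## Restore the default threshold and appender.
flog.threshold(INFO)
flog.appender(appender.console())
```

Redirecting the appender to a file in this way may be convenient for long runs, since the cycle-by-cycle messages can then be inspected afterwards.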
Note that although this is not a generic function, the four main component functions (calcExpTables, maxAllTableParams, BuildAllTables, and calcPnetLLike) are. All four must have specific implementations for this function to work. See the PNetica-package for an example.
These functions assume that the host Bayes net implementation (e.g., RNetica-package) (1) has an EM learning function for net, (2) supports hyper-Dirichlet priors in that EM learning, and (3) makes it possible to recover the hyper-Dirichlet posteriors after running the internal EM algorithm.
Russell Almond
Almond, R. G. (2015) An IRT-based Parameterization for Conditional Probability Tables. Paper presented at the 2015 Bayesian Application Workshop at the Uncertainty in Artificial Intelligence Conference.
Pnet, calcExpTables, calcPnetLLike, maxAllTableParams, BuildAllTables
## Not run:
library(PNetica) ## Need a specific implementation
sess <- NeticaSession()
startSession(sess)
irt10.base <- ReadNetworks(system.file(
    "testnets", "IRT10.2PL.base.dne",
    package="PNetica"),
  session=sess)
irt10.base <- as.Pnet(irt10.base) ## Flag as Pnet, fields already set.
irt10.theta <- PnetFindNode(irt10.base,"theta")
irt10.items <- PnetPnodes(irt10.base)
## Flag items as Pnodes
for (i in seq_along(irt10.items)) {
  irt10.items[[i]] <- as.Pnode(irt10.items[[i]])
  ## Add node to list of observed nodes
  PnodeLabels(irt10.items[[i]]) <-
    union(PnodeLabels(irt10.items[[i]]), "onodes")
}
casepath <- system.file("testdat", "IRT10.2PL.200.items.cas", package="PNetica")
BuildAllTables(irt10.base)
PnetCompile(irt10.base) ## Netica requirement
item1 <- irt10.items[[1]]
priB <- PnodeBetas(item1)
priA <- PnodeAlphas(item1)
priCPT <- PnodeProbs(item1)
gemout <- GEMfit(irt10.base,casepath,trace=TRUE)
postB <- PnodeBetas(item1)
postA <- PnodeAlphas(item1)
postCPT <- PnodeProbs(item1)
## Posterior should be different
stopifnot(
  postB != priB, postA != priA
)
### The network that was used for data generation.
irt10.true <- ReadNetworks(system.file(
    "testnets", "IRT10.2PL.true.dne",
    package="PNetica"),
  session=sess)
irt10.true <- as.Pnet(irt10.true) ## Flag as Pnet, fields already set.
irt10.ttheta <- PnetFindNode(irt10.true,"theta")
irt10.titems <- PnetPnodes(irt10.true)
## Flag titems as Pnodes
for (i in seq_along(irt10.titems)) {
  irt10.titems[[i]] <- as.Pnode(irt10.titems[[i]])
  ## Add node to list of observed nodes
  PnodeLabels(irt10.titems[[i]]) <-
    union(PnodeLabels(irt10.titems[[i]]), "onodes")
}
BuildAllTables(irt10.true)
PnetCompile(irt10.true) ## Netica requirement
## See how close we are.
for (j in seq_along(irt10.titems)) {
  ## Mean absolute difference between fitted and true table entries
  cat("diff[", j, "] = ",
      sum(abs(PnodeProbs(irt10.items[[j]]) -
              PnodeProbs(irt10.titems[[j]]))) /
        length(PnodeProbs(irt10.items[[j]])), "\n")
}
DeleteNetwork(irt10.base)
DeleteNetwork(irt10.true)
## End(Not run)