View source: R/frontend-queries.R
cpquery | R Documentation |
Perform conditional probability queries (CPQs).
cpquery(fitted, event, evidence, cluster, method = "ls", ...,
debug = FALSE)
cpdist(fitted, nodes, evidence, cluster, method = "ls", ...,
debug = FALSE)
mutilated(x, evidence)
fitted |
an object of class |
x |
an object of class |
event , evidence |
see below. |
nodes |
a vector of character strings, the labels of the nodes whose conditional distribution we are interested in. |
cluster |
an optional cluster object from package parallel. |
method |
a character string, the method used to perform the conditional
probability query. Currently only logic sampling ( |
... |
additional tuning parameters. |
debug |
a boolean value. If |
cpquery estimates the conditional probability of event given evidence using the method specified in the method argument.
cpdist generates random samples conditional on the evidence using the method specified in the method argument.
mutilated constructs the mutilated network arising from an ideal intervention setting the nodes involved to the values specified by evidence. In this case evidence must be provided as a list in the same format as for likelihood weighting (see below).
Note that both cpquery and cpdist are based on Monte Carlo particle filters, and therefore they may return slightly different values on different runs due to simulation noise.
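The simulation noise mentioned above can be made repeatable by fixing the random seed before each call. The following is a minimal sketch, assuming that the samples are drawn with R's own random number generator when no cluster is used; the seed value 42 is arbitrary.

```r
library(bnlearn)

data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

# two unseeded runs of the same query usually differ slightly.
p1 = cpquery(fitted, event = (B == "b"), evidence = (A == "a"))
p2 = cpquery(fitted, event = (B == "b"), evidence = (A == "a"))
c(p1, p2)

# resetting the seed before each call should reproduce the same estimate
# (assumption: no cluster is used, so R's RNG state drives the sampling).
set.seed(42)
q1 = cpquery(fitted, event = (B == "b"), evidence = (A == "a"))
set.seed(42)
q2 = cpquery(fitted, event = (B == "b"), evidence = (A == "a"))
identical(q1, q2)
```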
cpquery() returns a numeric value, the conditional probability of event given evidence.
cpdist() returns a data frame containing the samples generated from the conditional distribution of the nodes conditional on evidence. The data frame has class c("bn.cpdist", "data.frame"), and a method attribute storing the value of the method argument. In the case of likelihood weighting, the weights are also attached as an attribute called weights.
mutilated() returns a bn or bn.fit object, depending on the class of x.
Logic sampling is an approximate inference algorithm.
The event and evidence arguments must be two expressions describing the event of interest and the conditioning evidence in a format such that, if we denote with data the data set the network was learned from, data[evidence, ] and data[event, ] return the correct observations. If either event or evidence is set to TRUE, an unconditional probability query is performed with respect to that argument.
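As an illustration of setting evidence to TRUE, the sketch below estimates an unconditional probability and compares it with the relative frequency of the same event in the data. The two numbers are only expected to be close, not equal; the value n = 10^5 is an arbitrary choice made here to reduce simulation noise.

```r
library(bnlearn)

data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

# with evidence = TRUE no conditioning takes place, so this estimates
# the marginal probability of A == "a".
p.sim = cpquery(fitted, event = (A == "a"), evidence = TRUE, n = 10^5)

# relative frequency of the same event in the data, for comparison.
p.emp = mean(learning.test$A == "a")

c(simulated = p.sim, empirical = p.emp)
```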
Three tuning parameters are available:
n: a positive integer, the number of random samples to generate from fitted. The default value is 5000 * log10(nparams(fitted)) for discrete and conditional Gaussian networks and 500 * nparams(fitted) for Gaussian networks.
batch: a positive integer, the number of random samples that are generated at one time. Defaults to 10^4. If n is very large (e.g. 10^12), R would run out of memory if it tried to generate them all at once. Instead, random samples are generated in batches of size batch, discarding each batch before generating the next.
query.nodes: a vector of character strings, the labels of the nodes involved in event and evidence. Simple queries do not require generating samples from all the nodes in the network, so cpquery and cpdist try to identify which nodes are used in event and evidence and reduce the network to their upper closure. query.nodes may be used to manually specify these nodes when automatic identification fails; there is no reason to use it otherwise.
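The tuning parameters described above are passed as extra named arguments. The sketch below evaluates the default sample size formula quoted in the text for a discrete network, then runs the same query with a larger n split into batches; the specific values 10^6 and 10^5 are illustrative assumptions, not recommendations.

```r
library(bnlearn)

data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

# the default number of samples for a discrete network, following the
# formula given in the documentation.
n.default = 5000 * log10(nparams(fitted))
n.default

# a larger sample reduces simulation noise; batch bounds how many samples
# are held in memory at any one time.
p = cpquery(fitted, event = (B == "b"), evidence = (A == "a"),
      n = 10^6, batch = 10^5)
p
```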
Note that the number of samples returned by cpdist() is generally smaller than n, because logic sampling is a form of rejection sampling. Therefore, only the observations matching evidence (out of the n that are generated) are returned, and their number depends on the probability of evidence. Furthermore, the probabilities returned by cpquery() are approximate estimates: when they are computed in separate calls to cpquery(), they will not sum up to 1 even when the corresponding true values do.
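The effect of rejection sampling can be checked by counting the rows that cpdist() returns. The sketch below also shows one way to obtain probabilities that sum to 1 by construction, which is to tabulate a single set of samples instead of issuing one cpquery() call per value; this is offered as an illustration, not as the package's recommended workflow.

```r
library(bnlearn)

data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

n = 10^4
sims = cpdist(fitted, nodes = "A", evidence = (C == "c"), n = n)

# only the samples matching the evidence are kept.
nrow(sims)
# the acceptance rate is itself an estimate of the probability of the evidence.
nrow(sims) / n

# relative frequencies computed from one set of samples sum to 1.
p = prop.table(table(sims$A))
p
sum(p)
```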
Likelihood weighting is an approximate inference algorithm based on Monte Carlo sampling.
The event argument must be an expression describing the event of interest, as in logic sampling. The evidence argument must be a named list:
Each element corresponds to one node in the network and must contain the value that node will be set to when sampling.
In the case of a continuous node, two values can also be provided. In that case, the value for that node will be sampled from a uniform distribution on the interval delimited by the specified values.
In the case of a discrete or ordinal node, two or more values can also be provided. In that case, the value for that node will be sampled with uniform probability from the set of specified values.
If either event or evidence is set to TRUE, an unconditional probability query is performed with respect to that argument.
Tuning parameters are the same as for logic sampling: n, batch and query.nodes.
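The three forms of evidence described above for likelihood weighting can be sketched as follows. The nodes, values, and the object name gfit are arbitrary choices made for this illustration, using the learning.test and gaussian.test data sets shipped with the package.

```r
library(bnlearn)

data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

# a single value: A is fixed to "a" in every sample.
p1 = cpquery(fitted, event = (B == "b"),
       evidence = list(A = "a"), method = "lw")

# several values for a discrete node: A is drawn uniformly from {"a", "b"}.
p2 = cpquery(fitted, event = (B == "b"),
       evidence = list(A = c("a", "b")), method = "lw")

data(gaussian.test)
gfit = bn.fit(hc(gaussian.test), gaussian.test)

# two values for a continuous node: A is drawn uniformly from [0, 1].
p3 = cpquery(gfit, event = (B > 2),
       evidence = list(A = c(0, 1)), method = "lw")

c(p1, p2, p3)
```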
Note that the samples returned by cpdist() are generated from the mutilated network, and need to be weighted appropriately when computing summary statistics (for more details, see the references below). cpquery() does that automatically when computing the final conditional probability. Also note that the batch argument is ignored in cpdist() for speed and memory efficiency. Furthermore, the probabilities returned by cpquery() are approximate estimates: when they are computed in separate calls to cpquery(), they will not sum up to 1 even when the corresponding true values do.
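One way to apply the weights when summarising likelihood weighting samples is sketched below: the weights attribute attached to the data frame is summed within each level of the node of interest and then normalised. This is an illustrative calculation, not the package's internal code.

```r
library(bnlearn)

data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

sims = cpdist(fitted, nodes = "A", evidence = list(C = "c"), method = "lw")

# one weight per sample, stored as an attribute of the data frame.
w = attr(sims, "weights")
length(w) == nrow(sims)

# weighted relative frequencies of A given C == "c".
p = tapply(w, sims$A, sum) / sum(w)
p

# the unweighted frequencies ignore the evidence and are not estimates of
# the conditional distribution; they are shown only for contrast.
prop.table(table(sims$A))
```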
Marco Scutari
Koller D, Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Korb K, Nicholson AE (2010). Bayesian Artificial Intelligence. Chapman & Hall/CRC, 2nd edition.
## discrete Bayesian network (it is the same with ordinal nodes).
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)
# the result should be around 0.025.
cpquery(fitted, (B == "b"), (A == "a"))
# programmatically build a conditional probability query...
var = names(learning.test)
obs = 2
str = paste("(", names(learning.test)[-3], " == '",
sapply(learning.test[obs, -3], as.character), "')",
sep = "", collapse = " & ")
str
str2 = paste("(", names(learning.test)[3], " == '",
as.character(learning.test[obs, 3]), "')", sep = "")
str2
cmd = paste("cpquery(fitted, ", str2, ", ", str, ")", sep = "")
eval(parse(text = cmd))
# ... but note that predict works better in this particular case.
attr(predict(fitted, "C", learning.test[obs, -3], prob = TRUE), "prob")
# do the same with likelihood weighting.
cpquery(fitted, event = eval(parse(text = str2)),
evidence = as.list(learning.test[obs, -3]), method = "lw")
attr(predict(fitted, "C", learning.test[obs, -3],
method = "bayes-lw", prob = TRUE), "prob")
# conditional distribution of A given C == "c".
table(cpdist(fitted, "A", (C == "c")))
## Gaussian Bayesian network.
data(gaussian.test)
fitted = bn.fit(hc(gaussian.test), gaussian.test)
# the result should be around 0.04.
cpquery(fitted,
event = ((A >= 0) & (A <= 1)) & ((B >= 0) & (B <= 3)),
evidence = (C + D < 10))
## ideal interventions and mutilated networks.
mutilated(fitted, evidence = list(F = 42))