score | R Documentation |
Compute the score of the Bayesian network.
## S4 method for signature 'bn'
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
## S4 method for signature 'bn.naive'
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
## S4 method for signature 'bn.tan'
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
## S3 method for class 'bn'
logLik(object, data, ...)
## S3 method for class 'bn'
AIC(object, data, ..., k = 1)
## S3 method for class 'bn'
BIC(object, data, ...)
x, object |
an object of class bn. |
data |
a data frame containing the data that will be used to compute the score of the Bayesian network. |
type |
a character string, the label of a network score. If none is
specified, the default score is the Bayesian Information Criterion
for both discrete and continuous data sets. See network scores for details. |
by.node |
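a boolean value. If TRUE, the function returns the individual node contributions to the score; otherwise it returns their sum, the overall score of the network. |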
debug |
a boolean value. If TRUE, debugging output is printed; otherwise the function is silent. |
... |
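extra arguments from the generic method (for the AIC and logLik functions) or additional tuning parameters for the score() function, described below. |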
k |
a numeric value, the penalty coefficient to be used; the default
k = 1 gives the AIC score. |
Additional arguments of the score() function:
iss: the imaginary sample size used by the Bayesian Dirichlet scores (bde,
mbde, bds, bdj). It is also known as “equivalent sample size”. The default
value is equal to 1.
iss.mu: the imaginary sample size for the normal component of the
normal-Wishart prior in the Bayesian Gaussian score (bge). The default value
is 1.
iss.w: the imaginary sample size for the Wishart component of the
normal-Wishart prior in the Bayesian Gaussian score (bge). The default value
is ncol(data) + 2.
nu: the mean vector of the normal component of the normal-Wishart prior in
the Bayesian Gaussian score (bge). The default value is equal to
colMeans(data).
l: the number of scores to average in the locally averaged Bayesian Dirichlet
score (bdla). The default value is 5.
exp: a list of indexes of experimental observations (those that have been
artificially manipulated). Each element of the list must be named after one
of the nodes, and must contain a numeric vector with indexes of the
observations whose value has been manipulated for that node.
k: the penalty coefficient to be used by the AIC, BIC and penalized
node-average log-likelihood scores. The default value is 1 for AIC,
log(nrow(data)) / 2 for BIC and 1 / nnodes(x) * nrow(data) ^ -0.25 for the
node-average log-likelihood scores.
gamma: the additional penalty in the extended BIC scores. The default value
is 0.5.
prior: the prior distribution to be used with the various Bayesian Dirichlet
scores (bde, mbde, bds, bdj, bdla) and the Bayesian Gaussian score (bge).
Possible values are:
  uniform: the uniform prior (the default).
  vsp: the Bayesian variable selection prior, which puts a probability of
  inclusion on parents.
  marginal: an independent marginal uniform for each arc.
  cs: the Castelo & Siebes prior, which puts an independent prior
  probability on each arc and direction.
beta: the parameter associated with prior.
  If prior is uniform, beta is ignored.
  If prior is vsp, beta is the probability of inclusion of an additional
  parent. The default is 1/ncol(data).
  If prior is marginal, beta is the probability of inclusion of an arc. Each
  direction has a probability of inclusion of beta / 2 and the probability
  that the arc is not included is therefore 1 - beta. The default value is
  0.5, so that arc inclusion and arc exclusion have the same probability.
  If prior is cs, beta is a data frame with columns from, to and prob
  specifying the prior probability for a set of arcs. A uniform probability
  distribution is assumed for the remaining arcs.
newdata: the test set whose predictive likelihood will be computed by
pred-loglik, pred-loglik-g or pred-loglik-cg. It should be a data frame with
the same variables as data.
fun: the function that computes the score component for a single node in the
custom score. fun must have arguments node, parents, data and args, in this
order; in other words, it must have signature
function(node, parents, data, args). node will contain the label of the node
to be scored (a character string); parents will contain the labels of its
parents (a character vector); data will contain the complete data set, with
all the variables (a data frame); and args will be a list containing any
additional arguments to the score.
args: a list containing the optional arguments to fun, for tuning custom
score functions.
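The contract between fun and args described above can be sketched in base R as follows. This is only an illustration under stated assumptions: my.penloglik, the toy data frame and the args$k element are hypothetical names introduced here, and the function is called directly rather than through score(). It uses interaction() from base R to enumerate the parent configurations.

```r
# Hypothetical score component: a penalized log-likelihood for discrete data,
# with the penalty coefficient passed through args$k.
my.penloglik = function(node, parents, data, args) {
  # fall back to a penalty of 1 when args$k is missing (an assumption made
  # for this sketch, not a documented default).
  k = if (is.null(args$k)) 1 else args$k
  if (length(parents) == 0) {
    counts = table(data[, node])
    nparams = length(counts) - 1
    probs = counts / sum(counts)
  } else {
    # one column of counts per observed configuration of the parents.
    cfg = interaction(data[, parents, drop = FALSE], drop = TRUE)
    counts = table(data[, node], cfg)
    nparams = ncol(counts) * (nrow(counts) - 1)
    probs = prop.table(counts, 2)
  }
  # skip empty cells, treating 0 * log(0) as 0.
  nz = counts > 0
  sum(counts[nz] * log(probs[nz])) - k * nparams
}

# toy data set, made up for this example.
toy = data.frame(
  A = factor(c("a", "a", "b", "b", "a", "b")),
  B = factor(c("x", "y", "x", "y", "x", "x"))
)

# node without parents, penalty coefficient 1.
root.term = my.penloglik("A", character(0), toy, list(k = 1))
# node with one parent, BIC-style penalty coefficient log(n) / 2.
child.term = my.penloglik("B", "A", toy, list(k = log(nrow(toy)) / 2))
```

A function written this way would be passed to score() through the fun argument, with list(k = ...) supplied as args.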
For score() with by.node = TRUE, a vector of numeric values, the individual
node contributions to the score of the Bayesian network. Otherwise, a single
numeric value, the score of the Bayesian network.
AIC and BIC are computed as logLik(x) - k * nparams(x), that is, the classic
definition rescaled by -2. Therefore higher values are better, and
for large sample sizes BIC converges to log(BDe).
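The rescaling by -2 can be checked with a short base R sketch. A linear model fitted with lm() on the cars data set stands in for a Bayesian network here, purely as an assumption for illustration: only the arithmetic linking logLik - k * nparams to the classic AIC and BIC from the stats package is being shown.

```r
# stand-in model: any fitted model with a logLik() method would do.
fit = lm(dist ~ speed, data = cars)
ll = logLik(fit)
loglik = as.numeric(ll)
nparams = attr(ll, "df")   # number of estimated parameters
n = nobs(fit)              # sample size

# rescaled definitions: k = 1 for AIC, k = log(n) / 2 for BIC.
rescaled.aic = loglik - 1 * nparams
rescaled.bic = loglik - log(n) / 2 * nparams

# the classic definitions divided by -2 give the same numbers.
aic.matches = isTRUE(all.equal(rescaled.aic, AIC(fit) / -2))
bic.matches = isTRUE(all.equal(rescaled.bic, BIC(fit) / -2))
```

Because the sign is flipped relative to the classic definitions, the model with the larger rescaled value is the preferred one.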
When using the Castelo & Siebes prior in structure learning, the prior
probabilities associated with an arc are bound away from zero and one by
shrinking them towards the uniform distribution as per Hausser and Strimmer
(2009) with a lambda equal to 3 * sqrt(.Machine$double.eps). This
dramatically improves structure learning, which is less likely to get stuck
when starting from an empty graph. As an alternative to prior probabilities,
a blacklist can be used to prevent arcs from being included in the network,
and a whitelist can be used to force the inclusion of particular arcs.
beta is not modified when the prior is used from functions other than
those implementing score-based and hybrid structure learning.
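A minimal sketch of the shrinkage step, assuming the usual Hausser and Strimmer form (a convex combination of the probabilities and a uniform target) applied to the three possible states of a single arc. The three-state layout and the variable names are assumptions made for illustration; the package's internal code may differ in its details.

```r
lambda = 3 * sqrt(.Machine$double.eps)

# hypothetical prior for one arc: forward, backward, absent.
# the extreme values 1 and 0 are the case the shrinkage is meant to avoid.
p = c(fwd = 1, bwd = 0, absent = 0)

# uniform target over the three states.
target = rep(1 / length(p), length(p))

# shrink towards the uniform distribution.
p.shrunk = lambda * target + (1 - lambda) * p
```

After shrinkage every state has a probability strictly between zero and one, and the probabilities still sum to one.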
Marco Scutari
network scores, arc.strength, alpha.star.
data(learning.test)
dag = hc(learning.test)
score(dag, learning.test, type = "bde")
## let's see score equivalence in action!
dag2 = set.arc(dag, "B", "A")
score(dag2, learning.test, type = "bde")
## K2 score on the other hand is not score equivalent.
score(dag, learning.test, type = "k2")
score(dag2, learning.test, type = "k2")
## BDe with a prior.
beta = data.frame(from = c("A", "D"), to = c("B", "F"),
         prob = c(0.2, 0.5), stringsAsFactors = FALSE)
score(dag, learning.test, type = "bde", prior = "cs", beta = beta)
## equivalent to logLik(dag, learning.test)
score(dag, learning.test, type = "loglik")
## equivalent to AIC(dag, learning.test)
score(dag, learning.test, type = "aic")
## custom score, computing BIC manually.
my.bic = function(node, parents, data, args) {
  n = nrow(data)
  if (length(parents) == 0) {
    counts = table(data[, node])
    nparams = dim(counts) - 1
    sum(counts * log(counts / n)) - nparams * log(n) / 2
  }#THEN
  else {
    counts = table(data[, node], configs(data[, parents, drop = FALSE]))
    nparams = ncol(counts) * (nrow(counts) - 1)
    sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2
  }#ELSE
}#MY.BIC
score(dag, learning.test, type = "custom-score", fun = my.bic, by.node = TRUE)
score(dag, learning.test, type = "bic", by.node = TRUE)