Pnet2Qmat (R Documentation)
In the augmented Q-matrix, there is a set of rows for each Pnode which describes the conditional probability table for that node in terms of the model parameters (see BuildTable). As the Pnodes could potentially come from multiple nets, the key for the table is (“Model”, “Node”). As there are multiple rows per node, “State” is the third part of the key.
The function Pnet2Qmat creates an augmented Q-matrix out of a collection of Pnodes, possibly spanning multiple Pnets.
Pnet2Qmat(obs, prof, defaultRule = "Compensatory", defaultLink = "partialCredit",
          defaultAlpha = 1, defaultBeta = NULL, defaultLinkScale = NULL,
          debug = TRUE)
obs: A list of observable Pnodes whose conditional probability tables are described by the rows of the Q-matrix.
prof: A list of proficiency Pnodes (proficiency variables) whose names supply the proficiency columns of the Q-matrix.
defaultRule: This should be a character scalar giving the name of a CPTtools combination rule (see Compensatory); it is used when PnodeRules has not been set for a node.
defaultLink: This should be a character scalar giving the name of a CPTtools link function (see partialCredit); it is used when PnodeLink has not been set for a node.
defaultAlpha: A numeric scalar giving the default value for slope parameters.
defaultBeta: A numeric scalar giving the default value for difficulty (negative intercept) parameters.
defaultLinkScale: A positive number which gives the default value for the link scale parameter.
debug: A logical value. If true, extra information will be printed while the output table is being built.
A Q-matrix is a 0-1 matrix which describes which proficiency (latent) variables are connected to which observable outcome variables; q_{jk} = 1 if and only if proficiency variable k is a parent of observable variable j. Almond (2010) suggested augmenting the Q-matrix with additional columns representing the combination rules (PnodeRules), the link function (PnodeLink), the link scale parameter (if needed, PnodeLinkScale) and the difficulty parameters (PnodeBetas). The discrimination parameters (PnodeAlphas) could be overloaded with the Q-matrix, with non-zero parameters in places where there were 1's in the Q-matrix.
This arrangement worked fine with combination rules (e.g., Compensatory) which contain multiple alpha (discrimination) parameters, one for each parent variable, and a single beta (difficulty). It broke down with the introduction of a new type of offset rule (e.g., OffsetDisjunctive) which uses multiple difficulty parameters, one for each parent variable, and a single alpha. Almond (2016) suggested a new augmentation which has three matrixes in a single table (a Qmat): the Q-matrix, which contains structural information; the A-matrix, which contains discrimination parameters; and the B-matrix, which contains the difficulty parameters. The names of the columns for these matrixes contain the names of the proficiency variables, prepended with “A.” or “B.” in the case of the A-matrix and B-matrix. There are two additional columns marked “A” and “B” which are used for the discrimination and difficulty parameter in the multiple-beta and multiple-alpha cases, respectively. There is some redundancy between the Q, A and B matrixes, but this provides an opportunity for checking the validity of the input.
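As an illustration (not output from any package function), a single Qmat row for a hypothetical binary observable using the Compensatory rule might look like the following sketch; the model, node, and proficiency names are invented for the example.

## Hypothetical Qmat fragment: one row for a binary observable "Obs1" in
## evidence model "EM1", with invented proficiencies Skill1 and Skill2.
qrow <- data.frame(
  Model = "EM1", Node = "Obs1", Nstates = 2, State = "Right",
  Link = "partialCredit", LinkScale = NA,
  Skill1 = 1, Skill2 = 1,          # Q-matrix (structure): both skills are parents
  Rules = "Compensatory",
  A.Skill1 = 1, A.Skill2 = 1,      # A-matrix: one slope per parent (multiple-A rule)
  A = NA,                          # single-slope column unused for a multiple-A rule
  B.Skill1 = NA, B.Skill2 = NA,    # B-matrix unused for a multiple-A rule
  B = 0.3,                         # single difficulty used by the multiple-A rule
  stringsAsFactors = FALSE)
qrow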
The introduction of the partial credit link function (partialCredit) added a further complication. With the partial credit model, there can be a separate set of discrimination or difficulty parameters for each transition of a polytomous item. Even the gradedResponse link function requires a separate difficulty parameter for each level of the variable save the first. The rows of the Qmat data structure are hence augmented to include one row for every state but the lowest-level state. There should be one fewer row associated with the node than the value in the “Nstates” column, and the names of the states (values in the “State” column) should correspond to every state of the target variable except the first. It is an error if the number of states does not match the existing node, or if the state names do not match what is already used for the node or what is in the manifest for the node Warehouse.
Note that two nodes in different networks may share the same name, and two states in two different nodes may have the same name as well. Thus, the formal key for the Qmat data frame is (“Model”, “Node”, “State”); however, the rows which share the values for (“Model”, “Node”) form a subtable for that particular node. In particular, the rows of the Q-matrix subtable for that node form the inner Q-matrix for that node. The inner Q-matrix shows which variables are relevant for each state transition in a partial credit model. The column-wise maximum of the inner Q-matrix forms the row of the outer Q-matrix for that node; this shows which proficiency nodes are the parents of the observable node. This corresponds to PnodeQ(node).
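For example, a minimal sketch (with invented proficiency and state names) of how the outer Q-matrix row is obtained as the column-wise maximum of the inner Q-matrix:

## Inner Q-matrix: one row per state transition, one column per proficiency.
innerQ <- rbind(Full    = c(Skill1 = 1, Skill2 = 0),
                Partial = c(Skill1 = 1, Skill2 = 1))
## Column-wise maximum: which proficiencies are parents of the observable node.
outerQrow <- apply(innerQ, 2, max)
outerQrow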
The function Qmat2Pnet creates and sets the parameters of the observable Pnodes referenced in the Qmat argument. As it needs to reference, and possibly create, a number of Pnets and Pnodes, it requires both a network and a node Warehouse. If the override parameter is true, the networks will be modified so that each node has the correct parents; otherwise Qmat2Pnet will signal an error if the existing network structure is inconsistent with the Q-matrix.
As there is only one link function for each node, the values of PnodeLink(node) and PnodeLinkScale(node) are set based on the values in the “Link” and “LinkScale” columns of the first row corresponding to the node. Note that the choice of link function determines what is sensible for the other values, but this is not checked by the code.
The value of PnodeRules(node) can either be a single value or a list of rule names. The first value in the sub-Qmat must be a character value; if the other values are missing, then that single value is used. If not, all of the entries should be non-missing. If this is a single value, then effectively the same combination rule is used for each transition.
The interpretation of the A-matrix and the B-matrix depends on the value in the “Rules” column. There are two types of rules: multiple-A rules and multiple-B rules (offset rules). The CPTtools function isOffsetRule checks to see what kind of a rule it is. The multiple-A rules, of which Compensatory is the canonical example, have one discrimination (or slope) parameter for every parent variable (values of 1 in the Q-matrix) and a single difficulty (negative intercept) parameter, which is in the “B” column of the Qmat. The multiple-B or offset rules, of which OffsetConjunctive is the canonical example, have a difficulty (negative intercept) parameter for each parent variable and a single discrimination (slope) parameter, which is in the “A” column. The function Qmat2Pnet uses the value of isOffsetRule to determine whether to use the multiple-B (true) or multiple-A (false) paradigm.
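For instance, assuming the CPTtools package is available, the rule paradigm can be checked directly:

library(CPTtools)
isOffsetRule("Compensatory")       ## FALSE: multiple-A rule, single "B" column used
isOffsetRule("OffsetConjunctive")  ## TRUE:  multiple-B rule, single "A" column used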
A simple example is a binary observable variable which uses the Compensatory rule. This is essentially a regression model (logistic regression with the partialCredit or gradedResponse link functions, linear regression with the normalLink link function) on the parent variables. The linear predictor is:
\frac{1}{\sqrt{K}} (a_1\theta_1 + \ldots + a_K\theta_K) - b .
The values \theta_1, \ldots, \theta_K are effective thetas, real values corresponding to the states of the parent variables. The value a_i is stored in the column “A.namei”, where namei is the name of the ith proficiency variable; the value of PnodeAlphas(node) is the vector a_1, \ldots, a_K with names corresponding to the parent variables. The value of b is stored in the “B” column; the value of PnodeBetas(node) is b.
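A hand computation of this linear predictor (with invented slopes, difficulty, and effective thetas) might look like the following sketch:

## Compensatory linear predictor, computed by hand; all values are invented.
a     <- c(Skill1 = 1.1, Skill2 = 0.9)     ## PnodeAlphas(node): one slope per parent
b     <- 0.3                               ## PnodeBetas(node): single difficulty
theta <- c(Skill1 = 0.5, Skill2 = -0.5)    ## effective thetas of the parent states
eta   <- sum(a * theta) / sqrt(length(a)) - b
eta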
The multiple-B pattern replaces the A-matrix with the B-matrix and the column “A” with “B”. Consider a binary observable variable which uses the OffsetConjunctive rule. The linear predictor is:
a \min(\theta_1 - b_1, \ldots, \theta_K - b_K) .
The value b_i is stored in the column “B.namei”, where namei is the name of the ith proficiency variable; the value of PnodeBetas(node) is the vector b_1, \ldots, b_K with names corresponding to the parent variables. The value of a is stored in the “A” column; the value of PnodeAlphas(node) is a.
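The corresponding hand computation for the offset (multiple-B) case, again with invented values:

## OffsetConjunctive linear predictor, computed by hand; all values are invented.
bvec  <- c(Skill1 = 0.0, Skill2 = 0.5)     ## PnodeBetas(node): one difficulty per parent
a     <- 1.2                               ## PnodeAlphas(node): single slope
theta <- c(Skill1 = 0.5, Skill2 = -0.5)    ## effective thetas of the parent states
eta   <- a * min(theta - bvec)
eta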
When there are more than two states in the output variable, PnodeRules, PnodeAlphas(node) and PnodeBetas(node) become lists to indicate that a different value should be used for each transition between states. If there is a single value in the “Rules” column, or equivalently the value of PnodeRules is a scalar, then the same rule is repeated for each state transition. The same is true for PnodeAlphas(node) and PnodeBetas(node). If these values are a list, that indicates that a different value is to be used for each transition. If they are a vector, that means that different values (discriminations for multiple-A rules or difficulties for multiple-B rules) are needed for the parent variables, but the same set of values is to be used for each state transition. If different values are to be used for each transition, then the values are a list of vectors.
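The following sketch (invented values, for a three-state node with transitions “Full” and “Partial”) illustrates the scalar/vector/list conventions:

## Scalar: the same combination rule is used for both state transitions.
rules  <- "Compensatory"
## Vector: one slope per parent, reused for every transition.
alphas <- c(Skill1 = 1.0, Skill2 = 0.7)
## List: a different (scalar) difficulty for each state transition.
betas  <- list(Full = 0.5, Partial = -0.5)
## A list of vectors would give different per-parent values for each transition.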
The necessary configuration of a's and b's depends on the type of link function. Here are the rules for the currently existing link functions:
normalLink: This link function uses the same linear predictor for each transition, so there should be a single rule, and PnodeAlphas(node) and PnodeBetas(node) should both be vectors (with b of length 1 for a multiple-A rule). This rule also requires a positive value for PnodeLinkScale(node) in the “LinkScale” column. The values in the “A.name” and “B.name” columns for rows after the first can be left as NA's to indicate that the same values are reused.
gradedResponse: This link function models the probability of getting at or above each state and then calculates the differences between them to produce the conditional probability table. In order to avoid negative probabilities, the probability of being in a higher state must always be nonincreasing. The surest way to ensure this is to use both the same combination rule at each state and the same set of discrimination parameters for each state. The difficulty parameters must be nondecreasing. Again, values for rows after the first can be left as NAs to indicate that the same value should be reused.
partialCredit: This link function models the conditional probability of moving from the previous state to the current state. As such, there is no restriction on the rules or parameters. In particular, it can alternate between multiple-A and multiple-B style rules from row to row.
Another restriction that the use of the partial credit rule lifts is the requirement that all parent variables be used in each transition. Note that there is one row of the Q-matrix (the inner Q-matrix) for each state transition. Only the parent variables with 1's in the particular state row are considered when building PnodeAlphas(node) and PnodeBetas(node) for this model. Note that only the partial credit link function can take advantage of this; the other two require all parents to be used for every state.
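To make this concrete, here is a hypothetical two-row (three-state) partial credit fragment in which only Skill1 is used for the “Full” transition, both skills are used for the “Partial” transition, and the rule style alternates between rows; all names and numbers are invented.

## Hypothetical Qmat rows for a three-state partial credit observable "Obs2".
pcrows <- data.frame(
  Model = "EM2", Node = "Obs2", Nstates = 3,
  State = c("Full", "Partial"),
  Link = "partialCredit", LinkScale = NA,
  Skill1 = c(1, 1), Skill2 = c(0, 1),           # inner Q-matrix: parents per transition
  Rules = c("Compensatory", "OffsetConjunctive"),
  A.Skill1 = c(1.0, NA), A.Skill2 = c(NA, NA),  # slopes used only for the Compensatory row
  A = c(NA, 1.0),                               # single slope for the OffsetConjunctive row
  B.Skill1 = c(NA, 0.0), B.Skill2 = c(NA, 0.5), # per-parent difficulties for the offset row
  B = c(0.0, NA),                               # single difficulty for the Compensatory row
  stringsAsFactors = FALSE)
pcrows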
The function Pnet2Qmat takes a collection of nodes (in a series of spoke or evidence models) and builds a Qmat data structure that can reproduce them. It loops through the nodes and fills out the Qmat based on the properties of the Pnodes. Note that if the properties are not yet set, then the default values are used. Thus, applying this function to a network for which the structure has been established but the parameters have not yet been set will build a blank Qmat which can then be adjusted by experts.
The output augmented Q-matrix is a data frame with the columns described below. The number of columns is variable: the items marked prof below actually correspond to a set of columns whose names are taken from the proficiency variables (the prof argument).
Model: The name of the Pnet (network) containing the node described by this row.
Node: The name of the Pnode described by this row.
Nstates: The number of states for this node. Generally, each node should have one fewer rows than this number.
State: The name of the state for this row. This should be unique within the (“Model”, “Node”) combination.
Link: The name of a link function. This corresponds to PnodeLink(node).
LinkScale: Either a positive number giving the link scale parameter, or NA if the link function does not use a scale parameter. This corresponds to PnodeLinkScale(node).
prof: There is one column for each proficiency variable. These columns form the structural part of the Q-matrix (the inner Q-matrix for the node).
Rules: The name of the combination rule to use for this row. This corresponds to PnodeRules(node).
A.prof: There is one column for each proficiency, with the proficiency name appended to “A.”. If a multiple-alpha style combination rule (e.g., Compensatory) is used, these columns hold the discrimination (slope) parameters for the corresponding parent variables (PnodeAlphas(node)); otherwise they are NA.
A: If a multiple-beta style combination rule (e.g., OffsetConjunctive) is used, this column holds the single discrimination (slope) parameter; otherwise it is NA.
B.prof: There is one column for each proficiency, with the proficiency name appended to “B.”. If a multiple-beta style combination rule (e.g., OffsetConjunctive) is used, these columns hold the difficulty (negative intercept) parameters for the corresponding parent variables (PnodeBetas(node)); otherwise they are NA.
B: If a multiple-alpha style combination rule (e.g., Compensatory) is used, this column holds the single difficulty (negative intercept) parameter; otherwise it is NA.
PriorWeight: The amount of weight which should be given to the current values when learning conditional probability tables. See PnodePriorWeight(node).
Russell Almond
Almond, R. G. (2010). ‘I can name that Bayesian network in two matrixes.’ International Journal of Approximate Reasoning, 51, 167-178.
Almond, R. G. (presented 2017, August). Tabular views of Bayesian networks. In John-Mark Agosta and Tomas Singlair (Chair), Bayesian Modeling Application Workshop 2017. Symposium conducted at the meeting of the Association for Uncertainty in Artificial Intelligence, Sydney, Australia. (International) Retrieved from http://bmaw2017.azurewebsites.net/
The inverse operation is Qmat2Pnet.
See Warehouse for a description of the network and node warehouse arguments.
See partialCredit, gradedResponse, and normalLink for the currently available link functions. See Conjunctive and OffsetConjunctive for more information about the available combination rules.
The node attributes set from the Qmat (by Qmat2Pnet) include: PnodeParents(node), PnodeLink(node), PnodeLinkScale(node), PnodeRules(node), PnodeQ(node), PnodeAlphas(node), PnodeBetas(node), and PnodePriorWeight(node).
## Sample Q matrix
Q1 <- read.csv(system.file("auxdata", "miniPP-Q.csv", package="Peanut"),
stringsAsFactors=FALSE)
## Not run:
library(PNetica) ## Needs PNetica
sess <- NeticaSession()
startSession(sess)
curd <- getwd()
netman1 <- read.csv(system.file("auxdata", "Mini-PP-Nets.csv",
package="Peanut"),
row.names=1,stringsAsFactors=FALSE)
nodeman1 <- read.csv(system.file("auxdata", "Mini-PP-Nodes.csv",
package="Peanut"),
row.names=1,stringsAsFactors=FALSE)
omegamat <- read.csv(system.file("auxdata", "miniPP-omega.csv",
package="Peanut"),
row.names=1,stringsAsFactors=FALSE)
## Insures we are building nets from scratch
setwd(tempdir())
## Network and node warehouse, to create networks and nodes on demand.
Nethouse <- BNWarehouse(manifest=netman1,session=sess,key="Name")
Nodehouse <- NNWarehouse(manifest=nodeman1,
key=c("Model","NodeName"),
session=sess)
## Build the proficiency model first:
CM <- WarehouseSupply(Nethouse,"miniPP_CM")
CM1 <- Omega2Pnet(omegamat,CM,Nodehouse,override=TRUE)
## Build the nets from the Qmat
Qmat2Pnet(Q1, Nethouse,Nodehouse)
## Build the Qmat from the nets
## Generate a list of nodes
obs <- unlist(sapply(list(sess$nets$PPcompEM, sess$nets$PPconjEM,
                          sess$nets$PPtwostepEM, sess$nets$PPdurAttEM),
                     NetworkAllNodes))
Q2 <- Pnet2Qmat(obs,NetworkAllNodes(CM))
## adjust Q1 to match Q2
Q1 <- Q1[,-1] ## Drop unused first column.
class(Q1) <- c("Qmat", "data.frame")
# Force them into the same order
Q1 <- Q1[order(Q1$Model,Q1$Node),]
Q2 <- Q2[order(Q2$Model,Q2$Node),]
row.names(Q1) <- NULL
row.names(Q2) <- NULL
## Force all NA columns into the right type
Q1$LinkScale <- as.numeric(Q1$LinkScale)
Q1$A.Physics <- as.numeric(Q1$A.Physics)
Q1$A.IterativeD <- as.numeric(Q1$A.IterativeD)
Q1$B.Physics <- as.numeric(Q1$B.Physics)
Q1$B.NTL <- as.numeric(Q1$B.NTL)
## Fix fancy quotes added by some spreadsheets
Q1$Rules <- gsub(intToUtf8(c(91,0x201C,0x201D,93)),"\"",Q1$Rules)
## Insert Default Prior Weights
Q1$PriorWeight <- ifelse(is.na(Q1$NStates),"","10")
all.equal(Q1,Q2)
stopSession(sess)
setwd(curd)
## End(Not run)