Pnet2Qmat (R Documentation)
In the augmented Q-matrix, there is a set of rows for each Pnode which describes the conditional probability table for that node in terms of the model parameters (see BuildTable). As the Pnodes could potentially come from multiple nets, the key for the table is (“Model”, “Node”). As there are multiple rows per node, “State” is the third part of the key.
The function Pnet2Qmat creates an augmented Q-matrix out of a collection of Pnodes, possibly spanning multiple Pnets.
Pnet2Qmat(obs, prof, defaultRule = "Compensatory", defaultLink = "partialCredit",
          defaultAlpha = 1, defaultBeta = NULL, defaultLinkScale = NULL,
          debug = TRUE)
obs: A list of observable Pnodes whose conditional probability tables are described by the rows of the Q-matrix.
prof: A list of proficiency Pnodes (proficiency variables) whose names supply the proficiency columns of the Q-matrix.
defaultRule: This should be a character scalar giving the name of a CPTtools combination rule (see Compensatory); it is used when PnodeRules has not been set for a node.
defaultLink: This should be a character scalar giving the name of a CPTtools link function (see partialCredit); it is used when PnodeLink has not been set for a node.
defaultAlpha: A numeric scalar giving the default value for slope parameters.
defaultBeta: A numeric scalar giving the default value for difficulty (negative intercept) parameters.
defaultLinkScale: A positive number which gives the default value for the link scale parameter.
debug: A logical value. If true, extra information will be printed while the output table is being built.
A Q-matrix is a 0-1 matrix which describes which proficiency (latent) variables are connected to which observable outcome variables; q_{jk} = 1 if and only if proficiency variable k is a parent of observable variable j. Almond (2010) suggested augmenting the Q-matrix with additional columns representing the combination rules (PnodeRules), the link function (PnodeLink), the link scale parameter (if needed, PnodeLinkScale) and the difficulty parameters (PnodeBetas). The discrimination parameters (PnodeAlphas) could be overloaded with the Q-matrix, with non-zero parameters in places where there were 1's in the Q-matrix.
This arrangement worked fine with combination rules (e.g., Compensatory) which contain multiple alpha (discrimination) parameters, one for each parent variable, and a single beta (difficulty). It broke down with the introduction of a new type of offset rule (e.g., OffsetDisjunctive) which uses multiple difficulty parameters, one for each parent variable, and a single alpha. Almond (2016) suggested a new augmentation which has three matrixes in a single table (a Qmat): the Q-matrix, which contains structural information; the A-matrix, which contains discrimination parameters; and the B-matrix, which contains the difficulty parameters. The names of the columns for these matrixes contain the names of the proficiency variables, prepended with “A.” or “B.” in the case of the A-matrix and B-matrix. There are two additional columns marked “A” and “B” which are used for the discrimination and difficulty parameter in the multiple-beta and multiple-alpha cases, respectively. There is some redundancy between the Q, A and B matrixes, but this provides an opportunity for checking the validity of the input.
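As an illustration (not output from any package function), a single Qmat row for a hypothetical binary observable using the Compensatory rule might look like the following sketch; the model, node, and proficiency names are invented for the example.

## Hypothetical Qmat fragment: one row for a binary observable "Obs1" in
## evidence model "EM1", with invented proficiencies Skill1 and Skill2.
qrow <- data.frame(
  Model = "EM1", Node = "Obs1", Nstates = 2, State = "Right",
  Link = "partialCredit", LinkScale = NA,
  Skill1 = 1, Skill2 = 1,          # Q-matrix (structure): both skills are parents
  Rules = "Compensatory",
  A.Skill1 = 1, A.Skill2 = 1,      # A-matrix: one slope per parent (multiple-A rule)
  A = NA,                          # single-slope column unused for a multiple-A rule
  B.Skill1 = NA, B.Skill2 = NA,    # B-matrix unused for a multiple-A rule
  B = 0.3,                         # single difficulty used by the multiple-A rule
  stringsAsFactors = FALSE)
qrow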
The introduction of the partial credit link function (partialCredit) added a further complication. With the partial credit model, there can be a separate set of discrimination or difficulty parameters for each transition of a polytomous item. Even the gradedResponse link function requires a separate difficulty parameter for each level of the variable save the first. The rows of the Qmat data structure are hence augmented to include one row for every state but the lowest-level state. There should be one fewer row associated with the node than the value in the “Nstates” column, and the names of the states (values in the “State” column) should correspond to every state of the target variable except the first. It is an error if the number of states does not match the existing node, or if the state names do not match what is already used for the node or what is in the manifest for the node Warehouse.
Note that two nodes in different networks may share the same name, and two states in two different nodes may have the same name as well. Thus, the formal key for the Qmat data frame is (“Model”, “Node”, “State”); however, the rows which share the values for (“Model”, “Node”) form a subtable for that particular node. In particular, the rows of the Q-matrix subtable for that node form the inner Q-matrix for that node. The inner Q-matrix shows which variables are relevant for each state transition in a partial credit model. The column-wise maximum of the inner Q-matrix forms the row of the outer Q-matrix for that node; this shows which proficiency nodes are the parents of the observable node. This corresponds to PnodeQ(node).
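For example, a minimal sketch (with invented proficiency and state names) of how the outer Q-matrix row is obtained as the column-wise maximum of the inner Q-matrix:

## Inner Q-matrix: one row per state transition, one column per proficiency.
innerQ <- rbind(Full    = c(Skill1 = 1, Skill2 = 0),
                Partial = c(Skill1 = 1, Skill2 = 1))
## Column-wise maximum: which proficiencies are parents of the observable node.
outerQrow <- apply(innerQ, 2, max)
outerQrow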
The function Qmat2Pnet creates and sets the parameters of the observable Pnodes referenced in the Qmat argument. As it needs to reference, and possibly create, a number of Pnets and Pnodes, it requires both a network and a node Warehouse. If the override parameter is true, the networks will be modified so that each node has the correct parents; otherwise Qmat2Pnet will signal an error if the existing network structure is inconsistent with the Q-matrix.
As there is only one link function for each node, the values of PnodeLink(node) and PnodeLinkScale(node) are set based on the values in the “Link” and “LinkScale” columns of the first row corresponding to the node. Note that the choice of link function determines what is sensible for the other values, but this is not checked by the code.
The value of PnodeRules(node) can either be a single value or a list of rule names. The first value in the sub-Qmat must be a character value; if the other values are missing, then that single value is used. If not, all of the entries should be non-missing. If this is a single value, then effectively the same combination rule is used for each transition.
The interpretation of the A-matrix and the B-matrix depends on the value in the “Rules” column. There are two types of rules: multiple-A rules and multiple-B rules (offset rules). The CPTtools function isOffsetRule checks to see what kind of a rule it is. The multiple-A rules, of which Compensatory is the canonical example, have one discrimination (or slope) parameter for every parent variable (values of 1 in the Q-matrix) and a single difficulty (negative intercept) parameter, which is in the “B” column of the Qmat. The multiple-B or offset rules, of which OffsetConjunctive is the canonical example, have a difficulty (negative intercept) parameter for each parent variable and a single discrimination (slope) parameter, which is in the “A” column. The function Qmat2Pnet uses the value of isOffsetRule to determine whether to use the multiple-B (true) or multiple-A (false) paradigm.
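For instance, assuming the CPTtools package is available, the rule paradigm can be checked directly:

library(CPTtools)
isOffsetRule("Compensatory")       ## FALSE: multiple-A rule, single "B" column used
isOffsetRule("OffsetConjunctive")  ## TRUE:  multiple-B rule, single "A" column used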
A simple example is a binary observable variable which uses the Compensatory rule. This is essentially a regression model (logistic regression with the partialCredit or gradedResponse link functions, linear regression with the normalLink link function) on the parent variables. The linear predictor is:
\frac{1}{\sqrt{K}} (a_1\theta_1 + \ldots + a_K\theta_K) - b .
The values \theta_1, \ldots, \theta_K are effective thetas, real values corresponding to the states of the parent variables. The value a_i is stored in the column “A.namei”, where namei is the name of the ith proficiency variable; the value of PnodeAlphas(node) is the vector a_1, \ldots, a_K with names corresponding to the parent variables. The value of b is stored in the “B” column; the value of PnodeBetas(node) is b.
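A hand computation of this linear predictor (with invented slopes, difficulty, and effective thetas) might look like the following sketch:

## Compensatory linear predictor, computed by hand; all values are invented.
a     <- c(Skill1 = 1.1, Skill2 = 0.9)     ## PnodeAlphas(node): one slope per parent
b     <- 0.3                               ## PnodeBetas(node): single difficulty
theta <- c(Skill1 = 0.5, Skill2 = -0.5)    ## effective thetas of the parent states
eta   <- sum(a * theta) / sqrt(length(a)) - b
eta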
The multiple-B pattern replaces the A-matrix with the B-matrix and the column “A” with “B”. Consider a binary observable variable which uses the OffsetConjunctive rule. The linear predictor is:
a \min(\theta_1 - b_1, \ldots, \theta_K - b_K) .
The value b_i is stored in the column “B.namei”, where namei is the name of the ith proficiency variable; the value of PnodeBetas(node) is the vector b_1, \ldots, b_K with names corresponding to the parent variables. The value of a is stored in the “A” column; the value of PnodeAlphas(node) is a.
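The corresponding hand computation for the offset (multiple-B) case, again with invented values:

## OffsetConjunctive linear predictor, computed by hand; all values are invented.
bvec  <- c(Skill1 = 0.0, Skill2 = 0.5)     ## PnodeBetas(node): one difficulty per parent
a     <- 1.2                               ## PnodeAlphas(node): single slope
theta <- c(Skill1 = 0.5, Skill2 = -0.5)    ## effective thetas of the parent states
eta   <- a * min(theta - bvec)
eta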
When there are more than two states in the output variable, PnodeRules, PnodeAlphas(node) and PnodeBetas(node) become lists to indicate that a different value should be used for each transition between states. If there is a single value in the “Rules” column, or equivalently the value of PnodeRules is a scalar, then the same rule is repeated for each state transition. The same is true for PnodeAlphas(node) and PnodeBetas(node). If these values are a list, that indicates that a different value is to be used for each transition. If they are a vector, that means that different values (discriminations for multiple-A rules or difficulties for multiple-B rules) are needed for the parent variables, but the same set of values is to be used for each state transition. If different values are to be used for each transition, then the values are a list of vectors.
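The following sketch (invented values, for a three-state node with transitions “Full” and “Partial”) illustrates the scalar/vector/list conventions:

## Scalar: the same combination rule is used for both state transitions.
rules  <- "Compensatory"
## Vector: one slope per parent, reused for every transition.
alphas <- c(Skill1 = 1.0, Skill2 = 0.7)
## List: a different (scalar) difficulty for each state transition.
betas  <- list(Full = 0.5, Partial = -0.5)
## A list of vectors would give different per-parent values for each transition.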
The necessary configuration of a's and b's depends on the type of link function. Here are the rules for the currently existing link functions:
normalLink: This link function uses the same linear predictor for each transition, so there should be a single rule, and PnodeAlphas(node) and PnodeBetas(node) should both be vectors (with b of length 1 for a multiple-A rule). This rule also requires a positive value for PnodeLinkScale(node) in the “LinkScale” column. The values in the “A.name” and “B.name” columns for rows after the first can be left as NA's to indicate that the same values are reused.
gradedResponse: This link function models the probability of getting at or above each state and then calculates the differences between them to produce the conditional probability table. In order to avoid negative probabilities, the probability of being in a higher state must always be nonincreasing. The surest way to ensure this is to use both the same combination rule at each state and the same set of discrimination parameters for each state. The difficulty parameters must be nondecreasing. Again, values for rows after the first can be left as NAs to indicate that the same value should be reused.
partialCredit: This link function models the conditional probability of moving from the previous state to the current state. As such, there is no restriction on the rules or parameters. In particular, it can alternate between multiple-A and multiple-B style rules from row to row.
Another restriction that the use of the partial credit rule lifts is the requirement that all parent variables be used in each transition. Note that there is one row of the Q-matrix (the inner Q-matrix) for each state transition. Only the parent variables with 1's in the particular state row are considered when building PnodeAlphas(node) and PnodeBetas(node) for this model. Note that only the partial credit link function can take advantage of this; the other two require all parents to be used for every state.
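To make this concrete, here is a hypothetical two-row (three-state) partial credit fragment in which only Skill1 is used for the “Full” transition, both skills are used for the “Partial” transition, and the rule style alternates between rows; all names and numbers are invented.

## Hypothetical Qmat rows for a three-state partial credit observable "Obs2".
pcrows <- data.frame(
  Model = "EM2", Node = "Obs2", Nstates = 3,
  State = c("Full", "Partial"),
  Link = "partialCredit", LinkScale = NA,
  Skill1 = c(1, 1), Skill2 = c(0, 1),           # inner Q-matrix: parents per transition
  Rules = c("Compensatory", "OffsetConjunctive"),
  A.Skill1 = c(1.0, NA), A.Skill2 = c(NA, NA),  # slopes used only for the Compensatory row
  A = c(NA, 1.0),                               # single slope for the OffsetConjunctive row
  B.Skill1 = c(NA, 0.0), B.Skill2 = c(NA, 0.5), # per-parent difficulties for the offset row
  B = c(0.0, NA),                               # single difficulty for the Compensatory row
  stringsAsFactors = FALSE)
pcrows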
The function Pnet2Qmat takes a collection of nodes (in a series of spoke or evidence models) and builds a Qmat data structure that can reproduce them. It loops through the nodes and fills out the Qmat based on the properties of the Pnodes. Note that if the properties are not yet set, then the default values are used. Thus, applying this function to a network for which the structure has been established but the parameters have not yet been set will build a blank Qmat which can then be adjusted by experts.
The output augmented Q-matrix is a data frame with the columns described below. The number of columns is variable: the items marked prof below actually correspond to a set of columns whose names are taken from the proficiency variables (the prof argument).
Model: The name of the Pnet (network) containing the node described by this row.
Node: The name of the Pnode described by this row.
Nstates: The number of states for this node. Generally, each node should have one fewer rows than this number.
State: The name of the state for this row. This should be unique within the (“Model”, “Node”) combination.
Link: The name of a link function. This corresponds to PnodeLink(node).
LinkScale: Either a positive number giving the link scale parameter, or NA if the link function does not use a scale parameter. This corresponds to PnodeLinkScale(node).
prof: There is one column for each proficiency variable. These columns form the structural part of the Q-matrix (the inner Q-matrix for the node).
Rules: The name of the combination rule to use for this row. This corresponds to PnodeRules(node).
A.prof: There is one column for each proficiency, with the proficiency name appended to “A.”. If a multiple-alpha style combination rule (e.g., Compensatory) is used, these columns hold the discrimination (slope) parameters for the corresponding parent variables (PnodeAlphas(node)); otherwise they are NA.
A: If a multiple-beta style combination rule (e.g., OffsetConjunctive) is used, this column holds the single discrimination (slope) parameter; otherwise it is NA.
B.prof: There is one column for each proficiency, with the proficiency name appended to “B.”. If a multiple-beta style combination rule (e.g., OffsetConjunctive) is used, these columns hold the difficulty (negative intercept) parameters for the corresponding parent variables (PnodeBetas(node)); otherwise they are NA.
B: If a multiple-alpha style combination rule (e.g., Compensatory) is used, this column holds the single difficulty (negative intercept) parameter; otherwise it is NA.
PriorWeight: The amount of weight which should be given to the current values when learning conditional probability tables. See PnodePriorWeight(node).
Russell Almond
Almond, R. G. (2010). ‘I can name that Bayesian network in two matrixes.’ International Journal of Approximate Reasoning, 51, 167-178.
Almond, R. G. (presented 2017, August). Tabular views of Bayesian networks. In John-Mark Agosta and Tomas Singlair (Chair), Bayesian Modeling Application Workshop 2017. Symposium conducted at the meeting of the Association for Uncertainty in Artificial Intelligence, Sydney, Australia. (International) Retrieved from http://bmaw2017.azurewebsites.net/
The inverse operation is Qmat2Pnet.
See Warehouse for a description of the network and node warehouse arguments.
See partialCredit, gradedResponse, and normalLink for the currently available link functions. See Conjunctive and OffsetConjunctive for more information about the available combination rules.
The node attributes set from the Qmat (by Qmat2Pnet) include: PnodeParents(node), PnodeLink(node), PnodeLinkScale(node), PnodeRules(node), PnodeQ(node), PnodeAlphas(node), PnodeBetas(node), and PnodePriorWeight(node).
## Sample Q matrix
Q1 <- read.csv(system.file("auxdata", "miniPP-Q.csv", package="Peanut"),
stringsAsFactors=FALSE)
## Not run:
library(PNetica) ## Needs PNetica
sess <- NeticaSession()
startSession(sess)
curd <- getwd()
netman1 <- read.csv(system.file("auxdata", "Mini-PP-Nets.csv",
package="Peanut"),
row.names=1,stringsAsFactors=FALSE)
nodeman1 <- read.csv(system.file("auxdata", "Mini-PP-Nodes.csv",
package="Peanut"),
row.names=1,stringsAsFactors=FALSE)
omegamat <- read.csv(system.file("auxdata", "miniPP-omega.csv",
package="Peanut"),
row.names=1,stringsAsFactors=FALSE)
## Insures we are building nets from scratch
setwd(tempdir())
## Network and node warehouse, to create networks and nodes on demand.
Nethouse <- BNWarehouse(manifest=netman1,session=sess,key="Name")
Nodehouse <- NNWarehouse(manifest=nodeman1,
key=c("Model","NodeName"),
session=sess)
## Build the proficiency model first:
CM <- WarehouseSupply(Nethouse,"miniPP_CM")
CM1 <- Omega2Pnet(omegamat,CM,Nodehouse,override=TRUE)
## Build the nets from the Qmat
Qmat2Pnet(Q1, Nethouse,Nodehouse)
## Build the Qmat from the nets
## Generate a list of nodes
obs <- unlist(sapply(list(sess$nets$PPcompEM, sess$nets$PPconjEM,
                          sess$nets$PPtwostepEM, sess$nets$PPdurAttEM),
                     NetworkAllNodes))
Q2 <- Pnet2Qmat(obs,NetworkAllNodes(CM))
## adjust Q1 to match Q2
Q1 <- Q1[,-1] ## Drop unused first column.
class(Q1) <- c("Qmat", "data.frame")
# Force them into the same order
Q1 <- Q1[order(Q1$Model,Q1$Node),]
Q2 <- Q2[order(Q2$Model,Q2$Node),]
row.names(Q1) <- NULL
row.names(Q2) <- NULL
## Force all NA columns into the right type
Q1$LinkScale <- as.numeric(Q1$LinkScale)
Q1$A.Physics <- as.numeric(Q1$A.Physics)
Q1$A.IterativeD <- as.numeric(Q1$A.IterativeD)
Q1$B.Physics <- as.numeric(Q1$B.Physics)
Q1$B.NTL <- as.numeric(Q1$B.NTL)
## Fix fancy quotes added by some spreadsheets
Q1$Rules <- gsub(intToUtf8(c(91,0x201C,0x201D,93)),"\"",Q1$Rules)
## Insert Default Prior Weights
Q1$PriorWeight <- ifelse(is.na(Q1$NStates),"","10")
all.equal(Q1,Q2)
stopSession(sess)
setwd(curd)
## End(Not run)