In ralmond/CPTtools: Tools for Creating Conditional Probability Tables

calcDPCTable

R Documentation

Creates the probability table for the discrete partial credit model

Description

The calcDPCTable function takes a description of the input and output variables for a Bayesian network distribution and a collection of IRT-like parameters (discrimination, difficulty), and calculates a conditional probability table using the discrete partial credit distribution (see Details). The calcDPCFrame function returns the same result as a data frame with labels for the parent states.

Usage

calcDPCTable(skillLevels, obsLevels, lnAlphas, betas,
             rules = "Compensatory", link="partialCredit",
             linkScale=NULL, Q=TRUE,
             tvals=lapply(skillLevels,
               function (sl) effectiveThetas(length(sl))))

calcDPCFrame(skillLevels, obsLevels, lnAlphas, betas, 
             rules = "Compensatory", link="partialCredit", 
             linkScale=NULL, Q=TRUE,
             tvals=lapply(skillLevels, 
               function (sl) effectiveThetas(length(sl))))

Arguments

`skillLevels`	A list of character vectors giving names of levels for each of the conditioning (parent) variables.
`obsLevels`	A character vector giving names of levels for the output variable, ordered from highest to lowest. As a special case, it can also be a vector of integers.
`lnAlphas`	A list of vectors of log slope parameters. Its length should be 1 or `length(obsLevels)-1`. The required length of the individual component vectors depends on the choice of `rules` (and is usually either 1 or the length of `skillLevels`).
`betas`	A list of vectors of difficulty (negative intercept) parameters. Its length should be 1 or `length(obsLevels)-1`. The required length of the individual component vectors depends on the choice of `rules` (and is usually either 1 or the length of `skillLevels`).
`rules`	A list of functions for computing effective theta (see Details). Its length should be `length(obsLevels)-1` or 1 (implying that the same rule is applied for every transition).
`link`	The function that converts a table of effective thetas to probabilities.
`linkScale`	An optional scale parameter for the `link` function. This is only used with certain choices of `link` function.
`Q`	This should be a Q matrix indicating which parent variables are relevant for which state transitions. It should be a logical matrix with one row for each state transition (the number of states minus one) and one column for each parent. As a special case, if all variables are used for all transitions, it can be the scalar value TRUE.
`tvals`	A list of the same length as `skillLevels`. Each element should be a numeric vector of values on the theta (logistic) scale corresponding to the levels of that parent variable. The default spaces them equally according to the normal distribution (see `effectiveThetas`).

Details

The discrete partial credit model is a generalization of the DiBello–Samejima mechanism (calcDSTable) for creating conditional probability tables for Bayesian network models using IRT-like parameters. The basic procedure unfolds in three steps:

1. Each level of each input variable is assigned an “effective theta” value: a value on the standard normal scale used in the subsequent calculations.
2. For each possible skill profile (combination of states of the parent variables), the effective thetas are combined using one of the rule functions. This produces an “effective theta” for that skill profile.
3. The effective theta table is passed to the link function to produce a probability distribution over the states of the outcome variable.

The parent (conditioning) variables are described by the skillLevels argument which should provide for each parent variable in order the names of the states ranked from highest to lowest value. The default implementation uses the function effectiveThetas to calculate equally spaced points on the normal curve. This can be overridden by supplying a tvals argument. This should be a list of the same length as skillLevels with each element having the same length as the corresponding element of skillLevels.
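As an illustration of the first step, the sketch below computes one construction consistent with "equally spaced points on the normal curve": the standard normal quantiles at the probability midpoints of equal-probability bands, listed highest first. The helper name effThetasSketch is hypothetical; this is not the package's effectiveThetas, whose exact formula may differ. It also shows the shape a user-supplied tvals list must have.

```r
# Hypothetical helper, not the CPTtools function: equally spaced points on the
# standard normal scale for a variable with nlevels states, highest first.
effThetasSketch <- function(nlevels) {
  # Split the normal into nlevels equal-probability bands and take the
  # quantile at the probability midpoint of each band, highest band first.
  qnorm((2 * (nlevels:1) - 1) / (2 * nlevels))
}

skill1l <- c("High", "Medium", "Low")
tv <- setNames(effThetasSketch(length(skill1l)), skill1l)
print(tv)

# A user-supplied tvals list must mirror skillLevels element by element
skillLevels <- list(S1 = skill1l, S2 = c("High", "Medium", "Low", "LowerYet"))
tvals <- lapply(skillLevels, function(sl) effThetasSketch(length(sl)))
stopifnot(identical(lengths(tvals), lengths(skillLevels)))
```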

The tvals (either default or user supplied) are used to create a table whose rows hold values $\theta_1,\ldots,\theta_K$, one row for each possible combination of the states of the parent variables (using expand.grid).
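The table of parent thetas can be built directly with base R's expand.grid, which accepts a single list and varies the first component fastest. The theta values below are illustrative only, not package defaults.

```r
# Illustrative effective-theta values (hypothetical), listed highest to lowest
tvals <- list(S1 = c(1, 0, -1),
              S2 = c(1.5, 0.5, -0.5, -1.5))

# One row per skill profile; expand.grid varies the first parent fastest,
# which is also the row order of the finished table.
thetas <- as.matrix(expand.grid(tvals))
print(dim(thetas))
print(head(thetas, 4))
```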

Let $X$ be the child variable of the distribution, and assume that it can take on $M$ possible states labeled $x_1$ through $x_M$ in increasing order. (Note that calcDPCTable assumes variable states are ordered the other way: from highest to lowest.) Each state except the lowest (the last one in the input order) defines a combination rule $Z_m(\theta_1,\ldots,\theta_K;\alpha_m,\beta_m)$. Applying these functions to the rows of the table produces a table of effective thetas with one row for each configuration of the parent variables and one column for each child state except the lowest. (The metaphor is that this theta represents the “ability level” required to reach that output state.)
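For a single transition, a compensatory rule can be sketched as a scaled weighted sum minus a difficulty, $Z_m = \frac{1}{\sqrt{K}}\sum_{k} \alpha_{mk}\theta_k - \beta_m$. This form follows the compensatory rule described in Almond et al. (2015, Chapter 8); treat it as an assumption about the package's Compensatory function, whose scaling details may differ. The name compensatorySketch is hypothetical.

```r
# Sketch of a compensatory combination rule for one transition (assumed form):
#   Z = sum_k alpha_k * theta_k / sqrt(K) - beta
compensatorySketch <- function(theta, alphas, beta) {
  theta <- as.matrix(theta)
  drop(theta %*% alphas) / sqrt(ncol(theta)) - beta
}

# Illustrative parent thetas: every combination of two three-level parents
thetas <- as.matrix(expand.grid(S1 = c(1, 0, -1), S2 = c(1, 0, -1)))
alphas <- exp(log(c(S1 = 1, S2 = 0.75)))   # slopes recovered from log-slopes
z <- compensatorySketch(thetas, alphas, beta = 1.0)
print(z)
```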

Note that the $Z_m(\cdot)$s do not need to have the same parameters or even the same functional form. The argument rules should contain a list of the names of the combination functions, the first one corresponding to $Z_M(\cdot)$, and so forth in descending order. As a special case, if rules has only one element, then it is used for all of the transitions. Similarly, lnAlphas and betas should be lists of the parameters of the combination functions corresponding to the transitions between the levels. The betas[[m]] represent difficulties (negative intercepts) and the exp(lnAlphas[[m]]) represent slopes for the transition to level m (following the highest to lowest order). Again, if these lists have length one, the same value is used for all transitions.
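The recycling of length-one lists and the per-transition calls can be sketched as follows, using the do.call convention given in the Note section. The helpers recycle and compRule are hypothetical and written only for this illustration; they are not the package's internals.

```r
# Hypothetical helper: recycle a length-one list to one entry per transition
recycle <- function(x, n) if (length(x) == 1L) rep(x, n) else x

# Simple compensatory rule, written for this sketch only
compRule <- function(theta, alphas, beta)
  drop(theta %*% alphas) / sqrt(ncol(theta)) - beta

obsLevels <- c("Full", "Partial", "None")      # highest to lowest
nTrans <- length(obsLevels) - 1                 # one rule per transition
thetas <- as.matrix(expand.grid(S1 = c(1, 0, -1), S2 = c(1, 0, -1)))

rules    <- recycle(list(compRule), nTrans)           # same rule everywhere
lnAlphas <- recycle(list(log(c(1, 0.75))), nTrans)    # same slopes everywhere
betas    <- list(Full = 1, Partial = -0.5)            # one difficulty each

# Column m holds Z_m for every parent configuration (highest transition first)
et <- sapply(seq_len(nTrans), function(m)
  do.call(rules[[m]], list(thetas, exp(lnAlphas[[m]]), betas[[m]])))
colnames(et) <- obsLevels[-length(obsLevels)]
print(et)
```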

The length of the elements of lnAlphas and betas is determined by the specific choice of combination function. The functions Compensatory, Conjunctive, and Disjunctive all assume that there will be one lnAlpha for each parent variable, but a single beta. The functions OffsetConjunctive and OffsetDisjunctive both assume that there will be one beta for each parent variable, but a single lnAlpha.
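To show why the offset rules take one beta per parent but a single slope, here are sketches that subtract each parent's own difficulty and then take the weakest (conjunctive) or strongest (disjunctive) parent, scaled by one slope. These forms are an assumption based on the parameter counts described above; the function names are hypothetical and the package's OffsetConjunctive and OffsetDisjunctive may differ in detail.

```r
# Assumed forms (sketch only): one difficulty per parent, one slope
offsetConjSketch <- function(theta, alpha, betas) {
  theta <- as.matrix(theta)
  # subtract each parent's own difficulty, then keep the weakest parent
  alpha * apply(sweep(theta, 2, betas, "-"), 1, min)
}
offsetDisjSketch <- function(theta, alpha, betas) {
  theta <- as.matrix(theta)
  # subtract each parent's own difficulty, then keep the strongest parent
  alpha * apply(sweep(theta, 2, betas, "-"), 1, max)
}

thetas <- as.matrix(expand.grid(S1 = c(1, 0, -1), S2 = c(1, 0, -1)))
conj <- offsetConjSketch(thetas, exp(log(1)), c(S1 = 0.5, S2 = 0.7))
# A huge difficulty (999) effectively removes S2 from a disjunctive rule
disj <- offsetDisjSketch(thetas, exp(log(1)), c(S1 = 0, S2 = 999))
```

The second call mirrors the trick used in the partial credit examples, where a difficulty of 999 makes a parent irrelevant to a disjunctive transition.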

The link function is then applied to the table of effective theta values to produce a conditional probability distribution. The partialCredit link is based on the generalized partial credit model (Muraki, 1992), and the gradedResponse link is a modified version of the graded response model (Samejima, 1969); the modification corrects for problems when the curves cross. A third link function, normalLink, is based on a normal error model and requires the extra linkScale parameter (see the no-parent example below).
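The two main links can be sketched from their textbook definitions rather than from the package source. pcLinkSketch follows the generalized partial credit form; grLinkSketch treats each transition theta as the log-odds of reaching that state or higher, and uses a running maximum as one assumed way to stop the curves from crossing (chosen to agree with the crossing example in the Examples section). Both names are hypothetical, both use the plain logistic scale, and any extra scaling constant the package may apply is not reproduced.

```r
# Both sketches take et with one row per parent configuration and one column
# per transition, highest transition first, and return probabilities with
# one column per state, highest state first.

# Generalized partial credit form: the log-odds of state m against the lowest
# state is the sum of the transition thetas from the lowest state up to m.
pcLinkSketch <- function(et) {
  et <- as.matrix(et)
  up <- et[, ncol(et):1, drop = FALSE]            # lowest transition first
  cs <- matrix(0, nrow(et), ncol(et) + 1)         # lowest state scores 0
  for (m in seq_len(ncol(up))) cs[, m + 1] <- cs[, m] + up[, m]
  cs <- cs - apply(cs, 1, max)                     # avoid overflow in exp()
  p <- exp(cs) / rowSums(exp(cs))
  p[, ncol(p):1, drop = FALSE]                     # highest state first
}

# Graded response form: plogis(et[, m]) is P(X at or above state m).
# The running maximum (an assumed fix) keeps these cumulative curves from
# crossing, so no state receives negative probability.
grLinkSketch <- function(et) {
  cum <- plogis(as.matrix(et))
  for (m in seq_len(ncol(cum))[-1]) cum[, m] <- pmax(cum[, m], cum[, m - 1])
  cbind(cum, 1) - cbind(0, cum)
}

theta <- c(1, 0, -1)
# Crossing case: the step to "Full" is easier than the step to "Partial"
et <- cbind(Full = theta + 1, Partial = theta - 1)
pGR <- grLinkSketch(et)   # middle ("Partial") column is zero
pPC <- pcLinkSketch(et)   # every state keeps some probability
```

With a single transition both sketches reduce to the logistic function, which is consistent with the binary example where the two links agree.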

The Q matrix is used in situations where some of the parent variables are not relevant for one or more state transitions. If parent k is relevant for the transition between state m+1 and state m (remember that states are coded from highest to lowest), then Q[m,k] should be TRUE. In particular, eTheta[,Q[m,]] is passed to the combination rule, not all of theta. If Q contains FALSE entries, the corresponding elements of lnAlphas and betas must have lengths that match the reduced set of parents for that transition. Generally speaking, Q matrices with FALSE entries are not appropriate with the gradedResponse link. As a special case, if Q=TRUE, then all parent variables are used for all state transitions.
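The column selection described above can be seen with the same Q matrix as the Q-matrix variant in the Examples section. This is a base-R illustration of the subsetting only, not the package's code; the parent theta values are illustrative.

```r
# Q matrix from the Q-matrix example: rows are transitions (highest first),
# columns are parents.
Q <- matrix(c(TRUE, FALSE, FALSE, TRUE), 2, 2,
            dimnames = list(c("Full", "Partial"), c("S1", "S2")))
thetas <- as.matrix(expand.grid(S1 = c(1, 0, -1), S2 = c(1, 0, -1)))

# The rule for transition m sees only the columns flagged in row m of Q
relevant <- lapply(seq_len(nrow(Q)),
                   function(m) thetas[, Q[m, ], drop = FALSE])
print(sapply(relevant, colnames))

# Per-parent parameters (e.g. betas for an offset rule) shrink accordingly
print(rowSums(Q))
```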

Normally obsLevels should be a character vector giving state names. However, when the state names are integer values, R will “helpfully” convert them to legal variable names by prepending a letter. This breaks other functions that rely on the names() of the result being the state names. As a special case, if obsLevels is of type numeric, then calcDPCFrame() makes sure that the correct values are preserved.
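The renaming comes from base R's data.frame, which by default passes column names through make.names. The snippet below only demonstrates that behavior and the check.names argument that avoids it; it does not show how calcDPCFrame handles numeric obsLevels internally.

```r
# Integer-like state names such as "1" and "0" are not syntactic R names
probs <- matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("1", "0")))

mangled <- names(data.frame(probs))                      # make.names applied
kept    <- names(data.frame(probs, check.names = FALSE)) # names left alone
print(mangled)
print(kept)
```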

Value

For calcDPCTable, a matrix whose rows correspond to configurations of the parent variable states (skillLevels) and whose columns correspond to obsLevels. Each row of the table is a probability distribution, so the whole matrix is a conditional probability table. The order of the parent rows is the same as that produced by applying expand.grid to skillLevels.

For calcDPCFrame, a CPF: a data frame with additional columns corresponding to the entries in skillLevels, giving the parent values for each row.

Note

The framework set up by this function is completely extensible. The link and the elements of rules can be any value that is suitable as the first argument of do.call.

Elements of rules are called with the expression do.call(rules[[kk]],list(thetas,exp(lnAlphas[[kk]]),betas[[kk]])), where thetas is the matrix of effective theta values produced in the first step of the algorithm; the function should return a vector of effective thetas, one for each row of thetas.
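As a sketch of that calling convention, here is a hypothetical user-defined rule (an average of slope-weighted thetas minus a difficulty, not a rule shipped with the package), called both as a function object and by name, since do.call accepts either.

```r
# Hypothetical rule: mean of slope-weighted thetas minus the difficulty.
# Signature matches the call rules[[kk]](thetas, alphas, betas) and returns
# one effective theta per row of thetas.
averageRule <- function(theta, alphas, beta) {
  rowMeans(sweep(as.matrix(theta), 2, alphas, "*")) - beta
}

thetas   <- as.matrix(expand.grid(S1 = c(1, 0, -1), S2 = c(1, 0, -1)))
lnAlphas <- list(log(c(1, 0.5)))
betas    <- list(0.25)
kk <- 1

# Called with a function object ...
et1 <- do.call(averageRule, list(thetas, exp(lnAlphas[[kk]]), betas[[kk]]))
# ... or with the function's name as a character string
et2 <- do.call("averageRule", list(thetas, exp(lnAlphas[[kk]]), betas[[kk]]))
```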

The link function is called with the expression do.call(link,list(et,linkScale,obsLevels)) where et is the matrix of effective thetas produced in the second step. It should return a conditional probability table with the same number of rows and one more column than et. All of the rows should sum to 1.0.
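The link contract can likewise be illustrated with a hypothetical multinomial-logit style link. It exists only to show the required signature and output shape (one more column than et, rows summing to 1); it ignores the ordering of the states and is not a model provided by the package.

```r
# Hypothetical link following the call link(et, linkScale, obsLevels):
# each transition column is treated as a score, the lowest state scores 0,
# and a softmax turns the scores into probabilities.
softmaxLink <- function(et, linkScale, obsLevels) {
  et <- as.matrix(et)
  if (is.null(linkScale)) linkScale <- 1
  scores <- cbind(et, 0) / linkScale
  w <- exp(scores - apply(scores, 1, max))   # stabilized exponentials
  p <- w / rowSums(w)
  colnames(p) <- obsLevels
  p
}

et <- matrix(c(0.5, -1, 1, 0), nrow = 2)      # 2 configurations, 2 transitions
obsLevels <- c("Full", "Partial", "None")
p <- do.call(softmaxLink, list(et, NULL, obsLevels))
print(p)
```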

Author(s)

Russell Almond

References

Almond, R.G. (2015). An IRT-based Parameterization for Conditional Probability Tables. Paper submitted to the 2015 Bayesian Application Workshop at the Uncertainty in Artificial Intelligence conference.

Almond, R.G., Mislevy, R.J., Steinberg, L.S., Williamson, D.M. and Yan, D. (2015). Bayesian Networks in Educational Assessment. Springer. Chapter 8.

Muraki, E. (1992). A Generalized Partial Credit Model: Application of an EM Algorithm. Applied Psychological Measurement, 16, 159-176. DOI: 10.1177/014662169201600206

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph No. 17, 34 (No. 4, Part 2).

Examples

## Set up variables
skill1l <- c("High","Medium","Low") 
skill2l <- c("High","Medium","Low","LowerYet") 
correctL <- c("Correct","Incorrect") 
pcreditL <- c("Full","Partial","None")
gradeL <- c("A","B","C","D","E") 

## Simple binary model; these three should be the same.
cptCorrect <- calcDPCTable(list(S1=skill1l,S2=skill2l),correctL,
                          log(c(S1=1,S2=.75)),1.0,rule="Compensatory",
                          link="partialCredit")
cptCorrect2 <- calcDPCTable(list(S1=skill1l,S2=skill2l),correctL,
                          log(c(S1=1,S2=.75)),1.0,rule="Compensatory",
                          link="gradedResponse")
cptCorrect1 <- calcDSTable(list(S1=skill1l,S2=skill2l),correctL,
                          log(c(S1=1,S2=.75)),1.0,rule="Compensatory")
stopifnot (all (abs(cptCorrect2-cptCorrect1) <.001))
stopifnot (all (abs(cptCorrect-cptCorrect1) <.001))

## OffsetConjunctive uses multiple betas, not multiple alphas.
cptConj <- calcDPCTable(list(S1=skill1l,S2=skill2l),correctL,
                        log(1),c(S1=0.5,S2=.7),rule="OffsetConjunctive")

## Test for no parent case
cptTheta <- calcDPCTable(list(),skill1l,numeric(),0,rule="Compensatory",
                         link="normalLink",linkScale=.5)
cpfTheta <- calcDPCFrame(list(),skill1l,numeric(),0,rule="Compensatory",
                         link="normalLink",linkScale=.5)


## Simple model: Skill 1 needed for step 1, Skill 2 for step 2.
cptPC1 <- calcDPCFrame(list(S1=skill1l,S2=skill2l),pcreditL,
                       lnAlphas=log(1),
                       betas=list(full=c(S1=0,S2=999),partial=c(S1=999,S2=0)),
                       rule="OffsetDisjunctive")
## Variant using a Q matrix
cptPC1a <- calcDPCTable(list(S1=skill1l,S2=skill2l),pcreditL,
                        lnAlphas=log(1),
                        betas=list(full=c(S1=0),partial=c(S2=0)),
                        Q=matrix(c(TRUE,FALSE,FALSE,TRUE),2,2),
                        rule="OffsetDisjunctive")
stopifnot(all(abs(as.vector(numericPart(cptPC1))-as.vector(cptPC1a))<.0001))


## Complex model, different rules for different levels
cptPC2 <- calcDPCTable(list(S1=skill1l,S2=skill2l),pcreditL,
                          list(full=log(1),partial=log(c(S1=1,S2=.75))),
                          betas=list(full=c(0,999),partial=1.0),
                          rule=list("OffsetDisjunctive","Compensatory"))

## Graded Response Model, typically uses different difficulties
cptGraded <- calcDPCTable(list(S1=skill1l),gradeL,
                          log(1),betas=list(A=2,B=1,C=0,D=-1),
                          rule="Compensatory",link="gradedResponse")

## Partial credit link is somewhat different
cptPC5 <- calcDPCTable(list(S1=skill1l),gradeL,
                          log(1),betas=list(A=2,B=1,C=0,D=-1),
                          rule="Compensatory",link="partialCredit")
cptPC5a <- calcDPCTable(list(S1=skill1l),gradeL,
                          log(1),betas=1,
                          rule="Compensatory",link="partialCredit")

## Need to be careful when using different slopes (or non-increasing
## difficulties) with graded response link as curves may cross.

cptCross <- calcDPCTable(list(S1=skill1l),pcreditL,
                          log(1),betas=list(full=-1,partial=1),
                          rule="Compensatory",link="gradedResponse")
stopifnot (all(abs(cptCross[,"Partial"])<.001))

ralmond/CPTtools documentation built on Dec. 27, 2024, 7:15 a.m.