equate-methods: IRT True Score and Observed Score Equating
In plink: IRT Separate Calibration Linking Methods

This function conducts IRT true score and observed score equating for unidimensional single-format or mixed-format item parameters for two or more groups. This function supports all item response models available in plink with the exception of the multiple-choice model.

equate(x, method=c("TSE", "OSE"), true.scores, ts.low, base.grp=1, score=1, 
  startval, weights1, weights2, syn.weights, exclude, max.tse.iter, ...) 

## S4 method for signature 'list'
equate(x, method, true.scores, ts.low, base.grp, score, startval, 
  weights1, weights2, syn.weights, exclude, max.tse.iter, ...)

## S4 method for signature 'irt.pars', 'ANY'
equate(x, method, true.scores, ts.low, base.grp, score, startval, 
  weights1, weights2, syn.weights, exclude, max.tse.iter, ...)

`x`	an object of class `irt.pars` with one or more groups, a list containing two or more `irt.pars` objects (e.g., when there are no common items), or the output from `plink` containing rescaled item parameters.
`method`	character vector identifying the equating method(s) to use. Values can include `"TSE"` and `"OSE"`.
`true.scores`	numeric vector of true score values to be equated
`ts.low`	logical value. If TRUE, interpolate values for the equated true scores in the range of observed scores from one to the value below the lowest estimated true score (a rounded sum of guessing parameters)
`base.grp`	integer identifying the group for the base scale
`score`	if `score` = 1, score responses for the true-score equating method with zero for the lowest category and k-1 for the highest (kth) category of each item. If `score` = 2, score responses with one for the lowest category and k for the highest (kth) category of each item. A vector or list of scoring weights for each response category can be supplied, but this is only recommended for advanced users.
`startval`	integer starting value for the first value of `true.scores`
`weights1`	list containing information about the theta values and weights to be used in the observed score equating for population 1. See below for more details.
`weights2`	list containing information about the theta values and weights to be used in the observed score equating for population 2. See below for more details.
`syn.weights`	vector of length two or a list containing vectors of length two with synthetic population weights to be used for each pair of tests for populations 1 and 2 respectively. If missing, weights of 0.5 will be used for both populations for all groups. If `syn.weights` is a list, there should be k-1 elements for k groups.
`exclude`	character vector or list identifying common items that should be excluded when estimating the linking constants. See below for more details.
`max.tse.iter`	maximum number of iterations to identify the theta value associated with each true score. The default is 50.
`...`	further arguments passed to or from other methods. See below for details.
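
To make the roles of `true.scores` and `max.tse.iter` more concrete, here is a minimal base-R sketch of IRT true score equating for 3PL items, along the lines described by Kolen & Brennan (2004). It is not the code used by equate. The item parameters and the helper names p3pl, tcc, and tse are hypothetical, and the parameters for both forms are assumed to be on a common scale already.

```r
# Hypothetical 3PL item parameters (a, b, c) for two forms on a common scale
formX <- data.frame(a = c(1.0, 1.2, 0.8, 1.5), b = c(-1.0, 0.0, 0.5, 1.0),
                    c = c(0.20, 0.25, 0.20, 0.15))
formY <- data.frame(a = c(0.9, 1.1, 1.3, 0.7), b = c(-0.5, 0.2, 0.8, 1.5),
                    c = c(0.20, 0.20, 0.25, 0.20))

# 3PL response probabilities for all items at a single theta
p3pl <- function(theta, pars, D = 1.7) {
  pars$c + (1 - pars$c) / (1 + exp(-D * pars$a * (theta - pars$b)))
}

# Test characteristic curve: the expected number-correct (true) score
tcc <- function(theta, pars, D = 1.7) sum(p3pl(theta, pars, D))

# Newton-Raphson search for the theta whose form X true score equals tau,
# followed by evaluation of the form Y true score at that theta
tse <- function(tau, parsX, parsY, D = 1.7, max.iter = 50, tol = 1e-8) {
  theta <- 0
  for (i in seq_len(max.iter)) {
    p <- p3pl(theta, parsX, D)
    # Derivative of the 3PL test characteristic curve with respect to theta
    slope <- sum(D * parsX$a * (p - parsX$c) * (1 - p) / (1 - parsX$c))
    step <- (tcc(theta, parsX, D) - tau) / slope
    theta <- theta - step
    if (abs(step) < tol) break
  }
  c(theta = theta, equated = tcc(theta, parsY, D))
}

# Equate a form X true score of 2 to the form Y scale
res <- tse(2, formX, formY)
res
```

In this sketch, no theta exists for a true score at or below the sum of the guessing parameters (0.8 for the hypothetical form X), which is the score range that `ts.low` handles by interpolation.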

weights1 can be a list or a list of lists. The purpose of this object is to specify the theta values for population 1 to integrate over in the observed score equating as well as any weights associated with the theta values. The function as.weight can be used to facilitate the creation of this object. If weights1 is missing, the default is to use equally spaced theta values ranging from -4 to 4 with an increment of 0.05 and normal density weights for all groups.

To better understand the elements of weights1, let us assume for a moment that x has parameters for only two groups. In this instance, weights1 would be a single list with length two. The first element should be a vector of theta values corresponding to points on the base scale. The second list element should be a vector of weights corresponding to the theta values. If x contains more than two groups, a single weights1 object can be supplied, and the same set of thetas and weights will be used for all adjacent groups. However, a separate list of theta values and weights for each adjacent group in x can be supplied.
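
The structure described above can be sketched in base R as follows. This is only an illustration of the list layout. The object names are hypothetical, and normalizing the weights to sum to one is an assumption rather than a documented requirement. In practice, as.weight (see the examples) can be used to build the object.

```r
# Equally spaced theta values matching the documented default grid
theta <- seq(-4, 4, by = 0.05)

# Normal density weights, normalized here so they sum to one
# (whether equate normalizes internally is an assumption)
wt <- dnorm(theta)
wt <- wt / sum(wt)

# Two groups: a single list of length two, theta values first, then weights
weights1 <- list(theta, wt)

# Three groups: two adjacent pairs, so a list of two such lists can carry
# a different grid for each pair (the second grid is hypothetical)
theta.b <- seq(-3, 3, by = 0.1)
wt.b <- dnorm(theta.b) / sum(dnorm(theta.b))
weights1.multi <- list(list(theta, wt), list(theta.b, wt.b))

str(weights1)
```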

The specification of weights2 is the same as that for weights1, although the theta values and weights for this object correspond to theta values for population 2. This argument is only used when the synthetic weight associated with population 2 is greater than zero. If weights2 is missing, the same theta values and weights used for weights1 will be used for weights2.
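
As a rough illustration of how the theta values, weights, and synthetic population weights fit together, the sketch below computes a marginal number-correct score distribution for dichotomous items with the Lord-Wingersky recursion, a common approach to IRT observed score equating. It is not taken from the plink source. The 2PL item parameters, the helper names lw and marginal, and the population 2 mean are all hypothetical.

```r
# Hypothetical 2PL item parameters for one form
a <- c(1.0, 1.3, 0.7)
b <- c(-0.5, 0.0, 0.8)

# Lord-Wingersky recursion: number-correct score distribution at one theta,
# given the vector p of correct-response probabilities for each item
lw <- function(p) {
  f <- 1                                  # P(score = 0) before any items
  for (p.i in p) {
    f <- c(f * (1 - p.i), 0) + c(0, f * p.i)
  }
  f                                       # probabilities for scores 0..length(p)
}

# Marginal distribution: conditional distributions averaged over the
# quadrature points using the (normalized) weights
marginal <- function(theta, wt, a, b, D = 1.7) {
  cond <- sapply(theta, function(t) lw(1 / (1 + exp(-D * a * (t - b)))))
  as.vector(cond %*% (wt / sum(wt)))
}

theta1 <- seq(-4, 4, by = 0.05)
wt1 <- dnorm(theta1)                      # population 1
theta2 <- theta1
wt2 <- dnorm(theta2, mean = 0.5)          # population 2 (hypothetical shift)

# Synthetic population: weighted mix of the two marginal distributions
syn.weights <- c(0.5, 0.5)
f.syn <- syn.weights[1] * marginal(theta1, wt1, a, b) +
  syn.weights[2] * marginal(theta2, wt2, a, b)
f.syn
```

With syn.weights set to c(1, 0), the population 2 term drops out of the mix, which is why weights2 only matters when the synthetic weight for population 2 is greater than zero.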

For both equating methods, response probabilities are computed using the functions drm, grm, gpcm, and nrm for the associated models respectively. Various arguments from these functions can be passed to equate. Specifically, the argument incorrect can be passed to drm and catprob can be passed to grm. In the functions drm, grm, and gpcm there is an argument D for the value of a scaling constant. In plink, a single argument D can be passed that will be applied to all applicable models, or arguments D.drm, D.grm, and D.gpcm can be specified for each model respectively. If an argument is specified for D and, say D.drm, the values for D.grm and D.gpcm (if applicable) will be set equal to D. If only D.drm is specified, the values for D.grm and D.gpcm (if applicable) will be set to 1.
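
The rule for the scaling constants can be restated as a small base-R function. The helper resolve.D below is hypothetical and is not part of plink. It only mirrors the behavior described in the paragraph above.

```r
# Hypothetical helper mirroring the documented rule; not part of plink
resolve.D <- function(D = NULL, D.drm = NULL, D.grm = NULL, D.gpcm = NULL) {
  # Models without their own constant fall back to D, or to 1 if D is missing
  fallback <- if (is.null(D)) 1 else D
  c(drm  = if (is.null(D.drm))  fallback else D.drm,
    grm  = if (is.null(D.grm))  fallback else D.grm,
    gpcm = if (is.null(D.gpcm)) fallback else D.gpcm)
}

resolve.D(D = 1.7)             # one constant applied to all models
resolve.D(D = 1.7, D.drm = 1)  # D.drm overrides D for the drm items only
resolve.D(D.drm = 1.7)         # the other models are set to 1
```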

There are instances where certain items should not be included in the computation of total scores (e.g., when the common items correspond to an external anchor test or when using field test items). The exclude argument can be used to remove these items prior to conducting the equating. exclude can be specified as a character vector or a list. In the former case, a single value "all.common" can be used to remove all common items or a vector of model names (i.e., "drm", "grm", "gpcm", "nrm", "mcm") can be supplied, indicating that any item on any test associated with the given model(s) would be excluded. If the argument is specified as a list, exclude should have as many elements as groups in x. Each list element can include model names and/or item numbers corresponding to the items on each test that should be excluded. If no items need to be excluded for a given group, the list element should be equal to NA. For example, with two groups, to exclude the GRM items and item 23 from the first group, specify exclude as exclude <- list(c("grm",23),NA).
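
The forms that exclude can take are shown below in base R. The object names are hypothetical. Note that combining a model name and an item number in c() coerces the number to a character string.

```r
# Character vector forms
excl.common <- "all.common"        # drop every common item
excl.models <- c("grm", "nrm")     # drop all items fit with these models

# List form: one element per group in x
excl.g1 <- c("grm", 23)            # GRM items plus item 23 in group 1
excl.list <- list(excl.g1, NA)     # nothing excluded for group 2

# The item number is stored as the string "23" after coercion
str(excl.list)
```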

Returns a matrix of equated true scores and/or a list of equated observed scores with associated marginal distributions or a list combining these two objects. The output for the observed-score equating also includes EAP scores and SDs for each of the observed scores (Thissen & Orlando, 2001).

Jonathan P. Weeks weeksjp@gmail.com

Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18(1), 1-11.

Kolen, M. J. & Brennan, R. L. (2004). Test Equating, Scaling, and Linking (2nd ed.). New York: Springer.

Thissen, D. & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test Scoring (pp. 23-72). Hillsdale, NJ: Lawrence Erlbaum Associates.

Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33. URL http://www.jstatsoft.org/v35/i12/

# IRT true score and observed score examples from 
# Kolen & Brennan (2004, ch. 6)
pm <- as.poly.mod(36)
x <- as.irt.pars(KB04$pars, KB04$common,
  cat=list(rep(2,36),rep(2,36)), poly.mod=list(pm,pm))
out <- plink(x, rescale="MS", base.grp=2, D=1.7, exclude=list(27,NA))

# Create the quadrature points and weights
wt <- as.weight(
  theta=c(-5.2086,-4.163,-3.1175,-2.072,-1.0269,0.0184,
    1.0635,2.109,3.1546,4.2001),
  weight=c(0.000101,0.00276,0.03021,0.142,0.3149,0.3158,
    0.1542,0.03596,0.003925,0.000186))

# Conduct the equating
equate(out, weights1=wt, syn.weights=c(1,0), D=1.7)

# Conduct true score equating for specific true scores
equate(out, true.scores=7:15, ts.low=FALSE, D=1.7)

# Exclude all common items (assume they correspond to an external anchor)
equate(out, D=1.7, exclude="all.common")


# Observed score equating for mixed-format tests
pm1 <- as.poly.mod(55,c("drm","gpcm","nrm"),dgn$items$group1)
pm2 <- as.poly.mod(55,c("drm","gpcm","nrm"),dgn$items$group2)
x <- as.irt.pars(dgn$pars,dgn$common,dgn$cat,list(pm1,pm2))
out <- plink(x, rescale="HB") 
OSE <- equate(out, method="OSE", score=2)

# Display the equated scores
OSE[[1]]

# Multiple group equating
pars <- TK07$pars
common <- TK07$common
cat <- list(rep(2,26),rep(2,34),rep(2,37),rep(2,40),rep(2,41),rep(2,43))
pm1 <- as.poly.mod(26)
pm2 <- as.poly.mod(34)
pm3 <- as.poly.mod(37)
pm4 <- as.poly.mod(40)
pm5 <- as.poly.mod(41)
pm6 <- as.poly.mod(43)
pm <- list(pm1, pm2, pm3, pm4, pm5, pm6)
x <- as.irt.pars(pars, common, cat, pm, grp.names=paste("grade",3:8,sep=""))
out <- plink(x, rescale="SL")


# True score equating
equate(out, method="TSE")

# True score equating with the base group changed to 3
equate(out, method="TSE", base.grp=3)

# Observed score equating (These data are for non-equivalent groups, but
# this example is included to illustrate the multigroup capabilities)
OSE <- equate(out, method="OSE", base.grp=3)

# Display the equated scores for each group
OSE[[1]]