dichoDif: Comparison of DIF detection methods



This function compares the specified DIF detection methods with respect to the detected items.


dichoDif(Data, group, focal.name, method, anchor = NULL, props = NULL, 
 	thrTID = 1.5, alpha = 0.05, MHstat = "MHChisq", correct = TRUE, 
 	exact = FALSE, stdWeight = "focal", thrSTD = 0.1, BDstat = "BD", 
 	member.type = "group", match = "score", type = "both", criterion = "LRT", 
 	model = "2PL", c = NULL, engine = "ltm", discr = 1, irtParam = NULL, 
 	same.scale = TRUE, signed = FALSE, purify = FALSE, purType = "IPP1",
 	nrIter = 10, extreme = "constraint", const.range = c(0.001, 0.999), 
 	nrAdd = 1, p.adjust.method = NULL, save.output = FALSE,
 	output = c("out", "default")) 
## S3 method for class 'dichoDif'
print(x, ...)



Data
numeric: either the data matrix only, or the data matrix plus the vector of group membership. See Details.

group
numeric or character: either the vector of group membership or the column indicator (within Data) of group membership. See Details.

focal.name
numeric or character indicating the level of group which corresponds to the focal group.

method
character: the name of the selected method. Possible values are "TID", "MH", "Std", "Logistic", "BD", "SIBTEST", "Lord", "Raju" and "LRT". See Details.

anchor
either NULL (default) or a vector of item names (or identifiers) to specify the anchor items. See Details.

props
either NULL (default) or a two-column matrix with proportions of success in the reference group and the focal group. See Details.

thrTID
numeric: the threshold for detecting DIF items with TID method (default is 1.5).

alpha
numeric: significance level (default is 0.05).

MHstat
character: specifies the DIF statistic to be used for DIF identification. Possible values are "MHChisq" (default) and "logOR". See Details.

correct
logical: should the Mantel-Haenszel continuity correction be used? (default is TRUE).

exact
logical: should an exact test be computed? (default is FALSE).

stdWeight
character: the type of weights used for the standardized P-DIF statistic. Possible values are "focal" (default), "reference" and "total". See Details.

thrSTD
numeric: the threshold (cut-score) for standardized P-DIF statistic (default is 0.10).

BDstat
character specifying the DIF statistic to be used. Possible values are "BD" (default) and "trend". See Details.

member.type
character: either "group" (default) to specify that group membership is made of two groups, or "cont" to indicate that group membership is based on a continuous criterion. See Details.

match
specifies the type of matching criterion. Can be either "score" (default) to compute the test score, or any continuous or discrete variable with the same length as the number of rows of Data. See Details.

type
a character string specifying which DIF effects must be tested. Possible values are "both" (default), "udif" and "nudif". See Details.

criterion
a character string specifying which DIF statistic is computed. Possible values are "LRT" (default) or "Wald". See Details.

model
character: the IRT model to be fitted (either "1PL", "2PL" or "3PL"). Default is "2PL".

c
optional numeric value or vector giving the values of the constrained pseudo-guessing parameters. See Details.

engine
character: the engine for estimating the 1PL model, either "ltm" (default) or "lme4".

discr
either NULL or a real positive value for the common discrimination parameter (default is 1). Used only if model is "1PL" and engine is "ltm". See Details.

irtParam
matrix with 2J rows (where J is the number of items) and at most 9 columns containing item parameters estimates. See Details.

same.scale
logical: are the item parameters of the irtParam matrix on the same scale? (default is TRUE). See Details.

signed
logical: should Raju's statistics be computed using the signed (TRUE) or unsigned (FALSE, default) area? See Details.

purify
logical: should the method be used iteratively to purify the set of anchor items? (default is FALSE).

purType
character: the type of purification process to be run. Possible values are "IPP1" (default), "IPP2" and "IPP3". Ignored if purify is FALSE or if method does not include "TID".

nrIter
numeric: the maximal number of iterations in the item purification process (default is 10).

extreme
character: the method used to modify the extreme proportions. Possible values are "constraint" (default) or "add". Ignored if method is not "TID".

const.range
numeric: a vector of two constraining proportions. Default values are 0.001 and 0.999. Ignored if method is not "TID" or if extreme is "add".

nrAdd
integer: the number of successes and the number of failures to add to the data in order to adjust the proportions. Default value is 1. Ignored if method is not "TID" or if extreme is "constraint".

p.adjust.method
either NULL (default) or the acronym of the method for p-value adjustment for multiple comparisons. See Details.

save.output
logical: should the output be saved into a text file? (Default is FALSE).

output
character: a vector of two components. The first component is the name of the output file, the second component is either the file path or "default" (default value). See Details.

x
result from a dichoDif class object.

...
other generic parameters for the print function.


dichoDif is a generic function which calls one or several DIF detection methods and summarizes their output. The possible methods are:

  1. "TID" for Transformed Item Difficulties (TID) method (Angoff and Ford, 1973),

  2. "MH" for Mantel-Haenszel (Holland and Thayer, 1988),

  3. "Std" for standardization (Dorans and Kulick, 1986),

  4. "BD" for Breslow-Day method (Penfield, 2003),

  5. "Logistic" for logistic regression (Swaminathan and Rogers, 1990),

  6. "SIBTEST" for SIBTEST (Shealy and Stout, 1993) and Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996) methods,

  7. "Lord" for Lord's chi-square test (Lord, 1980),

  8. "Raju" for Raju's area method (Raju, 1990), and

  9. "LRT" for likelihood-ratio test method (Thissen, Steinberg and Wainer, 1988).

If method has a single component, the output of dichoDif is exactly the one provided by the method itself. Otherwise, the main output is a matrix with one row per item and one column per method. For each specified method and related arguments, items detected as DIF and non-DIF are respectively encoded as "DIF" and "NoDIF". When printing the output an additional column is added, counting the number of times each item was detected as functioning differently (Note: this is just an informative summary, since the methods are obviously not independent for the detection of DIF items).
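The summary described above can be mimicked in base R. The following is only a sketch: the detection results are invented, and the name of the count column (n_DIF) is an illustrative choice, not necessarily the label used by the print method.

```r
# Hypothetical detection results for 4 items and 3 methods
# (values invented for illustration only)
res <- matrix(
  c("DIF",   "DIF",   "NoDIF",
    "NoDIF", "NoDIF", "NoDIF",
    "DIF",   "NoDIF", "NoDIF",
    "DIF",   "DIF",   "DIF"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Item", 1:4), c("MH", "Std", "Logistic"))
)

# Number of methods that flagged each item
n_dif <- rowSums(res == "DIF")

# Combined table, similar in spirit to the printed summary
summary_tab <- data.frame(res, n_DIF = n_dif)
print(summary_tab)
```

As noted above, the count is descriptive only, since the methods share the same data and are not independent.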

The Data is a matrix whose rows correspond to the subjects and columns to the items. In addition, Data can hold the vector of group membership. If so, group indicates the column of Data which corresponds to the group membership, either by specifying its name or by giving the column number. Otherwise, group must be a vector of same length as nrow(Data).
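As an illustration of the two ways of supplying group membership, the sketch below builds a small simulated data set in base R. The object names (items, grp, Data1, Data2) and the column name Gender are hypothetical; the dichoDif calls are shown as comments because they require the difR package.

```r
set.seed(1)
# Toy 0/1 responses: 6 subjects, 3 items (simulated, for illustration only)
items <- matrix(rbinom(18, 1, 0.5), nrow = 6,
                dimnames = list(NULL, c("I1", "I2", "I3")))
grp <- c(0, 0, 0, 1, 1, 1)   # 1 = focal group in this toy example

# Option 1: group membership stored as a column of Data
Data1 <- data.frame(items, Gender = grp)
group_by_name   <- "Gender"                         # column given by name
group_by_number <- which(names(Data1) == "Gender")  # column given by number

# Option 2: items only, group membership as a separate vector
Data2 <- items
stopifnot(length(grp) == nrow(Data2))  # lengths must agree

# Either form could then be passed on (not run here; needs difR):
# dichoDif(Data1, group = "Gender", focal.name = 1, method = c("MH", "Std"))
# dichoDif(Data2, group = grp,      focal.name = 1, method = c("MH", "Std"))
```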

Missing values are allowed for item responses (not for group membership) but must be coded as NA values. They are discarded from either the computation of the sum-scores, the fitting of the logistic models or the IRT models (according to the method).
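A minimal sketch of the required NA coding, assuming a hypothetical data set in which the value 9 was used as a missing-data code. The sum score is computed here with rowSums(na.rm = TRUE), which is one way of leaving missing responses out; it is not claimed to be the exact internal computation.

```r
# Toy responses where 9 was used as a missing-data code (hypothetical coding)
resp <- matrix(c(1, 0, 9,
                 1, 1, 1,
                 9, 0, 0),
               nrow = 3, byrow = TRUE)

# Recode the missing-data code to NA, as required
resp[resp == 9] <- NA

# Sum scores computed with missing responses left out
score <- rowSums(resp, na.rm = TRUE)
score
```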

The vector of group membership must hold only two different values, either as numeric or character. The focal group is defined by the value of the argument focal.name.

For "MH", "Std", "Logistic" and "BD" methods, the matching criterion can be either the test score or any other continuous or discrete variable to be passed in the Logistik function. This is specified by the match argument. By default, it takes the value "score" and the test score (i.e. raw score) is computed. The second option is to assign to match a vector of continuous or discrete numeric values, which acts as the matching criterion. Note that for consistency this vector should not belong to the Data matrix.

For Lord and Raju methods, one can specify either the IRT model to be fitted (by means of model, c, engine and discr arguments), or the item parameter estimates with arguments irtParam and same.scale. See difLord and difRaju for further details.

The threshold for detecting DIF items depends on the method. For standardization it has to be fully specified (with the thrSTD argument), as well as for the TID method (through the thrTID argument). For the other methods it depends on the significance level set by alpha.

For Mantel-Haenszel method, the DIF statistic can be either the Mantel-Haenszel chi-square statistic or the log odds-ratio statistic. The method is specified by the argument MHstat, and the default value is "MHChisq" for the chi-square statistic. Moreover, the option correct specifies whether the continuity correction has to be applied to Mantel-Haenszel statistic. See difMH for further details.

By default, the asymptotic Mantel-Haenszel statistic is computed. However, the exact statistics and related P-values can be obtained by specifying the logical argument exact to TRUE. See Agresti (1990, 1992) for further details about exact inference.
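The roles of the continuity correction and of the exact test can be illustrated for a single item with mantelhaen.test from the base stats package. This is a sketch on invented counts, not the difMH implementation; the three score strata and their labels are assumptions made for the example.

```r
# Hypothetical counts for one item: group x response within 3 score strata.
# Each slice is filled column-wise: ref/correct, focal/correct,
# ref/incorrect, focal/incorrect.
tab <- array(
  c(20, 12, 30, 38,
    35, 25, 15, 25,
    45, 38,  5, 12),
  dim = c(2, 2, 3),
  dimnames = list(group    = c("reference", "focal"),
                  response = c("correct", "incorrect"),
                  stratum  = c("low", "mid", "high"))
)

# Asymptotic chi-square statistic, with and without continuity correction
mh_corr   <- mantelhaen.test(tab, correct = TRUE)
mh_nocorr <- mantelhaen.test(tab, correct = FALSE)

# Exact conditional test
mh_exact  <- mantelhaen.test(tab, exact = TRUE)

# Common odds ratio estimate and its logarithm
or_mh     <- unname(mh_corr$estimate)
log_or_mh <- log(or_mh)

c(chisq_corrected = unname(mh_corr$statistic),
  chisq_uncorrected = unname(mh_nocorr$statistic),
  p_exact = mh_exact$p.value,
  log_odds_ratio = log_or_mh)
```

In these counts the reference group has the higher success rate in every stratum, so the common odds ratio is above 1 and its logarithm is positive.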

The weights for computing the standardized P-DIF statistics are defined through the argument stdWeight, with possible values "focal" (default value), "reference" and "total". See stdPDIF for further details.
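The effect of the three weighting schemes can be seen on a small worked example. The sketch below uses the usual standardization formula, a weighted average over score strata of the difference between focal and reference proportions of success; the per-stratum values and the helper std_pdif are hypothetical, and the flagging rule (absolute value above the threshold) is stated as an assumption.

```r
# Hypothetical per-stratum summaries for one item (3 score strata)
n_ref <- c(20, 40, 40)          # reference group sizes
n_foc <- c(60, 30, 10)          # focal group sizes
p_ref <- c(0.40, 0.55, 0.80)    # proportions of success, reference group
p_foc <- c(0.30, 0.50, 0.70)    # proportions of success, focal group

# Weighted average of the focal-minus-reference differences
std_pdif <- function(p_foc, p_ref, w) sum(w * (p_foc - p_ref)) / sum(w)

std_focal     <- std_pdif(p_foc, p_ref, w = n_foc)          # "focal"
std_reference <- std_pdif(p_foc, p_ref, w = n_ref)          # "reference"
std_total     <- std_pdif(p_foc, p_ref, w = n_foc + n_ref)  # "total"

round(c(focal = std_focal, reference = std_reference, total = std_total), 4)

# Assumed decision rule: flag the item when |statistic| exceeds the threshold
abs(std_focal) > 0.10
```

With focal weights the statistic is -0.085 here, so the item would not be flagged at the default threshold of 0.10.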

For Breslow-Day method, two test statistics are available: the usual Breslow-Day statistic for testing homogeneous association (Aguerri, Galibert, Attorresi and Maranon, 2009) and the trend test statistic for assessing some monotonic trend in the odds ratios (Penfield, 2003). The DIF statistic is supplied by the BDstat argument, with values "BD" (default) for the usual statistic and "trend" for the trend test statistic.

For logistic regression, the type argument determines which effects are tested: both uniform and nonuniform effects simultaneously (type="both"), uniform DIF only (type="udif") or nonuniform DIF only (type="nudif"). The criterion argument specifies the DIF statistic to be computed, either the likelihood ratio test statistic (criterion="LRT") or the Wald test statistic (criterion="Wald"). Moreover, the group membership can be either a vector of two distinct values, one for the reference group and one for the focal group, or a continuous or discrete variable that acts as the "group" membership variable. In the former case, the member.type argument is set to "group" and focal.name defines which value in the group variable stands for the focal group. In the latter case, member.type is set to "cont", focal.name is ignored and each value of the group variable represents one "group" of data (that is, the DIF effects are investigated among participants with different values of some discrete or continuous trait). See Logistik for further details.
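The three settings of type can be read as comparisons between nested logistic models. The sketch below fits them with glm on simulated data and computes likelihood ratio statistics by hand. The data, the helper lrt and the pairing of models with each type value are assumptions made for illustration; see Logistik for the actual computation.

```r
set.seed(123)
n     <- 400
group <- rep(c(0, 1), each = n / 2)          # 1 = focal group
score <- round(rnorm(n, mean = 10, sd = 3))  # matching variable (simulated)

# Simulated item with a uniform DIF effect against the focal group
eta <- -3 + 0.3 * score - 0.8 * group
y   <- rbinom(n, 1, plogis(eta))

m0 <- glm(y ~ score,         family = binomial)  # no DIF
m1 <- glm(y ~ score + group, family = binomial)  # uniform DIF
m2 <- glm(y ~ score * group, family = binomial)  # uniform + nonuniform DIF

# Likelihood ratio test between two nested models
lrt <- function(small, big) {
  stat <- deviance(small) - deviance(big)
  df   <- df.residual(small) - df.residual(big)
  c(stat = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

res <- rbind(both  = lrt(m0, m2),   # both effects at once: 2 df
             udif  = lrt(m0, m1),   # uniform effect only: 1 df
             nudif = lrt(m1, m2))   # nonuniform effect only: 1 df
res
```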

The SIBTEST method (Shealy and Stout, 1993) and its modified version, the Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996), are computed by the difSIBTEST function. SIBTEST is used when the type argument is set to "udif", while Crossing-SIBTEST is used when type is set to "nudif". Note that the default value of type, "both", is not allowed within the difSIBTEST function; within dichoDif, however, keeping this default value selects Crossing-SIBTEST.

The difSIBTEST function is a wrapper to the SIBTEST function from the mirt package (Chalmers, 2012) to fit within the difR framework (Magis et al., 2010). Therefore, if you are using this function for publication purposes please cite Chalmers (2018; 2012) and Magis et al. (2010).

For Raju's method, the type of area (signed or unsigned) is fixed by the logical signed argument, with default value FALSE (i.e. unsigned areas). See RajuZ for further details.

Item purification can be requested by setting the purify argument to TRUE. Recall that the item purification process is slightly different for IRT and for non-IRT based methods. See the corresponding methods for further information.

Adjustment for multiple comparisons is possible with the argument p.adjust.method. See the corresponding methods for further information.
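To show what such an adjustment does to the set of flagged items, here is a sketch using p.adjust from the base stats package on invented p-values. It assumes that the acronyms accepted by p.adjust.method are those listed in p.adjust.methods; check the help files of the individual methods to confirm.

```r
# Hypothetical item-level p-values (invented for illustration)
p_raw <- c(0.001, 0.012, 0.020, 0.045, 0.200)
alpha <- 0.05

p_bh   <- p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # Bonferroni

data.frame(p_raw, p_bh, p_bonf,
           flag_raw  = p_raw  < alpha,
           flag_bh   = p_bh   < alpha,
           flag_bonf = p_bonf < alpha)

# Acronyms known to p.adjust
p.adjust.methods
```

In this toy example four items are flagged without adjustment, three with the Benjamini-Hochberg adjustment and one with the Bonferroni adjustment.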

A pre-specified set of anchor items can be provided through the anchor argument. For non-IRT methods, anchor items are used to compute the test score (as matching criterion). For IRT methods, anchor items are used to rescale the item parameters on a common metric. See the corresponding methods for further information. Note that the anchor argument is not supported by the "LRT" method.

The output of the dichoDif function can be stored in a text file by setting save.output and output appropriately. See the help file of the selectDif function (or any other DIF method) for further information.


Either the output of one of the DIF detection methods, or a list of class "dichoDif" with the following components:


a character matrix with one row per item and whose columns refer to the different specified detection methods. See Details.


the value of the props argument.


the value of the thrTID argument.


the value of correct argument.


the value of exact argument.


the significance level alpha.


the value of the MHstat argument.


the value of the stdWeight argument.


the value of thrSTD argument.


the value of the BDstat argument.


the value of the member.type argument.


the value of the match argument.


the value of the type argument.


the value of the criterion argument.


the value of model argument.


the value of c argument.


the value of the engine argument.


the value of the discr argument.


the value of irtParam argument.


the value of same.scale argument.


the value of the p.adjust.method argument.


the value of purify argument.


an integer vector (of length equal to the number of methods) with the number of iterations in the purification process. Returned only if purify is TRUE.


a logical vector (of length equal to the number of methods) indicating whether the iterative purification process converged. Returned only if purify is TRUE.


the value of the anchor argument.


the value of the save.output argument.


the value of the output argument.


Sebastien Beland
Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)
Universite du Quebec a Montreal
sebastien.beland.1@hotmail.com, http://www.cdame.uqam.ca/
David Magis
Department of Psychology, University of Liege
Research Group of Quantitative Psychology and Individual Differences, KU Leuven
David.Magis@uliege.be, http://ppw.kuleuven.be/okp/home/
Gilles Raiche
Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)
Universite du Quebec a Montreal
raiche.gilles@uqam.ca, http://www.cdame.uqam.ca/


Agresti, A. (1990). Categorical data analysis. New York: Wiley.

Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7, 131-177. doi: 10.1214/ss/1177011454

Aguerri, M.E., Galibert, M.S., Attorresi, H.F. and Maranon, P.P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. Quality and Quantity, 43, 35-44. doi: 10.1007/s11135-007-9130-2

Angoff, W. H., and Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-106. doi: 10.1111/j.1745-3984.1973.tb00787.x

Chalmers, R. P. (2012). mirt: A Multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. doi: 10.18637/jss.v048.i06

Chalmers, R. P. (2018). Improving the Crossing-SIBTEST statistic for detecting non-uniform DIF. Psychometrika, 83(2), 376–386. doi: 10.1007/s11336-017-9583-8

Dorans, N. J. and Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368. doi: 10.1111/j.1745-3984.1986.tb00255.x

Holland, P. W. and Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.

Li, H.-H., and Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647–677. doi: 10.1007/BF02294041

Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Magis, D., Beland, S., Tuerlinckx, F. and De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. doi: 10.3758/BRM.42.3.847

Penfield, R.D. (2003). Application of the Breslow-Day test of trend in odds ratio heterogeneity to the detection of nonuniform DIF. Alberta Journal of Educational Research, 49, 231-243.

Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207. doi: 10.1177/014662169001400208

Shealy, R. and Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194. doi: 10.1007/BF02294572

Swaminathan, H. and Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370. doi: 10.1111/j.1745-3984.1990.tb00754.x

Thissen, D., Steinberg, L. and Wainer, H. (1988). Use of item response theory in the study of group difference in trace lines. In H. Wainer and H. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.

See Also

difTID, difMH, difStd, difBD, difLogistic, difSIBTEST, difLord, difRaju, difLRT


## Not run: 

 # Loading the difR package and the verbal data
 library(difR)
 data(verbal)

 # Excluding the "Anger" variable
 verbal <- verbal[colnames(verbal) != "Anger"]

 # Comparing TID, Mantel-Haenszel, standardization, logistic regression and SIBTEST
 # TID threshold 1.0
 # Standardization threshold 0.08
 # no continuity correction,
 # with item purification
 # both types of DIF effect for logistic regression
 # Crossing-SIBTEST method (type keeps its default value "both")
 dichoDif(verbal, group = 25, focal.name = 1, method = c("TID", "MH", "Std",
          "Logistic", "SIBTEST"), correct = FALSE, thrSTD = 0.08, thrTID = 1, purify = TRUE)

 # Same analysis, but using items 1 to 5 as anchor and saving the output into 
 # the 'dicho' file 
 dichoDif(verbal, group = 25, focal.name = 1, method = c("TID", "MH", "Std",
          "Logistic"), correct = FALSE, thrSTD = 0.08, thrTID = 1, purify = TRUE,
          anchor = 1:5, save.output = TRUE, output = c("dicho", "default"))

 # Comparing Lord and Raju results with 2PL model and
 # with item purification 
 dichoDif(verbal, group = 25, focal.name = 1, method = c("Lord", "Raju"),
          model = "2PL", purify = TRUE)

## End(Not run)

difR documentation built on July 2, 2020, 3:34 a.m.