selectDif: Selection of one of the DIF detection methods

Description


This function performs DIF detection for one pre-specified method.


Usage

selectDif(Data, group, focal.name, method, anchor = NULL, props = NULL,
          thrTID = 1.5, alpha = 0.05, MHstat = "MHChisq", correct = TRUE,
          exact = FALSE, stdWeight = "focal", thrSTD = 0.1, BDstat = "BD",
          member.type = "group", match = "score", type = "both", criterion = "LRT",
          model = "2PL", c = NULL, engine = "ltm", discr = 1, irtParam = NULL,
          same.scale = TRUE, signed = FALSE, purify = FALSE, purType = "IPP1",
          nrIter = 10, extreme = "constraint", const.range = c(0.001, 0.999),
          nrAdd = 1, p.adjust.method = NULL, save.output = FALSE,
          output = c("out", "default"))



Arguments

Data: numeric: either the data matrix only, or the data matrix plus the vector of group membership. See Details.

group: numeric or character: either the vector of group membership or the column indicator (within Data) of group membership. See Details.

focal.name: numeric or character indicating the level of group which corresponds to the focal group.

method: character: the name of the selected method. Possible values are "TID", "MH", "Std", "Logistic", "BD", "SIBTEST", "Lord", "Raju" and "LRT". See Details.

anchor: either NULL (default) or a vector of item names (or identifiers) to specify the anchor items. See Details.

props: either NULL (default) or a two-column matrix with proportions of success in the reference group and the focal group. See Details.

thrTID: numeric: the threshold for detecting DIF items with the TID method (default is 1.5).

alpha: numeric: significance level (default is 0.05).

MHstat: character: specifies the DIF statistic to be used for DIF identification. Possible values are "MHChisq" (default) and "logOR". See Details.

correct: logical: should the continuity correction be used? (default is TRUE).

exact: logical: should an exact test be computed? (default is FALSE).

stdWeight: character: the type of weights used for the standardized P-DIF statistic. Possible values are "focal" (default), "reference" and "total". See Details.

thrSTD: numeric: the threshold (cut-score) for the standardized P-DIF statistic (default is 0.10).

BDstat: character specifying the DIF statistic to be used. Possible values are "BD" (default) and "trend". See Details.

member.type: character: either "group" (default) to specify that group membership is made of two groups, or "cont" to indicate that group membership is based on a continuous criterion. See Details.

match: specifies the type of matching criterion. Can be either "score" (default) to compute the test score, or any continuous or discrete variable with the same length as the number of rows of Data. See Details.

type: a character string specifying which DIF effects must be tested. Possible values are "both" (default), "udif" and "nudif". See Details.

criterion: a character string specifying which DIF statistic is computed. Possible values are "LRT" (default) or "Wald". See Details.

model: character: the IRT model to be fitted (either "1PL", "2PL" or "3PL"). Default is "2PL".

c: optional numeric value or vector giving the values of the constrained pseudo-guessing parameters. See Details.

engine: character: the engine for estimating the 1PL model, either "ltm" (default) or "lme4".

discr: either NULL or a real positive value for the common discrimination parameter (default is 1). Used only if model is "1PL" and engine is "ltm". See Details.

irtParam: matrix with 2J rows (where J is the number of items) and at most 9 columns containing item parameter estimates. See Details.

same.scale: logical: are the item parameters of the irtParam matrix on the same scale? (default is TRUE). See Details.

signed: logical: should Raju's statistics be computed using the signed (TRUE) or unsigned (FALSE, default) area? See Details.

purify: logical: should the method be used iteratively to purify the set of anchor items? (default is FALSE).

purType: character: the type of purification process to be run. Possible values are "IPP1" (default), "IPP2" and "IPP3". Ignored if purify is FALSE or if method is not "TID".

nrIter: numeric: the maximal number of iterations in the item purification process (default is 10).

extreme: character: the method used to modify the extreme proportions. Possible values are "constraint" (default) or "add". Ignored if method is not "TID".

const.range: numeric: a vector of two constraining proportions. Default values are 0.001 and 0.999. Ignored if method is not "TID" or if extreme is "add".

nrAdd: integer: the number of successes and the number of failures to add to the data in order to adjust the proportions. Default value is 1. Ignored if method is not "TID" or if extreme is "constraint".

p.adjust.method: either NULL (default) or the acronym of the method for p-value adjustment for multiple comparisons. See Details.

save.output: logical: should the output be saved into a text file? (default is FALSE).

output: character: a vector of two components. The first component is the name of the output file, the second component is either the file path or "default" (default value). See Details.
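The role of extreme, const.range and nrAdd for the TID method can be illustrated with a small base-R sketch. It assumes the classical delta transformation 4 * qnorm(1 - p) + 13 of the proportion of success p, which is infinite when p is 0 or 1. The helper delta_scores is hypothetical and not part of difR, and whether difR adjusts all items or only the extreme ones under "add" is not stated here; this sketch adjusts all of them.

```r
# Hypothetical helper (not part of difR): delta scores with two ways of
# handling proportions equal to 0 or 1.
delta_scores <- function(n_correct, n_total, extreme = "constraint",
                         const.range = c(0.001, 0.999), nrAdd = 1) {
  extreme <- match.arg(extreme, c("constraint", "add"))
  if (extreme == "constraint") {
    # Clamp the proportions into [const.range[1], const.range[2]]
    p <- n_correct / n_total
    p <- pmin(pmax(p, const.range[1]), const.range[2])
  } else {
    # Add nrAdd successes and nrAdd failures before computing proportions
    # (assumption: applied to every item)
    p <- (n_correct + nrAdd) / (n_total + 2 * nrAdd)
  }
  4 * qnorm(1 - p) + 13
}

n_correct <- c(0, 25, 50)   # first and last items have extreme proportions
n_total   <- c(50, 50, 50)

delta_constraint <- delta_scores(n_correct, n_total)
delta_add        <- delta_scores(n_correct, n_total, extreme = "add")
```

Both options keep the delta scores finite; they differ only in how far the extreme items are pulled toward the centre of the scale.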


Details

This is a generic function which calls one of the DIF detection methods and displays its output. It is mainly used as a routine for the dichoDif command.

The possible methods are:

  1. "TID" for Transformed Item Difficulties (TID) method (Angoff and Ford, 1973),

  2. "MH" for Mantel-Haenszel (Holland and Thayer, 1988),

  3. "Std" for standardization (Dorans and Kulick, 1986),

  4. "BD" for Breslow-Day method (Penfield, 2003),

  5. "Logistic" for logistic regression (Swaminathan and Rogers, 1990),

  6. "SIBTEST" for SIBTEST (Shealy and Stout, 1993) and Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996) methods,

  7. "Lord" for Lord's chi-square test (Lord, 1980),

  8. "Raju" for Raju's area method (Raju, 1990), and

  9. "LRT" for likelihood-ratio test method (Thissen, Steinberg and Wainer, 1988).

Data is a matrix whose rows correspond to the subjects and whose columns correspond to the items. In addition, Data can hold the vector of group membership. If so, group indicates the column of Data which corresponds to the group membership, either by specifying its name or by giving the column number. Otherwise, group must be a vector of the same length as nrow(Data).
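The two ways of supplying group membership can be sketched in base R as follows. The helper split_group is hypothetical and is not part of difR; it only mirrors the convention described above and assumes that a group argument of length one always designates a column.

```r
# Hypothetical helper (not part of difR): separates item responses from
# group membership following the convention described above.
split_group <- function(Data, group) {
  Data <- as.data.frame(Data)
  if (length(group) == 1) {
    # 'group' identifies a column of Data, by name or by number
    idx <- if (is.character(group)) match(group, colnames(Data)) else group
    if (is.na(idx) || idx < 1 || idx > ncol(Data)) stop("unknown group column")
    list(items = Data[, -idx, drop = FALSE], group = Data[[idx]])
  } else {
    # 'group' is a separate vector with one entry per subject
    stopifnot(length(group) == nrow(Data))
    list(items = Data, group = group)
  }
}

toy <- data.frame(I1 = c(1, 0, 1, 1),
                  I2 = c(0, 0, 1, NA),
                  Gender = c(0, 1, 0, 1))

by_name   <- split_group(toy, "Gender")              # column given by name
by_number <- split_group(toy, 3)                     # column given by number
by_vector <- split_group(toy[, 1:2], c(0, 1, 0, 1))  # separate vector
```

All three calls yield the same item responses and the same membership vector.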

Missing values are allowed for item responses (not for group membership) but must be coded as NA values. Depending on the method, they are discarded when computing the sum-scores or when fitting the logistic or IRT models.

The vector of group membership must hold only two different values, either as numeric or character. The focal group is defined by the argument focal.name.

For "MH", "Std", "Logistic" and "BD" methods, the matching criterion can be either the test score or any other continuous or discrete variable to be passed in the selected DIF function. This is specified by the match argument. By default, it takes the value "score" and the test score (i.e. raw score) is computed. The second option is to assign to match a vector of continuous or discrete numeric values, which acts as the matching criterion. Note that for consistency this vector should not belong to the Data matrix.
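As a rough illustration of the two options for match, the sketch below computes the raw score while ignoring NA responses, or passes an external variable through unchanged. The helper matching_criterion is hypothetical, and the use of rowSums with na.rm = TRUE is an assumption about how missing responses are discarded, not a description of difR internals.

```r
# Hypothetical helper (not part of difR): resolves the matching criterion.
matching_criterion <- function(items, match = "score") {
  if (identical(match, "score")) {
    # Raw test score; NA responses are dropped from the sum (assumption)
    rowSums(items, na.rm = TRUE)
  } else {
    # External matching variable, one value per subject
    stopifnot(is.numeric(match), length(match) == nrow(items))
    match
  }
}

items <- matrix(c(1, 0, NA,
                  1, 1, 1,
                  0, NA, 0), nrow = 3, byrow = TRUE)

raw_score <- matching_criterion(items)
external  <- matching_criterion(items, match = c(12.5, 30, 8))
```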

For Lord and Raju methods, one can specify either the IRT model to be fitted (by means of model, c, engine and discr arguments), or the item parameter estimates with arguments irtParam and same.scale. See difLord and difRaju for further details.

The threshold for detecting DIF items depends on the method. For standardization it has to be fully specified (with the thrSTD argument), as well as for the TID method (through the thrTID argument). For the other methods it depends on the significance level set by alpha.
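How a significance level translates into a detection threshold can be sketched with base R quantile functions. The pairing of statistics with reference distributions below (chi-square with 1 or 2 degrees of freedom, standard normal for a two-sided test) is the usual asymptotic one and is stated here as an assumption rather than taken from the difR source code.

```r
alpha <- 0.05

# Chi-square statistic with 1 degree of freedom
# (e.g. a Mantel-Haenszel chi-square or a single DIF effect)
thr_chisq_1df <- qchisq(1 - alpha, df = 1)

# Chi-square statistic with 2 degrees of freedom
# (e.g. uniform and nonuniform effects tested simultaneously)
thr_chisq_2df <- qchisq(1 - alpha, df = 2)

# Two-sided test based on a standardized normal statistic
# (e.g. a standardized log odds-ratio)
thr_normal <- qnorm(1 - alpha / 2)

# Fixed thresholds, by contrast, do not depend on alpha
thrTID <- 1.5
thrSTD <- 0.1
```

An item is then flagged when its statistic (in absolute value for the normal case) exceeds the corresponding threshold.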

For the Mantel-Haenszel method, the DIF statistic can be either the Mantel-Haenszel chi-square statistic or the log odds-ratio statistic. The statistic is specified by the argument MHstat, and the default value is "MHChisq" for the chi-square statistic. Moreover, the option correct specifies whether the continuity correction has to be applied to the Mantel-Haenszel statistic. See difMH for further details.

By default, the asymptotic Mantel-Haenszel statistic is computed. However, the exact statistics and related P-values can be obtained by specifying the logical argument exact to TRUE. See Agresti (1990, 1992) for further details about exact inference.
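The effect of the continuity correction and of the exact option can be explored outside difR with mantelhaen.test from the stats package. This is only an illustration on a made-up 2 x 2 x 3 table for a single item; it is not the difMH implementation and its numerical results need not coincide with those of difR.

```r
# Toy data for one item: group x response, stratified by three score levels.
# The counts are invented for illustration.
tab <- array(c(10, 5, 8, 12,
               15, 9, 6, 10,
               20, 14, 3, 6),
             dim = c(2, 2, 3),
             dimnames = list(group    = c("reference", "focal"),
                             response = c("correct", "incorrect"),
                             score    = c("low", "mid", "high")))

# Asymptotic chi-square statistic, with and without continuity correction
asym        <- mantelhaen.test(tab, correct = TRUE)
uncorrected <- mantelhaen.test(tab, correct = FALSE)

# Exact conditional test
exact <- mantelhaen.test(tab, exact = TRUE)

# Log of the common odds ratio estimate
log_or <- log(unname(asym$estimate))
```

The corrected statistic is smaller than the uncorrected one, and the exact test returns a p-value that does not rely on the chi-square approximation.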

The weights for computing the standardized P-DIF statistics are defined through the argument stdWeight, with possible values "focal" (default value), "reference" and "total". See stdPDIF for further details.
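The three weighting schemes can be made concrete with a short base-R sketch of the standardized P-DIF statistic, taken here as the weighted mean over score levels of the difference between focal and reference proportions of success. The helper std_pdif is hypothetical (it is not the stdPDIF function of difR), and skipping score levels where one group is absent is an assumption of this sketch.

```r
# Hypothetical helper (not difR::stdPDIF): standardized P-DIF for one item.
std_pdif <- function(item, group, score, focal, stdWeight = "focal") {
  stdWeight <- match.arg(stdWeight, c("focal", "reference", "total"))
  is_focal <- group == focal
  num <- 0
  den <- 0
  for (s in sort(unique(score))) {
    in_f <- is_focal & score == s
    in_r <- !is_focal & score == s
    # Assumption: a score level needs both groups to contribute
    if (!any(in_f) || !any(in_r)) next
    w <- switch(stdWeight,
                focal     = sum(in_f),
                reference = sum(in_r),
                total     = sum(in_f) + sum(in_r))
    num <- num + w * (mean(item[in_f]) - mean(item[in_r]))
    den <- den + w
  }
  num / den
}

score <- c(1, 1, 1, 1, 2, 2, 2, 2)
group <- c(0, 0, 0, 1, 0, 1, 1, 1)
item  <- c(1, 1, 0, 0, 1, 1, 1, 0)

pdif_focal     <- std_pdif(item, group, score, focal = 1, stdWeight = "focal")
pdif_reference <- std_pdif(item, group, score, focal = 1, stdWeight = "reference")
pdif_total     <- std_pdif(item, group, score, focal = 1, stdWeight = "total")
```

With unbalanced groups across score levels, the three weightings give different values for the same data.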

For Breslow-Day method, two test statistics are available: the usual Breslow-Day statistic for testing homogeneous association (Aguerri, Galibert, Attorresi and Maranon, 2009) and the trend test statistic for assessing some monotonic trend in the odds ratios (Penfield, 2003). The DIF statistic is supplied by the BDstat argument, with values "BD" (default) for the usual statistic and "trend" for the trend test statistic.

The SIBTEST method (Shealy and Stout, 1993) and its modified version, the Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996), are returned by the difSIBTEST function. The SIBTEST method is returned when the type argument is set to "udif", while Crossing-SIBTEST is selected with the "nudif" value for the type argument. Note that type takes the default value "both", which is not allowed within the difSIBTEST function; however, within this function, keeping the default value yields selection of Crossing-SIBTEST.

The difSIBTEST function is a wrapper to the SIBTEST function from the mirt package (Chalmers, 2012) to fit within the difR framework (Magis et al., 2010). Therefore, if you are using this function for publication purposes please cite Chalmers (2018; 2012) and Magis et al. (2010).

For logistic regression, the argument type allows testing either both uniform and nonuniform effects simultaneously (type="both"), only the uniform DIF effect (type="udif") or only the nonuniform DIF effect (type="nudif"). The criterion argument specifies the DIF statistic to be computed, either the likelihood ratio test statistic (with criterion="LRT") or the Wald test (with criterion="Wald"). Moreover, the group membership can be either a vector of two distinct values, one for the reference group and one for the focal group, or a continuous or discrete variable that acts as the "group" membership variable. In the former case, the member.type argument is set to "group" and the focal.name argument defines which value in the group variable stands for the focal group. In the latter case, member.type is set to "cont", focal.name is ignored and each value of the group represents one "group" of data (that is, the DIF effects are investigated among participants relying on different values of some discrete or continuous trait). See Logistik for further details.
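The three values of type correspond to comparisons between nested logistic models, which can be sketched with glm on simulated data. This is a minimal illustration of the likelihood ratio criterion under the usual nested-model formulation (score only, score plus group, score times group); the data-generating values are arbitrary, and this is not the Logistik implementation.

```r
set.seed(1)
n     <- 400
group <- rep(c(0, 1), each = n / 2)        # 0 = reference, 1 = focal
score <- sample(0:20, n, replace = TRUE)   # matching criterion

# Simulated item with a uniform DIF effect against the focal group
p    <- plogis(-2 + 0.2 * score - 0.8 * group)
item <- rbinom(n, 1, p)

m0 <- glm(item ~ score,         family = binomial)  # no DIF
m1 <- glm(item ~ score + group, family = binomial)  # uniform DIF
m2 <- glm(item ~ score * group, family = binomial)  # uniform + nonuniform DIF

# Likelihood ratio statistics as differences of deviances
lrt <- c(both  = deviance(m0) - deviance(m2),
         udif  = deviance(m0) - deviance(m1),
         nudif = deviance(m1) - deviance(m2))
df  <- c(both = 2, udif = 1, nudif = 1)

p_values <- pchisq(lrt, df = df, lower.tail = FALSE)
```

Because the models are nested, the "both" statistic equals the sum of the "udif" and "nudif" statistics.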

For Raju's method, the type of area (signed or unsigned) is fixed by the logical signed argument, with default value FALSE (i.e. unsigned areas). See RajuZ for further details.

Item purification can be requested by setting the purify option to TRUE. Recall that item purification is slightly different for IRT and for non-IRT based methods. See the corresponding methods for further information.

Adjustment for multiple comparisons is possible with the argument p.adjust.method. See the corresponding methods for further information.
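As an illustration of what such an adjustment does, the sketch below applies p.adjust from the stats package to a vector of made-up item-level p-values. That the acronyms accepted by p.adjust.method are those of p.adjust is an assumption here; the help pages of the individual methods are the authority.

```r
# Invented p-values for four items
p_raw <- c(Item1 = 0.01, Item2 = 0.04, Item3 = 0.03, Item4 = 0.20)
alpha <- 0.05

# Acronyms understood by stats::p.adjust
available <- p.adjust.methods

p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

# Items flagged before and after adjustment
flag_raw <- names(p_raw)[p_raw < alpha]
flag_bh  <- names(p_bh)[p_bh < alpha]
```

Adjusted p-values are never smaller than the raw ones, so the set of flagged items can only shrink.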

A pre-specified set of anchor items can be provided through the anchor argument. For non-IRT methods, anchor items are used to compute the test score (as matching criterion). For IRT methods, anchor items are used to rescale the item parameters on a common metric. See the corresponding methods for further information. Note that the anchor argument does not work with the "LRT" method.

The output of the selected method can be stored in a text file by setting save.output and output appropriately. See the help file of the corresponding method for further information.


Value

The output of the selected DIF detection method.


Author(s)

Sebastien Beland
Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)
Universite du Quebec a Montreal

David Magis
Department of Psychology, University of Liege
Research Group of Quantitative Psychology and Individual Differences, KU Leuven

Gilles Raiche
Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)
Universite du Quebec a Montreal


References

Agresti, A. (1990). Categorical data analysis. New York: Wiley.

Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7, 131-177. doi: 10.1214/ss/1177011454

Aguerri, M.E., Galibert, M.S., Attorresi, H.F. and Maranon, P.P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. Quality and Quantity, 43, 35-44. doi: 10.1007/s11135-007-9130-2

Angoff, W. H., and Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95-106. doi: 10.1111/j.1745-3984.1973.tb00787.x

Chalmers, R. P. (2012). mirt: A Multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. doi: 10.18637/jss.v048.i06

Chalmers, R. P. (2018). Improving the Crossing-SIBTEST statistic for detecting non-uniform DIF. Psychometrika, 83(2), 376–386. doi: 10.1007/s11336-017-9583-8

Dorans, N. J. and Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368. doi: 10.1111/j.1745-3984.1986.tb00255.x

Holland, P. W. and Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.

Li, H.-H., and Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647–677. doi: 10.1007/BF02294041

Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Magis, D., Beland, S., Tuerlinckx, F. and De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. doi: 10.3758/BRM.42.3.847

Penfield, R.D. (2003). Application of the Breslow-Day test of trend in odds ratio heterogeneity to the detection of nonuniform DIF. Alberta Journal of Educational Research, 49, 231-243.

Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207. doi: 10.1177/014662169001400208

Shealy, R. and Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194. doi: 10.1007/BF02294572

Swaminathan, H. and Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370. doi: 10.1111/j.1745-3984.1990.tb00754.x

Thissen, D., Steinberg, L. and Wainer, H. (1988). Use of item response theory in the study of group difference in trace lines. In H. Wainer and H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.

See Also

difTID, difMH, difStd, difBD, difLogistic, difSIBTEST, difLord, difRaju, difLRT, dichoDif


Examples

## Not run: 

 # Loading the package and the verbal data
 library(difR)
 data(verbal)

 # Excluding the "Anger" variable
 # (the group variable is then in column 25)
 verbal <- verbal[colnames(verbal) != "Anger"]

 # Calling Mantel-Haenszel 
 selectDif(verbal, group = 25, focal.name = 1, method = "MH")

 # Calling Mantel-Haenszel and saving output in 'MH.txt' file
 selectDif(verbal, group = 25, focal.name = 1, method = "MH", 
    save.output = TRUE, output = c("MH", "default"))

 # Calling Lord method
 # 2PL model, with item purification
 selectDif(verbal, group = 25, focal.name = 1, method = "Lord", model = "2PL", 
           purify = TRUE)
## End(Not run)

difR documentation built on July 2, 2020, 3:34 a.m.