cdmTools: Useful Tools for Cognitive Diagnosis Modeling

Documented in FCGDINA

#' G-DINA model for forced-choice blocks
#'
#' @description Estimation of the G-DINA model for forced-choice responses according to Nájera et al. (2024).
#' Block polarity (i.e., statement direction), initial values for parameters, and prior distributions can be specified to determine the design of the forced-choice blocks.
#' The \code{GDINA} package (Ma & de la Torre, 2020) is used to estimate the model via expectation maximumation (EM) algorithm if no priors are used.
#' The code provided by Ma and Jiang (2021) is used to estimate the model via Bayes modal (BM) estimation if priors are used.
#' The \emph{forced-choice diagnostic classification model} (FC-DCM; Huang, 2023) can be efficiently estimated using Bayes modal estimation rather than Markov chain Monte Carlo.
#' Only unidimensional statements (i.e., bidimensional blocks) are currently supported.
#'
#' @param dat A \emph{N} individuals x \emph{J} items (\code{matrix} or \code{data.frame}). Missing values need to be coded as \code{NA}. Caution is advised if missing data are present.
#' @param Q A \emph{F} blocks x \emph{K} attributes Q-matrix (\code{matrix} or \code{data.frame}). Each q-vector must measure two attributes, reflecting the attributes measured by its statements.
#' @param polarity A \emph{F} blocks x 2 (\code{matrix} or \code{data.frame}). Each row reflects the direction of the first and second statement, where 1 and -1 corresponds to direct and inverse statements, respectively. Default is \code{NULL}, denoting that all statements are direct.
#' @param polarity.initial A \code{numeric} value that indicates the initial value for the estimation of the probability of endorsement for the latent group whose ideal response is equal to 0. The initial value for the latent group whose ideal response is equal to 1 will be 1 - \code{polarity.initial}. The initial value for latent groups without a clear ideal response is always equal to 0.5. This argument is ignored if \code{catprob.parm != NULL}. Default is \code{1e-4}.
#' @param polarity.prior A \code{list} containing three \code{vectors} of length 2, each containing the alpha and beta hyperparameters of the Beta distribution for the priors of the latent groups with ideal responses equal to 0, 0.5, and 1, respectively. Marginal maximum likelihood via EM algorithm is used, therefore ignoring these priors, if \code{polarity.prior == NULL}, while BM estimation is used if \code{polarity.prior != NULL}. See the examples for a case. Default is \code{NULL}.
#' @param att.dist How is the joint attribute distribution estimated? It can be \code{"saturated"}, \code{"higher.order"}, \code{"fixed"}, \code{"independent"}, and \code{"loglinear"}. Only considered if EM estimation is used. Default is \code{"saturated"}. See the \code{GDINA} package documentation for more information.
#' @param att.prior A \code{vector} of length 2^K to speficy attribute prior distribution for the latent classes. Only considered if EM estimation is used. Default is \code{NULL}. See the \code{GDINA} package documentation for more information.
#' @param verbose How to print calibration information after each EM iteration? Can be 0, 1 or 2, indicating to print no information, information for current iteration, or information for all iterations.
#' @param higher.order A \code{list} specifying the higher-order joint attribute distribution with the following components. Only considered if EM estimation is used. See the \code{GDINA} package documentation for more information.
#' @param catprob.parm A \code{list} of initial values for probabilities of endorsement for each nonzero category. Default is \code{NULL}. See the \code{GDINA} package documentation for more information.
#' @param control A \code{list} of control parameters. Only considered if EM estimation is used. See the \code{GDINA} package documentation for more information.
#'
#' @return \code{FCGDINA} returns an object of class \code{FCGDINA}.
#' \describe{
#' \item{\code{GDINA.obj}}{Estimation output from the \code{GDINA} function of the \code{GDINA.MJ} (Ma & Jiang, 2021) function, depending on whether EM or BM estimation has been used (\code{list}).}
#' \item{\code{technical}}{Information about the estimation method (EM or BM), initial values, and priors (\code{list}).}
#' \item{\code{specifications}}{Function call specifications (\code{list}).}
#' }
#'
#' @author {Pablo Nájera, Universidad Pontificia Comillas}
#'
#' @references
#' Huang, H.-Y. (2023). Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests. \emph{Educational and Psychological Measurement}, \emph{83}(1), 146-180. https://doi.org/10.1177/00131644211069906
#'
#' Ma, W., & de la Torre, J. (2020). GDINA: An R package for cognitive diagnosis modeling. \emph{Journal of Statistical Software}, \emph{93}(14). https://doi.org/10.18637/jss.v093.i14
#'
#' Ma, W., & Jiang, Z. (2021). Estimating Cognitive Diagnosis Models in Small Samples: Bayes Modal Estimation and Monotonic Constraints. \emph{Applied Psychological Measurement}, \emph{45}(2), 95-111. https://doi.org/10.1177/0146621620977681
#'
#' Nájera, P., Kreitchmann, R. S., Escudero, S., Abad, F. J., de la Torre, J., & Sorrel, M. A. (2025). A General Diagnostic Modeling Framework for Forced-Choice Assessments. \emph{British Journal of Mathematical and Statistical Psychology}.
#'
#' @export
#'
#' @examples
#' \donttest{
#' library(GDINA)
#' set.seed(123)
#' Q.items <- do.call("rbind", replicate(5, diag(5), simplify = FALSE)) # Q-matrix for the unidimensional statements
#' GS <- cbind(runif(n = nrow(Q.items), min = 0.1, max = 0.3), runif(n = nrow(Q.items), min = 0.1, max = 0.3)) # Guessing and slip parameter for each statement
#' n.blocks <- 30 # Number of forced-choice blocks
#'
#' #----------------------------------------------------------------------------------------
#' # Illustration with simulated data using only direct statements (i.e., homopolar blocks)
#' #----------------------------------------------------------------------------------------
#'
#' polarity <- matrix(1, nrow = n.blocks, ncol = 2) # Block polarity (1 = direct statement; -1 = indirect statement)
#' sim <- simFCGDINA(N = 1000, Q.items, n.blocks = n.blocks, polarity = polarity, model = "GDINA", GDINA.args = list(GS = GS), seed = 123)
#' Q <- sim$Q # Generated Q-matrix of forced-choice blocks
#' dat <- sim$dat # Generated responses
#' att <- sim$att # Generated attribute profiles
#'
#' fit <- FCGDINA(dat = dat, Q = Q, polarity = polarity) # Fit the G-DINA model with EM estimation
#' ClassRate(personparm(fit$GDINA.obj), att) # Classification accuracy
#'
#' #-------------------------------------------------------------------------------------------
#' # Illustration with simulated data using some inverse stataments (i.e., heteropolar blocks)
#' #-------------------------------------------------------------------------------------------
#'
#' polarity <- matrix(1, nrow = n.blocks, ncol = 2)
#' polarity[sample(x = 1:(2*n.blocks), size = 15, replace = FALSE)] <- -1 # Including 15 inverse statements
#' sim <- simFCGDINA(N = 1000, Q.items, n.blocks = n.blocks, polarity = polarity, model = "GDINA", GDINA.args = list(GS = GS), seed = 123)
#' Q <- sim$Q
#' dat <- sim$dat
#' att <- sim$att
#'
#' fit <- FCGDINA(dat = dat, Q = Q, polarity = polarity)
#' ClassRate(personparm(fit$GDINA.obj), att)
#'
#' #------------------------------------------------------------
#' # Illustration of the FC-DCM (Huang, 2023) via BM estimation
#' #------------------------------------------------------------
#'
#' priors <- list("Minimum" = c(1, 1), # Non-informative prior, Beta(1, 1), for latent group with ideal response = 0
#'                "Intermediate" = c(1e8, 1e8), # Extremely informative prior, Beta(1e8, 1e8), for latent groups with ideal response = 0.5
#'                "Maximum" = c(1, 1)) # Non-informative prior, Beta(1, 1), for latent group with ideal response = 1
#' fit <- FCGDINA(dat = dat, Q = Q, polarity = polarity, polarity.prior = priors, verbose = 0)
#' ClassRate(fit$GDINA.obj$EAP, att)
#' }
FCGDINA <- function(dat, Q, polarity = NULL, polarity.initial = 1e-4, polarity.prior = NULL, att.dist = "saturated", att.prior = NULL, verbose = 1, higher.order = list(), catprob.parm = NULL, control = list()){

  #-------------------
  # Arguments control
  #-------------------

  N <- nrow(dat)
  J <- ncol(dat)
  K <- ncol(Q)
  if(is.null(polarity)){warning("polarity is NULL, so all blocks are regarded as homopolar.")}
  if(is.null(polarity)){polarity <- matrix(1, nrow = J, ncol = 2)}
  est <- ifelse(is.null(polarity.prior), "EM", "BM")
  if(!is.null(polarity.prior)){
    if(!is.list(polarity.prior)){stop("polarity.prior must be a list of length 3, containing the Beta hyperparameters for the latent groups with a minimum, intermediate, and maximum probabilities of endorsement, respectively.")}
  }

  #------------------------------------------------------------------------
  # Calculate initial values and item parameter priors, and estimate model
  #------------------------------------------------------------------------

  ep <- est.polarity(polarity, Q, polarity.initial, polarity.prior)
  if(est == "EM"){
    if(is.null(catprob.parm)){
      fit <- GDINA(dat, Q, catprob.parm = ep$init.parm, att.dist = att.dist, att.prior = att.prior, verbose = verbose, higher.order = higher.order, control = control)
    } else {
      fit <- GDINA(dat, Q, catprob.parm = catprob.parm, att.dist = att.dist, att.prior = att.prior, verbose = verbose, higher.order = higher.order, control = control)
    }
  } else if(est == "BM"){
    if(is.null(catprob.parm)){
      fit <- GDINA.MJ(dat, Q, catprob.parm = ep$init.parm, item.prior = ep$item.prior, verbose = verbose, mono.constr = FALSE)
    } else {
      fit <- GDINA.MJ(dat, Q, catprob.parm = catprob.parm, item.prior = ep$item.prior, verbose = verbose, mono.constr = FALSE)
    }
  }

  #------------------------------------------------------------------------
  # Return outcomes
  #------------------------------------------------------------------------

  res <- list(GDINA.obj = fit,
              technical = list(est = est, init.values = ep$init.parm, priors = ep$item.prior),
              specifications = list(dat = dat, Q = Q,
                                    polarity = polarity, polarity.initial = polarity.initial, polarity.prior = polarity.prior,
                                    att.dist = att.dist, att.prior = att.prior, verbose = verbose,
                                    higher.order = higher.order, catprob.parm = catprob.parm, control = control))
  class(res) <- "FCGDINA"
  return(res)

}