R/reliabilityUni.R
In Bayesrel: Bayesian Reliability Estimation

Documented in strel

#'
#' Estimate single test reliability coefficients for unidimensional scales
#' @description Reliability estimation of alpha, lambda2, the glb, and omega in
#' a Bayesian an frequentist way. The results include posterior and bootstrapped distributions,
#' point estimates, credible intervals, and confidence intervals.
#'
#' @param data The dataset to be analyzed, observations are rows, items are columns
#' @param estimates A character vector containing the estimands, we recommend using lambda4
#' with only a few items due to the computation time
#' @param cov.mat A covariance matrix can be supplied instead of a dataset,
#' but number of observations needs to be specified
#' @param interval A number specifying the uncertainty interval
#' @param n.iter A number for the iterations of the Gibbs Sampler
#' @param n.burnin A number for the burnin in the Gibbs Sampler
#' @param thin A number for the thinning of the MCMC samples
#' @param n.chains A number for the chains to run for the MCMC sampling
#' @param n.boot A number for the bootstrap samples
#' @param omega.freq.method A character string for the method of frequentist omega, either "cfa"
#' (confirmatory factor analysis), or "pfa" (principal factor analysis),
#' with "pfa" the interval is always bootstrapped
#' @param n.obs A number for the sample observations when a covariance matrix is supplied
#' and the factor model is calculated
#' @param alpha.int.analytic A logical for calculating the alpha confidence interval analytically
#' @param omega.int.analytic A logical for calculating the omega confidence interval analytically,
#' only works with cfa as the omega.freq.method
#' @param freq A logical for calculating the frequentist estimates
#' @param Bayes A logical for calculating the Bayesian estimates
#' @param para.boot A logical for calculating the parametric bootstrap, the default is
#' the non-parametric
#' @param item.dropped A logical for calculating the if-item-dropped statistics
#' @param missing A string specifying the way to handle missing data,
#' 'listwise' is self-explanatory,
#' 'pairwise' in the Bayesian paradigm means sampling the missing values as additional parameters
#' from the joint conditional distribution, in the frequentist paradigm this means using
#' the 'pairwise' covariance matrix, except the full information ML method for omega
#' @param callback Empty function call for external use
#' @param k0 A scalar multiplier for the diagonal of the scaling matrix of the inverse Wishart
#' prior distribution for alpha, lambda2, and the glb
#' @param df0 The degrees of freedom of the inverse Wishart
#' prior distribution for alpha, lambda2, and the glb, the default is NULL, which sets
#' the df as the number of items
#' @param a0 The shape parameter of the inverse gamma prior distribution for the residual
#' variances in the single factor model for omega
#' @param b0 The scale parameter of the inverse gamma prior distribution for the residual
#' variances in the single factor model for omega
#' @param m0 The prior mean of the normal distribution on the factor loadings for omega
#' @param disableMcmcCheck A logical disabling the MCMC settings check for running tests and the likes
#'
#' @details Reported are point estimates (posterior mean), Bayesian credible intervals
#' (highest posterior density) and frequentist confidence intervals
#' (non parametric or parametric bootstrap).
#' The estimates supported are Cronbach alpha, Guttman's lambda2/4/6, the glb,
#' and Mcdonald's omega_u (unidimensional). Beware of lambda4 with many indicators,
#' the computational effort is considerable.
#' The glb method uses adjusted code from the 'Rcsdp' package by
#' 'Hector Corrada Bravo', <https://CRAN.R-project.org/package=Rcsdp>.
#' This process applies a slightly adjusted solving algorithm from the 'CSDP'
#' library by 'Brian Borchers' <https://github.com/coin-or/Csdp/wiki>,
#' <doi:10.1080/10556789908805765>, but is wrapped in 'RcppArmadillo'.
#' Guttman's Lambda-4 method is from 'Benton' (2015)
#' <doi:10.1007/978-3-319-07503-7_19>. The principal factor analysis (pfa) for a
#' version of frequentist omega_u can be found in 'Rencher' (2007) and is described
#' in 'Schlegel' (2017)
#' <https://www.r-bloggers.com/2017/03/iterated-principal-factor-method-of-factor-analysis-with-r/>.
#' Coefficients alpha, lambda2/4, and the glb are estimated from the data covariance matrix.
#' Coefficient omega is estimated from the centered data matrix.
#' The analytic confidence interval of alpha is from
#' 'Bonett' and 'Wright' (2015) <doi:10.1002/job.1960>.
#'
#' The prior distribution on Cronbach’s alpha (as well as lambda2 and the glb)
#' is induced by the prior distribution on the covariance matrix,
#' which is an inverse Wishart distribution with the identity matrix (multiplied by a scalar)
#' as a scaling matrix and the number of items k as the degrees of freedom.
#' The prior distribution on McDonald’s omega is induced by the prior distributions
#' on the single-factor model parameters, which are: a normal distribution centered on zero
#' for the factor loadings and scores; an inverse gamma distribution
#' with shape=2 and scale=1 for the residuals; and for the variance of the latent variables an
#' inverse Wishart distribution with the number of items k as a scaling matrix (scalar, since it
#' is of dimension one) and the sum k+2 as the degrees of freedom.
#'
#' @return The basic output displays the interval bounds of the coefficients,
#' highest posterior density intervals for the Bayesian coefficients,
#' and confidence intervals for the frequentist coefficients.
#' The summary output shows the point estimates of the coefficients together
#' with the interval bounds. The point estimates for the Bayesian coefficients are
#' posterior means.
#'
#' @examples
#' # note that these are very few iterations just for the example execution,
#' # you should use the defaults at least
#' summary(strel(asrm, estimates = "lambda2", n.chains = 2, n.iter = 200, n.boot = 200))
#' summary(strel(asrm, estimates = "lambda2", item.dropped = TRUE, n.chains = 2,
#' n.iter = 200, freq = FALSE))
#'
#'
#' @references{
#'
#' \insertRef{Bonett2015}{Bayesrel}
#'
#' \insertRef{Murphy2007}{Bayesrel}
#'
#' \insertRef{Lee2007}{Bayesrel}
#'
#' \insertRef{Pfadt2021Basic}{Bayesrel}
#'
#' \insertRef{Rencher2002}{Bayesrel}
#'
#' }
#' @importFrom grDevices adjustcolor recordPlot
#' @importFrom graphics arrows axis legend lines par plot text title
#' @importFrom methods is
#' @importFrom stats cov cov2cor density ecdf qnorm quantile rchisq rgamma rnorm runif sd var
#' approxfun integrate
#' @importFrom Rdpack reprompt
#'
#' @useDynLib Bayesrel, .registration=TRUE
#' @importFrom Rcpp evalCpp
#'
#' @export
strel <- function(data = NULL,
                  estimates = c("alpha", "lambda2", "glb", "omega"),
                  interval = .95,
                  n.iter = 1000, n.burnin = 50, thin = 1, n.chains = 3,
                  n.boot = 1000,
                  cov.mat = NULL,
                  n.obs = NULL,
                  freq = TRUE, Bayes = TRUE,
                  para.boot = FALSE,
                  item.dropped = FALSE,
                  missing = "pairwise",
                  omega.freq.method = "cfa",
                  omega.int.analytic = TRUE,
                  alpha.int.analytic = TRUE,
                  callback = function(){},
                  k0 = 1e-10,
                  df0 = NULL,
                  a0 = 2,
                  b0 = 1,
                  m0 = 0,
                  disableMcmcCheck = FALSE) {

  default <- c("alpha", "lambda2", "lambda4", "lambda6", "glb", "omega")
  mat <- match(default, estimates)
  estimates <- estimates[mat]
  estimates <- estimates[!is.na(estimates)]
  sum_res <- list()
  sum_res$call <- match.call()

  pairwise <- FALSE
  use_cases <- "everything"
  if (any(is.na(data))) {
    if (missing == "listwise") {
      pos <- which(is.na(data), arr.ind = TRUE)[, 1]
      data <- data[-pos, ]
      ncomp <- nrow(data)
      sum_res$complete <- ncomp
      use_cases <- "complete.obs"
    } else if (missing == "pairwise") {
      pairwise <- TRUE
      sum_res$miss_pairwise <- TRUE
      use_cases <- "pairwise.complete.obs"
    } else return(warning("missing values in data detected, please remove and run again"))
  }


  if (!is.null(cov.mat)) {
    if (is.null(n.obs))
      return(warning("number of observations (n.obs) needs to be specified when entering a covariance matrix"))
    if (sum(cov.mat[lower.tri(cov.mat)] != t(cov.mat)[lower.tri(cov.mat)]) > 0)
      return(warning("input matrix is not symmetric"))
    if (!("matrix" %in% class(try(solve(cov.mat), silent = TRUE))))
      return(warning("Data covariance matrix is not invertible"))
    data <- MASS::mvrnorm(n.obs, rep(0, ncol(cov.mat)), cov.mat, empirical = TRUE)
    colnames(data) <- colnames(cov.mat)
  }

  if (!("matrix" %in% class(try(solve(cov(data, use = use_cases)), silent = TRUE))))
    return(warning("Data covariance matrix is not invertible"))

  data <- scale(data, scale = FALSE) # needed for Bayes omega

  if (Bayes) {
    if (!disableMcmcCheck) {
      checkMcmcSettings(n.iter, n.burnin, n.chains, thin)
    }

    if (is.null(df0)) df0 <- ncol(data)
    sum_res$Bayes <- gibbsFun(data, estimates, n.iter, n.burnin, thin, n.chains, interval, item.dropped, pairwise,
                              callback, k0, df0, a0, b0, m0)
    sum_res$n.iter <- n.iter
    sum_res$n.burnin <- n.burnin
    sum_res$thin <- thin
    sum_res$n.chains <- n.chains
    sum_res$priors$k0 <- k0
    sum_res$priors$df0 <- df0
    sum_res$priors$a0 <- a0
    sum_res$priors$b0 <- b0
    sum_res$priors$m0 <- m0

  }


  if(freq){

    if (para.boot) {
      sum_res$freq <- freqFunPara(data, n.boot, estimates, interval, omega.freq.method, item.dropped,
                                  alpha.int.analytic, omega.int.analytic, pairwise, callback)
    } else {
      sum_res$freq <- freqFunNonPara(data, n.boot, estimates, interval, omega.freq.method, item.dropped,
                                     alpha.int.analytic, omega.int.analytic, pairwise, callback)
    }

    if (alpha.int.analytic)
      sum_res$alpha.interval <- "analytic"
    if (omega.int.analytic)
      sum_res$omega.interval <- "analytic"

    sum_res$n.boot <- n.boot
    sum_res$para.boot <- para.boot
  }


  sum_res$estimates <- estimates
  sum_res$interval <- interval
  sum_res$data <- data

  class(sum_res) <- "strel"
  return(sum_res)
}