R/multilevel.cor.R

Defines functions multilevel.cor

Documented in multilevel.cor

#' Within-Group and Between-Group Correlation Matrix
#'
#' This function is a wrapper function for computing the within-group and
#' between-group correlation matrix by calling the \code{sem} function in the
#' R package \pkg{lavaan} and provides standard errors, z test statistics, and
#' significance values (\emph{p}-values) for testing the hypothesis
#' H0: \eqn{\rho} = 0 for all pairs of variables within and between groups.
#'
#' The specification of the within-group and between-group variables is in line
#' with the syntax in Mplus. That is, the \code{within} argument is used to
#' identify the variables in the matrix or data frame specified in \code{x} that
#' are measured on the individual level and modeled only on the within level.
#' They are specified to have no variance in the between part of the model. The
#' \code{between} argument is used to identify the variables in the matrix or
#' data frame specified in \code{x} that are measured on the cluster level and
#' modeled only on the between level. Variables not mentioned in the arguments
#' \code{within} or \code{between} are measured on the individual level and will
#' be modeled on both the within and between level.
#'
#' The function uses maximum likelihood estimation with conventional standard
#' errors (\code{estimator = "ML"}) which are not robust against non-normality
#' and full information maximum likelihood (FIML) method (\code{missing = "fiml"})
#' to deal with missing data by default. FIML method cannot be used when
#' within-group variables have no variance within some clusters. In this cases,
#' the function
#' will switch to listwise deletion. Note that the current lavaan version 0.6-11
#' supports FIML method only for maximum likelihood estimation with conventional
#' standard errors (\code{estimator = "ML"}) in multilevel models. Maximum
#' likelihood estimation with Huber-White robust standard errors
#' (\code{estimator = "MLR"}) uses listwise deletion to deal with missing data.
#' When using FIML method there might be issues in model convergence, which might
#' be resolved by switching to listwise deletion (\code{missing = "listwise"}).
#'
#' The lavaan package uses a quasi-Newton optimization method (\code{"nlminb"})
#' by default. If the optimizer does not converge, model estimation will switch
#' to the Expectation Maximization (EM) algorithm.
#'
#' Statistically significant correlation coefficients can be shown in boldface
#' on the console when specifying \code{sig = TRUE}. However, this option is not
#' supported when using R Markdown, i.e., the argument \code{sig} will switch to
#' \code{FALSE}.
#'
#' Adjustment method for multiple testing when specifying the argument \code{p.adj}
#' is applied to the within-group and between-group correlation matrix separately.
#'
#' @param ...          a matrix or data frame. Alternatively, an expression
#'                     indicating the variable names in \code{data} e.g.,
#'                     \code{multilevel.cor(x1, x2, x3, data = dat)}. Note that
#'                     the operators \code{.}, \code{+}, \code{-}, \code{~},
#'                     \code{:}, \code{::}, and \code{!} can also be used to
#'                     select variables, see 'Details' in the \code{\link{df.subset}}
#'                     function.
#' @param data         a data frame when specifying one or more variables in the
#'                     argument \code{...}. Note that the argument is \code{NULL}
#'                     when specifying a matrix or data frameme for the argument
#'                     \code{...}.
#' @param cluster      either a character string indicating the variable name of
#'                     the cluster variable in \code{...} or \code{data}, or a
#'                     vector representing the nested grouping structure (i.e.,
#'                     group or cluster variable).
#' @param within       a character vector representing variables that are measured
#'                     on the within level and modeled only on the within level.
#'                     Variables not mentioned in \code{within} or \code{between}
#'                     are measured on the within level and will be modeled on both
#'                     the within and between level.
#' @param between      a character vector representing variables that are measured
#'                     on the between level and modeled only on the between level.
#'                     Variables not mentioned in \code{within} or \code{between}
#'                     are measured on the within level and will be modeled on
#'                     both the within and between level.
#' @param estimator    a character string indicating the estimator to be used:
#'                     \code{"ML"} (default) for maximum likelihood with
#'                     conventional standard errors and \code{"MLR"} for maximum
#'                     likelihood with Huber-White robust standard errors. Note
#'                     that by default, full information maximum likelihood (FIML)
#'                     method is used to deal with missing data when using
#'                     \code{"ML"} (\code{missing = "fiml"}), whereas incomplete
#'                     cases are removed listwise (i.e., \code{missing = "listwise"})
#'                     when using \code{"MLR"}.
#' @param optim.method a character string indicating the optimizer, i.e., \code{nlminb}
#'                     (default) for the unconstrained and bounds-constrained
#'                     quasi-Newton method optimizer and \code{"em"} for the
#'                     Expectation Maximization (EM) algorithm.
#' @param missing      a character string indicating how to deal with missing
#'                     data, i.e., \code{"listwise"} for listwise deletion or
#'                     \code{"fiml"} (default) for full information maximum
#'                     likelihood (FIML) method. Note that FIML method is only
#'                     available when \code{estimator = "ML"}. Note that it takes
#'                     longer to estimate the model when using FIML and using FIML
#'                     might cause issues in model convergence, these issues might
#'                     be resolved by switching to listwise deletion.
#' @param sig          logical: if \code{TRUE}, statistically significant
#'                     correlation coefficients are shown in boldface on the
#'                     console.
#' @param alpha        a numeric value between 0 and 1 indicating the significance
#'                     level at which correlation coefficients are printed
#'                     boldface when \code{sig = TRUE}.
#' @param print        a character string or character vector indicating which
#'                     results to show on the console, i.e. \code{"all"} for all
#'                     results, \code{"cor"} for correlation coefficients,
#'                     \code{"se"} for standard errors, \code{"stat"} for z test
#'                     statistics, and \code{"p"} for \emph{p}-values.
#' @param split        logical: if \code{TRUE}, output table is split in
#'                     within-group and between-group correlation matrix.
#' @param order        logical: if \code{TRUE}, variables in the output table are
#'                     ordered, so that variables specified in the argument
#'                     \code{between} are shown first.
#' @param tri          a character string indicating which triangular of the
#'                     matrix to show on the console when \code{split = TRUE},
#'                     i.e., \code{both} for upper and \code{upper} for the upper
#'                     triangular.
#' @param tri.lower    logical: if \code{TRUE} (default) and \code{split = FALSE}
#'                     (default), within-group correlations are shown in the lower
#'                     triangular and between-group correlation are shown in the
#'                     upper triangular.
#' @param p.adj        a character string indicating an adjustment method for
#'                     multiple testing based on \code{\link{p.adjust}}, i.e.,
#'                     \code{none} (default), \code{bonferroni}, \code{holm},
#'                     \code{hochberg}, \code{hommel}, \code{BH}, \code{BY}, or
#'                     \code{fdr}.
#' @param digits       an integer value indicating the number of decimal places
#'                     to be used for displaying correlation coefficients.
#' @param p.digits     an integer value indicating the number of decimal places
#'                     to be used for displaying \emph{p}-values.
#' @param as.na        a numeric vector indicating user-defined missing values,
#'                     i.e. these values are converted to \code{NA} before
#'                     conducting the analysis. Note that \code{as.na()} function
#'                     is only applied to \code{x} but not to \code{cluster}.
#' @param write        a character string naming a file for writing the output into
#'                     either a text file with file extension \code{".txt"} (e.g.,
#'                     \code{"Output.txt"}) or Excel file with file extention
#'                     \code{".xlsx"}  (e.g., \code{"Output.xlsx"}). If the file
#'                     name does not contain any file extension, an Excel file will
#'                     be written.
#' @param append       logical: if \code{TRUE} (default), output will be appended
#'                     to an existing text file with extension \code{.txt} specified
#'                     in \code{write}, if \code{FALSE} existing text file will be
#'                     overwritten.
#' @param check        logical: if \code{TRUE} (default), argument specification,
#'                     convergence and model identification is checked.
#' @param output       logical: if \code{TRUE} (default), output is shown on the
#'                     console.
#'
#' @author
#' Takuya Yanagida \email{takuya.yanagida@@univie.ac.at}
#'
#' @seealso
#' \code{\link{multilevel.descript}}, \code{\link{multilevel.icc}},
#' \code{\link{multilevel.cfa}}, \code{\link{cluster.scores}},
#' \code{\link{write.result}}
#'
#' @references
#' Hox, J., Moerbeek, M., & van de Schoot, R. (2018). \emph{Multilevel analysis:
#' Techniques and applications} (3rd. ed.). Routledge.
#'
#' Snijders, T. A. B., & Bosker, R. J. (2012). \emph{Multilevel analysis: An
#' introduction to basic and advanced multilevel modeling} (2nd ed.). Sage
#' Publishers.
#'
#' @return
#' Returns an object of class \code{misty.object}, which is a list with following
#' entries:
#' \tabular{ll}{
#' \code{call} \tab function call \cr
#' \code{type} \tab type of analysis \cr
#' \code{data} \tab data frame specified in \code{x} including the group variable
#'                  specified in \code{cluster} \cr
#' \code{args} \tab specification of function arguments \cr
#' \code{model} \tab specified model \cr
#' \code{model.fit} \tab fitted lavaan object (\code{model.fit}) \cr
#' \code{check} \tab results of the convergence and model identification check \cr
#' \code{result} \tab list with result tables \cr
#' }
#'
#' @note
#' The function uses the functions \code{sem}, \code{lavInspect},
#' \code{lavMatrixRepresentation}, \code{lavTech}, \code{parameterEstimates},
#' and \code{standardizedsolution} provided in the R package \pkg{lavaan} by
#' Yves Rosseel (2012).
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # Load data set "Demo.twolevel" in the lavaan package
#' data("Demo.twolevel", package = "lavaan")
#'
#' #----------------------------------------------------------------------------
#' # Cluster variable specification
#'
#' # Example 1a: Cluster variable 'cluster' in 'x'
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3", "cluster")], cluster = "cluster")
#'
#' # Example 1b: Cluster variable 'cluster' not in 'x'
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")], cluster = Demo.twolevel$cluster)
#'
#' # Example 1c: Alternative specification using the 'data' argument
#' multilevel.cor(x1:x3, data = Demo.twolevel, cluster = "cluster")
#'
#' #----------------------------------------------------------------------------
#' # Example 2: All variables modeled on both the within and between level
#' # Highlight statistically significant result at alpha = 0.05
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")], sig = TRUE,
#'               cluster = Demo.twolevel$cluster)
#'
#' # Example 3: Split output table in within-group and between-group correlation matrix.
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                cluster = Demo.twolevel$cluster, split = TRUE)
#'
#' # Example 4: Print correlation coefficients, standard errors, z test statistics,
#' # and p-values
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                cluster = Demo.twolevel$cluster, print = "all")
#'
#' # Example 5: Print correlation coefficients and p-values
#' # significance values with Bonferroni correction
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                cluster = Demo.twolevel$cluster, print = c("cor", "p"),
#'                p.adj = "bonferroni")
#'
#' #----------------------------------------------------------------------------
#' # Example 6: Variables "y1", "y2", and "y2" modeled on both the within and between level
#' # Variables "w1" and "w2" modeled on the cluster level
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
#'                cluster = Demo.twolevel$cluster,
#'                between = c("w1", "w2"))
#'
#' # Example 7: Show variables specified in the argument 'between' first
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
#'                cluster = Demo.twolevel$cluster,
#'                between = c("w1", "w2"), order = TRUE)
#'
#' #----------------------------------------------------------------------------
#' # Example 8: Variables "y1", "y2", and "y2" modeled only on the within level
#' # Variables "w1" and "w2" modeled on the cluster level
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3", "w1", "w2")],
#'                cluster = Demo.twolevel$cluster,
#'                within = c("y1", "y2", "y3"), between = c("w1", "w2"))
#'
#' #----------------------------------------------------------------------------
#' # Example 9: lavaan model and summary of the multilevel model used to compute the
#' # within-group and between-group correlation matrix
#'
#' mod <- multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                       cluster = Demo.twolevel$cluster, output = FALSE)
#'
#' # lavaan model syntax
#' mod$model
#'
#' # Fitted lavaan object
#' lavaan::summary(mod$model.fit, standardized = TRUE)
#'
#' #----------------------------------------------------------------------------
#' # Write Results
#'
#' # Example 10a: Write Results into a text file
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                cluster = Demo.twolevel$cluster,
#'                write = "Multilevel_Correlation.txt")
#'
#' # Example 10b: Write Results into a Excel file
#' multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                cluster = Demo.twolevel$cluster,
#'                write = "Multilevel_Correlation.xlsx")
#'
#' result <- multilevel.cor(Demo.twolevel[, c("y1", "y2", "y3")],
#'                          cluster = Demo.twolevel$cluster, output = FALSE)
#' write.result(result, "Multilevel_Correlation.xlsx")
#' }
multilevel.cor <- function(..., data = NULL, cluster, within = NULL, between = NULL,
                           estimator = c("ML", "MLR"), optim.method = c("nlminb", "em"),
                           missing = c("listwise", "fiml"), sig = FALSE, alpha = 0.05,
                           print = c("all", "cor", "se", "stat", "p"), split = FALSE,
                           order = FALSE, tri = c("both", "lower", "upper"), tri.lower = TRUE,
                           p.adj = c("none", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY", "fdr"),
                           digits = 2, p.digits = 3, as.na = NULL, write = NULL,
                           append = TRUE, check = TRUE, output = TRUE) {

  #_____________________________________________________________________________
  #
  # Initial Check --------------------------------------------------------------

  # Check if input '...' is missing
  if (isTRUE(missing(...))) { stop("Please specify the argument '...'.", call. = FALSE) }

  # Check if input '...' is NULL
  if (isTRUE(is.null(substitute(...)))) { stop("Input specified for the argument '...' is NULL.", call. = FALSE) }

  # Check if input 'data' is data frame
  if (isTRUE(!is.null(data) && !is.data.frame(data))) { stop("Please specify a data frame for the argument 'data'.", call. = FALSE) }

  # Check input 'cluster'
  if (isTRUE(missing(cluster))) { stop("Please specify a variable name or vector representing the grouping structure for the argument 'cluster'.", call. = FALSE) }

  # Check if input 'cluster' is NULL
  if (isTRUE(is.null(cluster))) { stop("Input specified for the argument 'cluster' is NULL.", call. = FALSE) }

  #_____________________________________________________________________________
  #
  # Data -----------------------------------------------------------------------

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Data using the argument 'data' ####

  if (isTRUE(!is.null(data))) {

    # Variable names
    var.names <- .var.names(..., data = data, cluster = cluster, check.chr = "a matrix or data frame")

    # Extract data
    x <- data[, var.names]

    # Cluster variable
    cluster <- data[, cluster]

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Data without using the argument 'data' ####

  } else {

    # Extract data
    x <- eval(..., enclos = parent.frame())

    # Data and cluster
    var.group <- .var.group(data = x, cluster = cluster)

    # Data
    if (isTRUE(!is.null(var.group$data)))  { x <- var.group$data }

    # Cluster variable
    if (isTRUE(!is.null(var.group$cluster))) { cluster <- var.group$cluster }

  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Within- and Between-Group Variables ####

  # Within variables in 'x'
  within.miss <- !within %in% colnames(x)
  if (isTRUE(any(within.miss))) { stop(paste0(ifelse(length(which(within.miss)) == 1L, "Variable ", "Variables "), "specified in the argument 'within' ", ifelse(length(which(within.miss)) == 1L, "was ", "were "), "not found in 'x': ", within[which(within.miss)], collapse = ", "), call. = FALSE) }  # Within variables

  if (isTRUE(is.null(between))) { var.with <- colnames(x) } else { var.with <- colnames(x)[!colnames(x) %in% between] }

  # At least two within variables
  if (isTRUE(length(var.with) <= 1L)) { stop("Please specify at least two within-group variables.", call. = FALSE) }

  # Between variables in 'x'
  between.miss <- !between %in% colnames(x)
  if (isTRUE(any(between.miss))) { stop(paste0(ifelse(length(which(between.miss)) == 1L, "Variable ", "Variables "), "specified in the argument 'between' ", ifelse(length(which(between.miss)) == 1L, "was ", "were "), "not found in 'x': ", within[which(between.miss)], collapse = ", "), call. = FALSE) }

  # Variance within clusters
  x.check <- vapply(x[, between, drop = FALSE], function(y) any(tapply(y, cluster, var, na.rm = TRUE) != 0L), FUN.VALUE = logical(1L))

  if (isTRUE(any(x.check))) { warning(paste0("Following between-group ", ifelse(length(which(x.check)) == 1L, "variable has ", "variables have "), "variance within clusters: ", paste(names(which(x.check)), collapse = ", ")), call. = FALSE) }

  #  Between variables
  if (isTRUE(is.null(within))) { var.betw <- colnames(x) } else { var.betw <- colnames(x)[!colnames(x) %in% within] }

  # At least two between variables
  if (isTRUE(length(var.betw) <= 1L)) { stop("Please specify at least two between-group variables.", call. = FALSE) }

  # Variables in 'within' or 'between'
  wb.inter <- intersect(within, between)
  if (isTRUE(length(wb.inter) > 0L)) { warning(paste0("Following ", ifelse(length(wb.inter) == 1L, "variable is ", "variables are "), "specified in both arguments 'within' and 'between': ", paste(wb.inter, collapse = ", ")), call. = FALSE) }

  # Unique variables at Within and Between
  var <- unique(c(var.with, var.betw))

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Data frame with Cluster Variable ####

  x <- data.frame(x[, var], .cluster = cluster, stringsAsFactors = FALSE)

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Convert user-missing values into NA ####

  if (isTRUE(!is.null(as.na))) { x[, var] <- .as.na(x[, var], na = as.na) }

  #_____________________________________________________________________________
  #
  # Input Check ----------------------------------------------------------------

  # Check input 'check'
  if (isTRUE(!is.logical(check))) { stop("Please specify TRUE or FALSE for the argument 'check'.", call. = FALSE) }

  if (isTRUE(check)) {

    if (isTRUE(!nzchar(system.file(package = "lavaan")))) { stop("Package \"lavaan\" is needed for this function, please install the package.", call. = FALSE) }

    # Check input 'x': Zero variance?
    x.check <- vapply(x, function(y) length(na.omit(unique(y))) == 1L, FUN.VALUE = logical(1L))

    if (isTRUE(any(x.check))) { warning(paste0("Following ", ifelse(length(which(x.check)) == 1L, "variable ", "variables "), "in the matrix or data frame specified in 'x' ", ifelse(length(which(x.check)) == 1L, "has ", "have "), "zero variance: ", paste(names(which(x.check)), collapse = ", ")), call. = FALSE) }

    # Check input 'within'
    x.check <- vapply(x[, var.with, drop = FALSE], function(y) all(tapply(y, x$.cluster, var, na.rm = TRUE) == 0L), FUN.VALUE = logical(1L))

    if (isTRUE(any(x.check))) { warning(paste0("Following within-group ", ifelse(length(which(x.check)) == 1L, "variable has ", "variables have "), "zero variance within all clusters: ", paste(names(which(x.check)), collapse = ", ")), call. = FALSE) }

    # Check input 'estimator'
    if (isTRUE(any(!estimator %in% c("ML", "MLR")))) { stop("Character string in the argument 'estimator' does not match with \"ML\" or \"MLR\".", call. = FALSE) }

    # Check input 'optim.method'
    if (isTRUE(!all(optim.method  %in% c("nlminb", "em")))) { stop("Character string in the argument 'optim.method' does not match with \"nlminb\" or \"em\".", call. = FALSE) }

    # Check input 'missing'
    if (isTRUE(any(!missing %in% c("listwise", "fiml")))) { stop("Character string in the argument 'estimator' does not match with \"listwise\" or \"fiml\".", call. = FALSE) }

    # Check input 'sig'
    if (isTRUE(!is.logical(sig))) { stop("Please specify TRUE or FALSE for the argument 'sig'.", call. = FALSE) }

    # Check input 'alpha'
    if (isTRUE(alpha >= 1L || alpha <= 0L)) { stop("Please specify a number between 0 and 1 for the argument 'alpha'.", call. = FALSE) }

    # Check input 'print'
    if (isTRUE(any(!print %in% c("all", "cor", "se", "stat", "p")))) { stop("Character string(s) in the argument 'print' does not match with \"all\", \"cor\", \"se\", \"stat\", or \"p\".", call. = FALSE) }

    # Check input 'split'
    if (isTRUE(!is.logical(split))) { stop("Please specify TRUE or FALSE for the argument 'split'.", call. = FALSE) }

    # Check input 'order'
    if (isTRUE(!is.logical(order))) { stop("Please specify TRUE or FALSE for the argument 'order'.", call. = FALSE) }

    # Check input 'tri'
    if (isTRUE(any(!tri %in% c("both", "lower", "upper")))) { stop("Character string in the argument 'tri' does not match with \"both\", \"lower\", or \"upper\".", call. = FALSE) }

    # Check input 'tri.lower'
    if (isTRUE(!is.logical(tri.lower))) { stop("Please specify TRUE or FALSE for the argument 'tri.lower'.", call. = FALSE) }

    # Check input 'p.adj'
    if (isTRUE(any(!p.adj %in% c("none", "holm", "bonferroni", "hochberg", "hommel", "BH", "BY", "fdr")))) { stop("Character string in the argument 'p.adj' does not match with \"none\", \"bonferroni\", \"holm\", \"hochberg\", \"hommel\", \"BH\", \"BY\", or \"fdr\".", call. = FALSE) }

    # Check input 'digits'
    if (isTRUE(digits %% 1L != 0L || digits < 0L)) { stop("Please specify a positive integer number for the argument 'digits'.", call. = FALSE) }

    # Check input 'p.digits'
    if (isTRUE(p.digits %% 1L != 0L || p.digits < 0L)) { stop("Please specify a positive integer number for the argument 'p.digits'.", call. = FALSE) }

    # Check input 'append'
    if (isTRUE(!is.logical(append))) { stop("Please specify TRUE or FALSE for the argument 'append'.", call. = FALSE) }

    # Check input 'output'
    if (isTRUE(!is.logical(output))) { stop("Please specify TRUE or FALSE for the argument 'output'.", call. = FALSE) }

  }

  #_____________________________________________________________________________
  #
  # Arguments ------------------------------------------------------------------

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Estimator ####

  estimator <- ifelse(all(c("ML", "MLR") %in% estimator), "ML", estimator)

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Optimizer ####

  if (isTRUE(all(c("nlminb", "em") %in% optim.method))) { optim.method <- "nlminb" }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Missing data ####

  #...................
  ### Missing values present ####
  if (isTRUE(any(is.na(x)))) {

    complete <- FALSE

    # ML estimation with FIML
    if (isTRUE(estimator == "ML")) {

        missing <- ifelse(all(c("listwise", "fiml") %in% missing), "fiml", missing)

    # MLR estimation
    } else if (isTRUE(estimator == "MLR")) {

      if (isTRUE(missing == "fiml")) {

        warning("FIML method is not available with estimator = \"MLR\", argument missing switched to \"listwise\".", call. = FALSE)

      }

      # Listwise deletion
      missing <- "listwise"

    }

  #...................
  ### No missing values ####
  } else {

    complete <- TRUE
    missing <- "listwise"

  }

  # Cases with missing on all variables
  if (isTRUE(missing == "fiml")) {

    x.na.prop <- misty::na.prop(x[, -which(colnames(x) %in% c(".cluster", between)), drop = FALSE])
    if (any(x.na.prop == 1L)) {

      warning(paste("Data set contains", sum(x.na.prop == 1L), "cases with missing on all variables that are measured on the within level which were not included in the analysis."), call. = FALSE)

      # Remove cases with missing on all variables
      x <- x[which(x.na.prop < 1L), ]

    }

  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Print correlation, sample size or significance values ####

  if (isTRUE(all(c("all", "cor", "se", "stat", "p") %in% print))) { print <- "cor" }

  if (isTRUE(length(print) == 1L && "all" %in% print)) { print <- c("cor", "se", "stat", "p") }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Print triangular ####

  tri <- ifelse(all(c("both", "lower", "upper") %in% tri), "lower", tri)

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Adjustment method for multiple testing ####

  p.adj <- ifelse(all(c("none", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY", "fdr") %in% p.adj), "none", p.adj)

  #_____________________________________________________________________________
  #
  # Main Function --------------------------------------------------------------

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Model specification ####

  mod <- paste(" level: 1\n  ",
               # Within model
               paste(apply(combn(length(var.with), 2L), 2L, function(y) paste(var.with[y[1L]], var.with[y[2L]], sep = " ~~ " )), collapse = "\n   "),
               "\n level: 2\n  ",
               # Between model
               paste(apply(combn(length(var.betw), 2L), 2L, function(y) paste(var.betw[y[1L]], var.betw[y[2L]], sep = " ~~ " )), collapse = "\n   "))

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Model estimation ####

  model.fit <- tryCatch(suppressWarnings(lavaan::sem(mod, data = x, cluster = ".cluster", estimator = estimator,
                                                     missing = missing, optim.method = optim.method,
                                                     se = ifelse(estimator == "MLR", "robust.huber.white", "standard"),
                                                     check.gradient = FALSE, check.post = FALSE, check.vcov = FALSE)),
                        error = function(y) {

                          if (isTRUE(missing == "fiml")) {

                            stop("There was an estimation problem in lavaan, switching to missing = \"listwise\" might solve the problem.", call. = FALSE)

                          } else if (isTRUE(estimator == "MLR")){

                            stop("There was an estimation problem in lavaan, switching to estimator = \"ML\" might solve the problem.", call. = FALSE)

                          } else {

                            stop("There was an estimation problem in lavaan, correlation matrix could not be computed.", call. = FALSE)

                          }})

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Convergence check ####

  if (isTRUE(!lavaan::lavInspect(model.fit, what = "converged"))) {

    #...................
    ### Quasi-Newton method optimizer ####
    if (isTRUE(optim.method == "nlminb")) {

      message("Quasi-Newton optimizer did not converge, switched to the EM algorithm.")

      # Model estimation with EM algorithm
      model.fit <- suppressWarnings(lavaan::sem(mod, data = x, cluster = ".cluster", estimator = estimator,
                                                missing = missing, optim.method = "em",
                                                se = ifelse(estimator == "MLR", "robust.huber.white", "standard"),
                                                check.gradient = FALSE, check.post = FALSE, check.vcov = FALSE))

      # Model not converged
      if (isTRUE(!lavaan::lavInspect(model.fit, what = "converged"))) {

        if (isTRUE(missing == "listwise")) {

          stop("EM algorithm did not converge.", call. = FALSE)

        } else {

          stop("EM algorithm did not converge. Switching to missing = \"listwise\" might solve the estimation problem.", call. = FALSE)

        }

      }

    #...................
    ### Expectation Maximization (EM) algorithm ####

    } else if(isTRUE(optim.method == "em")) {

      if (isTRUE(missing == "listwise")) {

        stop("EM algorithm did not converge.", call. = FALSE)

      } else {

        stop("EM algorithm did not converge. Switching to missing = \"listwise\" might solve the estimation problem.", call. = FALSE)

      }

    }

  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Convergence and model identification checks ####

  if (isTRUE(check)) {

    check.vcov <- check.theta.w <- check.theta.b <- check.cov.lv.w <- check.cov.lv.b <- TRUE

    #...................
    ### Standard error ####

    if (isTRUE(any(is.na(unlist(lavaan::lavInspect(model.fit, what = "se")))))) { stop("Standard errors could not be computed.", call. = FALSE) }

    #...................
    ### Variance-covariance matrix of the estimated parameters ####

    eigvals <- eigen(lavaan::lavInspect(model.fit, what = "vcov"), symmetric = TRUE, only.values = TRUE)$values

    # Model contains equality constraints
    model.fit.par <- lavaan::parameterTable(model.fit)$op == "=="

    if (isTRUE(any(model.fit.par))) { eigvals <- rev(eigvals)[-seq_len(sum(model.fit.par))] }

    if (isTRUE(min(eigvals) < .Machine$double.eps^(3L/4L))) {

      warning("The variance-covariance matrix of the estimated parameters is not positive definite. This may be a symptom that the model is not identified.", call. = FALSE)

      check.vcov <- FALSE

    }

    #...................
    ### Negative variance of observed variables ####

    #### Within Level
    if (isTRUE(any(diag(lavaan::lavInspect(model.fit, what = "theta")$within) < 0L))) {

      warning("Some estimated variances of the observed variables at the Within level are negative.", call. = FALSE)

      check.theta.w <- FALSE

    } else if (isTRUE(any(eigen(lavaan::lavTech(model.fit, what = "theta")$within, symmetric = TRUE, only.values = TRUE)$values < (-1L * .Machine$double.eps^(3/4))))) {

      warning("The model-implied variance-covariance matrix of the residuals of the observed variables is not positive definite indicating an absolute residual correlations greater one.", call. = FALSE)

      check.theta.w <- FALSE

    }

    #### Between Level
    if (isTRUE(any(diag(lavaan::lavInspect(model.fit, what = "theta")$.cluster) < 0L))) {

      warning("Some estimated variances of the observed variables at the Between level are negative.", call. = FALSE)

      check.theta.b <- FALSE

    } else if (isTRUE(any(eigen(lavaan::lavTech(model.fit, what = "theta")$.cluster, symmetric = TRUE, only.values = TRUE)$values < (-1L * .Machine$double.eps^(3/4))))) {

      warning("The model-implied variance-covariance matrix of the residuals of the observed variables at the Between level is not positive definite indicating an absolute residual correlations greater one.", call. = FALSE)

      check.theta.b <- FALSE

    }

    #...................
    ### Negative variance of latent variables ####

    #### Within Level
    if (isTRUE(!is.null(lavaan::lavTech(model.fit, what = "cov.lv")$within))) {

      if (isTRUE(any(diag(lavaan::lavTech(model.fit, what = "cov.lv")$within) < 0L))) {

        warning("Some estimated variances of the latent variables at the Within level are negative.", call. = FALSE)

        check.cov.lv.w <- FALSE

      }

      # Model-implied variance-covariance matrix of the latent variables
    } else if (any(dim(lavaan::lavTech(model.fit, what = "cov.lv")$within) != 0L)) {

      if (isTRUE(any(eigen(lavaan::lavTech(model.fit, what = "cov.lv")$within, symmetric = TRUE, only.values = TRUE)$values < (-1L * .Machine$double.eps^(3/4))))) {

        warning("The model-implied variance-covariance matrix of the latent variables at the Within level is not positive definite indicating an absolute correlation greater one.", call. = FALSE)

        check.cov.lv.w <- FALSE

      }

    }

    #### Between Level
    if (isTRUE(!is.null(lavaan::lavTech(model.fit, what = "cov.lv")$cluster))) {

      if (isTRUE(any(diag(lavaan::lavTech(model.fit, what = "cov.lv")$.cluster) < 0L))) {

        warning("Some estimated variances of the latent variables at the Between level are negative.", call. = FALSE)

        check.cov.lv.b <- FALSE

      }

      # Model-implied variance-covariance matrix of the latent variables
    } else if (any(dim(lavaan::lavTech(model.fit, what = "cov.lv")$.cluster) != 0L)) {

      if (isTRUE(any(eigen(lavaan::lavTech(model.fit, what = "cov.lv")$.cluster, symmetric = TRUE, only.values = TRUE)$values < (-1L * .Machine$double.eps^(3/4))))) {

        warning("The model-implied variance-covariance matrix of the latent variables at the Between level is not positive definite indicating an absolute correlation greater one.", call. = FALSE)

        check.cov.lv.b <- FALSE

      }

    }

  } else {

    check.vcov <- check.theta.w <- check.theta.b <- check.cov.lv.w <- check.cov.lv.b <- NULL

  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Extract Results ####

  # Standardized solution
  stand <- lavaan::lavMatrixRepresentation(lavaan::standardizedSolution(model.fit))

  # Visible binding for global variable
  mat <- level <- NULL

  #...................
  ### Within-Group Results ####

  # Theta
  with.stand.theta <- subset(stand[ unlist(subset(lavaan::lavMatrixRepresentation(lavaan::parameterestimates(model.fit)), level == 1L, select = "id")), ], mat == "theta")

  # Parameter estimate, z and significance value matrix
  with.p <- with.stat <- with.se <- with.cor <- matrix(NA, ncol = max(stand[, "col"]), nrow = max(stand[, "row"]))
  for (i in seq_len(nrow(with.stand.theta))) {

    with.cor[with.stand.theta[i, "row"], with.stand.theta[i, "col"]] <- with.stand.theta[i, "est.std"]
    with.se[with.stand.theta[i, "row"], with.stand.theta[i, "col"]] <- with.stand.theta[i, "se"]
    with.stat[with.stand.theta[i, "row"], with.stand.theta[i, "col"]] <- with.stand.theta[i, "z"]
    with.p[with.stand.theta[i, "row"], with.stand.theta[i, "col"]] <- with.stand.theta[i, "pvalue"]

  }

  with.cor[lower.tri(with.cor)] <- t(with.cor)[lower.tri(with.cor)]
  with.se[lower.tri(with.se)] <- t(with.se)[lower.tri(with.se)]
  with.stat[lower.tri(with.stat)] <- t(with.stat)[lower.tri(with.stat)]
  with.p[lower.tri(with.p)] <- t(with.p)[lower.tri(with.p)]

  colnames(with.cor) <- colnames(with.se) <- colnames(with.stat) <- colnames(with.p) <-
  rownames(with.cor) <- rownames(with.se) <- rownames(with.stat) <- rownames(with.p) <- sapply(seq_len(max(stand[, "row"])), function(y) unique(stand[which(y == stand$row), "lhs"]))

  #...................
  ### Between-Group Results ####

  # Standardized solution
  betw.stand.theta <- subset(stand[ unlist(subset(lavaan::lavMatrixRepresentation(lavaan::parameterestimates(model.fit)), level == 2L, select = "id")), ], mat == "theta")

  # Parameter estimate, z and significance value matrix
  betw.p <- betw.stat <- betw.se <- betw.cor <- matrix(NA, ncol = max(stand[, "col"]), nrow = max(stand[, "row"]))
  for (i in seq_len(nrow(betw.stand.theta))) {

    betw.cor[betw.stand.theta[i, "row"], betw.stand.theta[i, "col"]] <- betw.stand.theta[i, "est.std"]
    betw.se[betw.stand.theta[i, "row"], betw.stand.theta[i, "col"]] <- betw.stand.theta[i, "se"]
    betw.stat[betw.stand.theta[i, "row"], betw.stand.theta[i, "col"]] <- betw.stand.theta[i, "z"]
    betw.p[betw.stand.theta[i, "row"], betw.stand.theta[i, "col"]] <- betw.stand.theta[i, "pvalue"]

  }

  betw.cor[lower.tri(betw.cor)] <- t(betw.cor)[lower.tri(betw.cor)]
  betw.se[lower.tri(betw.se)] <- t(betw.se)[lower.tri(betw.se)]
  betw.stat[lower.tri(betw.stat)] <- t(betw.stat)[lower.tri(betw.stat)]
  betw.p[lower.tri(betw.p)] <- t(betw.p)[lower.tri(betw.p)]

  colnames(betw.cor) <- colnames(betw.se) <- colnames(betw.stat) <- colnames(betw.p) <-
  rownames(betw.cor) <- rownames(betw.se) <- rownames(betw.stat) <- rownames(betw.p) <- sapply(seq_len(max(stand[, "row"])), function(y) unique(stand[which(y == stand$row), "lhs"]))

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Order Between Variables ####

  if (isTRUE(order && !is.null(between))) {

    pos.betw <- which(colnames(betw.cor) %in% between)
    pos.with <- which(!colnames(betw.cor) %in% between)

    with.cor <- with.cor[c(pos.betw, pos.with), c(pos.betw, pos.with)]
    with.se <- with.se[c(pos.betw, pos.with), c(pos.betw, pos.with)]
    with.stat <- with.stat[c(pos.betw, pos.with), c(pos.betw, pos.with)]
    with.p <- with.p[c(pos.betw, pos.with), c(pos.betw, pos.with)]

    betw.cor <- betw.cor[c(pos.betw, pos.with), c(pos.betw, pos.with)]
    betw.se <- betw.se[c(pos.betw, pos.with), c(pos.betw, pos.with)]
    betw.stat <- betw.stat[c(pos.betw, pos.with), c(pos.betw, pos.with)]
    betw.p <- betw.p[c(pos.betw, pos.with), c(pos.betw, pos.with)]

  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Combine Within-Group and Between-Group Results ####

  #...................
  ### Within-group correlations in the lower triangular ####
  if (isTRUE(tri.lower)) {

    # Within-group results
    wb.cor <- with.cor
    wb.se <- with.se
    wb.stat <- with.stat
    wb.p <- with.p

    # Between-group results
    wb.cor[upper.tri(wb.cor)] <- betw.cor[upper.tri(wb.cor)]
    wb.se[upper.tri(wb.se)] <- betw.se[upper.tri(wb.se)]
    wb.stat[upper.tri(wb.stat)] <- betw.stat[upper.tri(wb.stat)]
    wb.p[upper.tri(wb.p)] <- betw.p[upper.tri(wb.p)]

  #...................
  ### Within-group correlations in the upper triangular ####
  } else {

    # Within-group results
    wb.cor <- betw.cor
    wb.se <- betw.se
    wb.stat <- betw.stat
    wb.p <- betw.p

    # Between-group results
    wb.cor[upper.tri(wb.cor)] <- with.cor[upper.tri(wb.cor)]
    wb.se[upper.tri(wb.se)] <- with.se[upper.tri(wb.se)]
    wb.stat[upper.tri(wb.stat)] <- with.stat[upper.tri(wb.stat)]
    wb.p[upper.tri(wb.p)] <- with.p[upper.tri(wb.p)]

  }

  #...................
  ### Adjust p-values for multiple comparison ####
  if (isTRUE(p.adj != "none")) {

    wb.p[lower.tri(wb.p)] <- p.adjust(wb.p[lower.tri(wb.p)], method = p.adj)
    wb.p[upper.tri(wb.p)] <- p.adjust(wb.p[upper.tri(wb.p)], method = p.adj)

  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Split Within-Group and Between-Group Results ####

  # Within-group results
  with.cor <- with.cor[which(apply(with.cor, 1L, function(y) !all(is.na(y)))), which(apply(with.cor, 2L, function(y) !all(is.na(y))))]
  with.se <- with.se[which(apply(with.se, 1L, function(y) !all(is.na(y)))), which(apply(with.se, 2L, function(y) !all(is.na(y))))]
  with.stat <- with.stat[which(apply(with.stat, 1L, function(y) !all(is.na(y)))), which(apply(with.stat, 2L, function(y) !all(is.na(y))))]
  with.p <- with.p[which(apply(with.p, 1L, function(y) !all(is.na(y)))), which(apply(with.p, 2L, function(y) !all(is.na(y))))]

  # Between-group results
  betw.cor <- betw.cor[which(apply(betw.cor, 1L, function(y) !all(is.na(y)))), which(apply(betw.cor, 2L, function(y) !all(is.na(y))))]
  betw.se <- betw.se[which(apply(betw.se, 1L, function(y) !all(is.na(y)))), which(apply(betw.se, 2L, function(y) !all(is.na(y))))]
  betw.stat <- betw.stat[which(apply(betw.stat, 1L, function(y) !all(is.na(y)))), which(apply(betw.stat, 2L, function(y) !all(is.na(y))))]
  betw.p <- betw.p[which(apply(betw.p, 1L, function(y) !all(is.na(y)))), which(apply(betw.p, 2L, function(y) !all(is.na(y))))]

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Adjust p-values for multiple comparison ####

  if (isTRUE(p.adj != "none")) {

    with.p[lower.tri(with.p)] <- p.adjust(with.p[lower.tri(with.p)], method = p.adj)
    with.p[upper.tri(with.p)] <- p.adjust(with.p[upper.tri(with.p)], method = p.adj)

    betw.p[lower.tri(betw.p)] <- p.adjust(betw.p[lower.tri(betw.p)], method = p.adj)
    betw.p[upper.tri(betw.p)] <- p.adjust(betw.p[upper.tri(betw.p)], method = p.adj)

  }

  #_____________________________________________________________________________
  #
  # Return object --------------------------------------------------------------

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## lavaan summary ####

  lavaan.summary <- data.frame(c(paste("lavaan", lavaan::lavInspect(model.fit, what = "version")), "", "Estimator", "Standard Errors", "Missing Data", "", "",
                                 "Number of Observations", "Number of Clusters"),
                               c("", "",
                                 # Estimator
                                 estimator,
                                 # Standard errors
                                 switch(lavaan::lavTech(model.fit, what = "options")$se,
                                        "standard" = "Conventional",
                                        "robust.huber.white" = "Huber-White"),
                                 # Missing data
                                 ifelse(isTRUE(complete), "None",
                                        switch(missing,
                                               "listwise" = "Listwise Deletion",
                                               "fiml" = "FIML")), "", "Used",
                                 # Number of observations
                                 lavaan::lavInspect(model.fit, what = "nobs"),
                                 # Number of clusters
                                 lavaan::lavInspect(model.fit, what = "nclusters")),
                               c(rep("", times = 6L), "Total", lavaan::lavInspect(model.fit, what = "norig"), ""),
                               fix.empty.names = FALSE)

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ## Object ####

  object <- list(call = match.call(),
                 type = "multilevel.cor",
                 data = x,
                 args = list(within = within, between = between,
                             estimator = estimator, missing = missing,
                             sig = sig, alpha = alpha, print = print,
                             split = split, order = order, tri = tri, tri.lower = tri.lower,
                             p.adj = p.adj, digits = digits, p.digits = p.digits,
                             as.na = as.na, write = write, append = append, check = check, output = output),
                 model = mod,
                 model.fit = model.fit,
                 check = list(vcov = check.vcov, theta.w = check.theta.w, theta.b = check.theta.b,
                              cov.lv.w = check.cov.lv.w, cov.lv.b = check.cov.lv.b),
                 result = list(summary = lavaan.summary,
                               wb.cor = wb.cor, wb.se = wb.se,
                               wb.stat = wb.stat, wb.p = wb.p,
                               with.cor = with.cor, with.se = with.se,
                               with.stat = with.stat, with.p = with.p,
                               betw.cor = betw.cor, betw.se = betw.se,
                               betw.stat = betw.stat, betw.p = betw.p))

  class(object) <- "misty.object"

  #_____________________________________________________________________________
  #
  # Write Result ---------------------------------------------------------------

  if (isTRUE(!is.null(write))) {

    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ## Text file ####

    if (isTRUE(grepl("\\.txt", write))) {

      # Send R output to textfile
      sink(file = write, append = ifelse(isTRUE(file.exists(write)), append, FALSE), type = "output", split = FALSE)

      if (isTRUE(append && file.exists(write))) { write("", file = write, append = TRUE) }

      # Print object
      print(object, check = FALSE)

      # Close file connection
      sink()

    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ## Excel file ####

    } else {

      misty::write.result(object, file = write)

    }

  }

  #_____________________________________________________________________________
  #
  # Output ---------------------------------------------------------------------

  if (isTRUE(output)) { print(object, check = FALSE) }

  return(invisible(object))

}

Try the misty package in your browser

Any scripts or data that you put into this service are public.

misty documentation built on June 29, 2024, 9:07 a.m.