umxThresholdMatrix: Create the threshold matrix needed for modeling ordinal data.

View source: R/build_run_modify.R

umxThresholdMatrixR Documentation

Create the threshold matrix needed for modeling ordinal data.

Description

High-level helper for ordinal modeling. Creates, labels, and sets smart-starts for this complex set set of an algebra and matrices. Big time saver!

Usage

umxThresholdMatrix(
  df,
  fullVarNames = NULL,
  sep = NULL,
  method = c("Mehta", "allFree"),
  threshMatName = "threshMat",
  l_u_bound = c(NA, NA),
  droplevels = FALSE,
  verbose = FALSE,
  selDVs = "deprecated"
)

Arguments

df

The data being modeled (to allow access to the factor levels and quantiles within these for each variable)

fullVarNames

The variable names. Note for twin data, just the base names, which sep will be used to fill out.

sep

(e.g. "_T") Required for wide (twin) data. It is used to break the base names our from their numeric suffixes.

method

How to implement the thresholds: Mehta, (1 free thresh for binary, first two fixed for ordinal) or "allFree"

threshMatName

name of the matrix which is returned. Defaults to "threshMat" - best not to change it.

l_u_bound

c(NA, NA) by default, you can use this to bound the first (base) threshold.

droplevels

Whether to drop levels with no observed data (defaults to FALSE)

verbose

How much to say about what was done. (defaults to FALSE)

selDVs

deprecated. Use "fullVarNames"

Details

We often need to model ordinal data: sex, low-med-hi, depressed/normal, etc., A useful conceptual strategy to handle these data is to build a standard model for normally-varying data and then to threshold this normal distribution to generate the observed data. Thus an observation of "depressed" is modeled as a high score on the latent normally distributed trait, with thresholds set so that only scores above this threshold (1-minus the number of categories) reach the criteria for the diagnosis.

Making this work can require fixing the first 2 thresholds of ordinal data, or fixing both the mean and variance of a latent variable driving binary data, in order to estimate its one-free parameter: where to place the single threshold separating low from high cases.

The function returns a 3-item list consisting of:

  1. A thresholdsAlgebra (named threshMatName)

  2. A matrix of deviations for the thresholds (deviations_for_thresh)

  3. A lower matrix of ones (lowerOnes_for_thresh)

Twin Data

With twin data, make sure to provide the full names for twin data... this is not standard I know...

For twins (the function currently handles only pairs), the thresholds are equated for both twins using labels:

$labels

  obese_T1         obese_T2

dev_1 "obese_dev1" "obese_dev1"

Value

  • list of thresholds matrix, deviations, lowerOnes

References

See Also

OpenMx::mxThreshold()

Other Advanced Model Building Functions: umxAlgebra(), umxFixAll(), umxJiggle(), umxRun(), umxUnexplainedCausalNexus(), umx, xmuLabel(), xmuValues()

Examples


# ============================
# = Simple non-twin examples =
# ============================

# data: 1 2-level ordered factor
x = data.frame(ordered(rbinom(100,1,.5))); names(x) = c("x")

tmp = umxThresholdMatrix(x, fullVarNames = "x")
# The lower ones matrix (all fixed)
tmp[[1]]$values
tmp[[1]]$free

# The deviations matrix
tmp[[2]]$values
tmp[[2]]$labels # note: for twins, labels will be equated across twins

# The algebra that adds the deviations to create thresholds:
tmp[[3]]$formula

# Example of a warning to not omit the variable names
# tmp = umxThresholdMatrix(x)
# Polite message: For coding safety, when calling umxThresholdMatrix, set fullVarNames...

# One ordered factor with 5-levels
x = cut(rnorm(100), breaks = c(-Inf,.2,.5, .7, Inf)); levels(x) = 1:5
x = data.frame(ordered(x)); names(x) <- c("x")
tmp = umxThresholdMatrix(x, fullVarNames = "x")
tmp[[2]]$name
tmp[[2]]$free # last one is free.. (method = Mehta)

tmp = umxThresholdMatrix(x, fullVarNames = "x", l_u_bound= c(-1,1))
tmp[[2]]$lbound # bounds applied to base threshold

# =================================
# = Binary example with twin data =
# =================================
# ===============================================================
# = Create a series of binary and ordinal columns to work with =
# ===============================================================
data(twinData)

# Make "obese" variable with ~20% subjects categorised as obese
obesityLevels   = c('normal', 'obese')
cutPoints       = quantile(twinData[, "bmi1"], probs = .2, na.rm = TRUE)
twinData$obese1 = cut(twinData$bmi1, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
twinData$obese2 = cut(twinData$bmi2, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
# Step 2: Make the ordinal variables into umxFactors (ordered, with the levels found in the data)
selVars = c("obese1", "obese2")
twinData[, selVars] = umxFactor(twinData[, selVars])

# Example 1
# use verbose = TRUE to see informative messages
tmp = umxThresholdMatrix(twinData, fullVarNames = selVars, sep = "", verbose = TRUE) 


# ======================================
# = Ordinal (n categories > 2) example =
# ======================================
# Repeat for three-level weight variable
obesityLevels = c('normal', 'overweight', 'obese')
cutPoints = quantile(twinData[, "bmi1"], probs = c(.4, .7), na.rm = TRUE)
twinData$obeseTri1 = cut(twinData$bmi1, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
twinData$obeseTri2 = cut(twinData$bmi2, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
selDVs = "obeseTri"; selVars = tvars(selDVs, sep = "", suffixes = 1:2)
twinData[, selVars] = umxFactor(twinData[, selVars])
tmp = umxThresholdMatrix(twinData, fullVarNames = selVars, sep = "", verbose = TRUE)


# ========================================================
# = Mix of all three kinds example (and a 4-level trait) =
# ========================================================
obesityLevels = c('underWeight', 'normal', 'overweight', 'obese')
cutPoints = quantile(twinData[, "bmi1"], probs = c(.25, .4, .7), na.rm = TRUE)
twinData$obeseQuad1 = cut(twinData$bmi1, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
twinData$obeseQuad2 = cut(twinData$bmi2, breaks = c(-Inf, cutPoints, Inf), labels = obesityLevels) 
selVars = c("obeseQuad1", "obeseQuad2")
twinData[, selVars] = umxFactor(twinData[, selVars])

selDVs =c("bmi", "obese", "obeseTri", "obeseQuad")
tmp = umxThresholdMatrix(twinData, fullVarNames = tvars(selDVs, sep= ""), sep = "", verbose = TRUE)
# The lower ones matrix (all fixed)
tmp[[1]]$values
# The deviations matrix
tmp[[2]]$values
tmp[[2]]$labels # note labels are equated across twins
# Check to be sure twin-1 column labels same as twin-2
tmp[[2]]$labels[,2]==tmp[[2]]$labels[,4]

# The algebra that assembles these into thresholds:
tmp[[3]]$formula
# =================================
# = Example with method = allFree =
# =================================

tmp = umxThresholdMatrix(twinData, fullVarNames = tvars(selDVs, sep= ""), sep = "", 
	method = "allFree")
all(tmp[[2]]$free)


umx documentation built on May 29, 2024, 5:40 a.m.