fit_cat: Fit categorical antedependence model by maximum likelihood
In antedep: Antedependence Models for Longitudinal Data

fit_cat

R Documentation

Fit categorical antedependence model by maximum likelihood

Description

Computes maximum likelihood estimates for the parameters of an AD(p) model for categorical longitudinal data. The model is parameterized by transition probabilities, and MLEs are obtained in closed form.

Usage

fit_cat(
  y,
  order = 1,
  blocks = NULL,
  homogeneous = TRUE,
  n_categories = NULL,
  na_action = c("fail", "complete", "marginalize", "em"),
  em_max_iter = 100,
  em_tol = 1e-06,
  em_epsilon = 1e-08,
  em_safeguard = TRUE,
  em_verbose = FALSE
)

Arguments

`y`	Integer matrix with n_subjects rows and n_time columns. Each entry should be a category code from 1 to c, where c is the number of categories.
`order`	Antedependence order p. Must be 0, 1, or 2. Default is 1.
`blocks`	Optional integer vector of length n_subjects specifying group membership. If NULL, all subjects are treated as one group.
`homogeneous`	Logical. If TRUE (default), parameters are shared across all groups (blocks are ignored for estimation). If FALSE, separate transition probabilities are estimated for each group.
`n_categories`	Number of categories. If NULL (default), inferred from the maximum value in y.
`na_action`	Handling of missing values in `y`. One of `"fail"` (default, error if any missing), `"complete"` (drop subjects with any missing values), or `"marginalize"` (maximize observed-data likelihood by integrating over missing outcomes), or `"em"` (use `em_cat` for orders 0 and 1).
`em_max_iter`	Maximum EM iterations used when `na_action = "em"`.
`em_tol`	EM convergence tolerance used when `na_action = "em"`.
`em_epsilon`	Numerical smoothing constant used when `na_action = "em"`.
`em_safeguard`	Logical; if `TRUE`, use step-halving safeguard in `em_cat` when `na_action = "em"`.
`em_verbose`	Logical; print EM progress when `na_action = "em"`.

Details

For AD(p), the model decomposes as:

P(Y_1, \ldots, Y_n) = P(Y_1, \ldots, Y_p) \times \prod_{k=p+1}^{n} P(Y_k | Y_{k-p}, \ldots, Y_{k-1})

MLEs are computed as empirical proportions:

Marginal/joint probabilities: count / N
Transition probabilities: conditional count / marginal count

Empty cells receive probability 0 (if denominator is also 0).

When na_action = "em", fit_cat() dispatches to em_cat. In that case, em_safeguard controls step-halving protection against likelihood-decreasing updates, and returned log_l/AIC/BIC/cell_counts are synchronized via a final E-step under the returned parameters. For order = 2, na_action = "em" is not available and errors explicitly; use na_action = "marginalize".

Value

A list of class "cat_fit" containing:

`marginal`	List of marginal/joint probabilities for initial time points
`transition`	List of transition probability arrays for k = p+1 to n
`log_l`	Log-likelihood at MLE
`aic`	Akaike Information Criterion
`bic`	Bayesian Information Criterion
`n_params`	Number of free parameters
`cell_counts`	List of cell counts: observed counts for closed-form fits (`na_action = "fail"/"complete"`), expected counts from the final E-step for EM fits (`na_action = "em"`), and `NULL` for `na_action = "marginalize"`
`convergence`	Optimizer convergence code (0 for closed-form solutions)
`settings`	List of model settings

References

Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.

Examples


# Simulate binary AD(1) data
set.seed(123)
y <- simulate_cat(n_subjects = 100, n_time = 5, order = 1, n_categories = 2)

# Fit model
fit <- fit_cat(y, order = 1)
print(fit)

# Compare orders
fit0 <- fit_cat(y, order = 0)
fit1 <- fit_cat(y, order = 1)
fit2 <- fit_cat(y, order = 2)
c(AIC_0 = fit0$aic, AIC_1 = fit1$aic, AIC_2 = fit2$aic)

# EM fit with missing data
y_miss <- y
y_miss[sample(length(y_miss), size = round(0.15 * length(y_miss)))] <- NA
fit_em <- fit_cat(
  y_miss,
  order = 1,
  na_action = "em",
  em_max_iter = 80,
  em_tol = 1e-6
)
fit_em$settings$n_iter
fit_em$settings$cell_counts_type

antedep documentation built on April 25, 2026, 1:06 a.m.