ccc_rm_reml {matrixCorr}	R Documentation
Compute Lin's Concordance Correlation Coefficient (CCC) from a linear
mixed-effects model fitted by REML. The fixed-effects part can include
method and/or time (optionally their interaction), with a
subject-specific random intercept to capture between-subject variation.
Large n \times n inversions are avoided by solving small per-subject
systems.
Assumption: time levels are treated as regular, equally spaced
visits indexed by their order within subject. The AR(1) residual model is
in discrete time on the visit index (not calendar time). NA time codes
break the serial run. Gaps in the factor levels are ignored (adjacent
observed visits are treated as lag-1).
ccc_rm_reml(
data,
response,
subject,
method = NULL,
time = NULL,
ci = FALSE,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
ci_mode = c("auto", "raw", "logit"),
verbose = FALSE,
digits = 4,
use_message = TRUE,
interaction = FALSE,
max_iter = 100,
tol = 1e-06,
Dmat = NULL,
Dmat_type = c("time-avg", "typical-visit", "weighted-avg", "weighted-sq"),
Dmat_weights = NULL,
Dmat_rescale = TRUE,
ar = c("none", "ar1"),
ar_rho = NA_real_,
slope = c("none", "subject", "method", "custom"),
slope_var = NULL,
slope_Z = NULL,
drop_zero_cols = TRUE,
vc_select = c("auto", "none"),
vc_alpha = 0.05,
vc_test_order = c("subj_time", "subj_method"),
include_subj_method = NULL,
include_subj_time = NULL,
sb_zero_tol = 1e-10
)
data
A data frame.
response
Character. Response variable name.
subject
Character. Subject ID variable name.
method
Character or NULL. Name of the method variable; NULL fits a model without a method factor.
time
Character or NULL. Name of the time variable; NULL fits a model without a time factor.
ci
Logical. If TRUE, compute a delta-method standard error and confidence interval for the CCC (default FALSE).
conf_level
Numeric in (0, 1). Confidence level for the interval (default 0.95).
n_threads
Integer >= 1. Number of threads used by the C++ backend (default getOption("matrixCorr.threads", 1L)).
ci_mode
Character scalar; one of "auto", "raw", "logit". Scale on which the confidence interval is constructed.
verbose
Logical. If TRUE, print fitting progress and variance-component selection details.
digits
Integer >= 1. Number of digits used when printing verbose output (default 4).
use_message
Logical. When TRUE, verbose output is emitted via message().
interaction
Logical. Include the method:time interaction in the fixed effects (default FALSE).
max_iter
Integer. Maximum iterations for variance-component updates (default 100).
tol
Numeric. Convergence tolerance on parameter change (default 1e-06).
Dmat
Optional user-supplied time kernel D_m; overrides the kernel implied by Dmat_type.
Dmat_type
Character, one of "time-avg", "typical-visit", "weighted-avg", "weighted-sq". Pick the functional of the bias profile that S_B should target (see Details).
Dmat_weights
Optional numeric weights (one per time level) for the weighted kernels.
Dmat_rescale
Logical. When TRUE, rescale D_m so that 1' D_m 1 = n_t.
ar
Character. Residual correlation structure: "none" (iid) or "ar1".
ar_rho
Numeric in (-1, 1). AR(1) correlation; if NA (default), rho is estimated by profiling the REML log-likelihood.
slope
Character. Optional extra random-effect design: "none", "subject", "method", or "custom".
slope_var
For slope = "subject" or slope = "method": character name of the slope regressor column.
slope_Z
For slope = "custom": numeric matrix supplying the extra design Z in full.
drop_zero_cols
Logical. When TRUE, drop all-zero Z columns after subsetting.
vc_select
Character scalar; one of "auto" (data-driven selection of the optional variance components) or "none".
vc_alpha
Numeric scalar in (0, 1). Significance level used by the variance-component selection tests (default 0.05).
vc_test_order
Character vector (length 2) with a permutation of c("subj_time", "subj_method"), giving the order in which the optional components are tested.
include_subj_method, include_subj_time
Logical scalars or NULL. Force inclusion (TRUE) or exclusion (FALSE) of the subject-by-method / subject-by-time components; NULL defers to vc_select.
sb_zero_tol
Non-negative numeric scalar; default 1e-10. Threshold below which S_B is treated as zero.
For measurement y_{ij} on subject i under fixed
levels (method, time), we fit
y = X\beta + Zu + \varepsilon,\qquad
u \sim N(0,\,G),\ \varepsilon \sim N(0,\,R).
Notation: m subjects, n=\sum_i n_i total rows;
nm method levels; nt time levels; q_Z extra
random-slope columns (if any); r=1+nm+nt (or 1+nm+nt+q_Z with slopes).
Here Z is the subject-structured random-effects design and G is
block-diagonal at the subject level with the following per-subject
parameterisation. Specifically,
one random intercept with variance \sigma_A^2;
optionally, method deviations (one column per method level)
with a common variance \sigma_{A\times M}^2 and zero
covariances across levels (i.e., multiple of an identity);
optionally, time deviations (one column per time level)
with a common variance \sigma_{A\times T}^2 and zero
covariances across levels;
optionally, an extra random effect aligned with Z
(random slope), where each column has its own variance
\sigma_{Z,j}^2 and columns are uncorrelated.
The fixed-effects design is ~ 1 + method + time and, if
interaction=TRUE, + method:time.
Residual correlation R (regular, equally spaced time).
Write R_i=\sigma_E^2\,C_i(\rho). With ar="none", C_i=I.
With ar="ar1", within-subject residuals follow a discrete AR(1)
process along the visit index after sorting by increasing time level. Ties
retain input order, and any NA time code breaks the series so each
contiguous block of non-NA times forms a run. The correlation
between adjacent observed visits in a run is \rho; we do not use
calendar-time gaps. Internally we work with the precision of the AR(1)
correlation: for a run of length L\ge 2, the tridiagonal inverse has
(C^{-1})_{11}=(C^{-1})_{LL}=\frac{1}{1-\rho^2},\quad
(C^{-1})_{tt}=\frac{1+\rho^2}{1-\rho^2}\ (2\le t\le L-1),\quad
(C^{-1})_{t,t+1}=(C^{-1})_{t+1,t}=\frac{-\rho}{1-\rho^2}.
The working inverse is \,R_i^{-1}=\sigma_E^{-2}\,C_i(\rho)^{-1}.
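As a quick numerical check of the tridiagonal inverse above, the closed-form entries can be compared against solve() applied to the AR(1) correlation matrix. This is a standalone sketch in base R, not package code:

```r
## Closed-form AR(1) precision entries vs. a direct inverse (illustrative)
rho <- 0.6; L <- 5
C <- rho^abs(outer(seq_len(L), seq_len(L), "-"))  # AR(1) correlation, C_{st} = rho^{|s-t|}
Cinv <- solve(C)
f <- 1 / (1 - rho^2)
all.equal(Cinv[1, 1], f)                # corner entry: 1/(1 - rho^2)
all.equal(Cinv[2, 2], (1 + rho^2) * f)  # interior entry: (1 + rho^2)/(1 - rho^2)
all.equal(Cinv[1, 2], -rho * f)         # off-diagonal: -rho/(1 - rho^2)
```

All three comparisons return TRUE up to numerical tolerance.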
Per-subject Woodbury system. For subject i with n_i
rows, define the per-subject random-effects design U_i (columns:
intercept, method indicators, time indicators; dimension
\,r=1+nm+nt\,). The core never forms
V_i = R_i + U_i G U_i^\top explicitly. Instead,
M_i \;=\; G^{-1} \;+\; U_i^\top R_i^{-1} U_i,
and accumulates GLS blocks via rank-r corrections using
\,V_i^{-1} = R_i^{-1}-R_i^{-1}U_i M_i^{-1}U_i^\top R_i^{-1}\,:
X^\top V^{-1} X \;=\; \sum_i \Big[
X_i^\top R_i^{-1}X_i \;-\; (X_i^\top R_i^{-1}U_i)\,
M_i^{-1}\,(U_i^\top R_i^{-1}X_i) \Big],
X^\top V^{-1} y \;=\; \sum_i \Big[
X_i^\top R_i^{-1}y_i \;-\; (X_i^\top R_i^{-1}U_i)\,M_i^{-1}\,
(U_i^\top R_i^{-1}y_i) \Big].
Because G^{-1} is diagonal with positive entries, each M_i is
symmetric positive definite; solves/inversions use symmetric-PD routines with
a small diagonal ridge and a pseudo-inverse if needed.
Random-slope Z.
Besides U_i, the function can include an extra design Z_i.
slope="subject": Z has one column (the regressor in
slope_var); Z_{i} is the subject-i block, with its own
variance \sigma_{Z,1}^2.
slope="method": Z has one column per method level;
row t uses the slope regressor if its method equals level \ell,
otherwise 0; all-zero columns can be dropped via
drop_zero_cols=TRUE after subsetting. Each column has its own
variance \sigma_{Z,\ell}^2.
slope="custom": Z is provided fully via slope_Z.
Each column is an independent random effect with its own variance
\sigma_{Z,j}^2; cross-covariances among columns are set to 0.
Computations simply augment \tilde U_i=[U_i\ Z_i] and the corresponding
inverse-variance block. The EM updates then include, for each column j=1,\ldots,q_Z,
\sigma_{Z,j}^{2\,(new)} \;=\; \frac{1}{m}
\sum_{i=1}^m \Big( b_{i,\text{extra},j}^2 +
(M_i^{-1})_{\text{extra},jj} \Big)
\quad (\text{if } q_Z>0).
Interpretation: the \sigma_{Z,j}^2 represent additional within-subject
variability explained by the slope regressor(s) in column j and are not part of the CCC
denominator (agreement across methods/time).
EM-style variance-component updates. With current \hat\beta,
form residuals r_i = y_i - X_i\hat\beta. The BLUPs and conditional
covariances are
b_i \;=\; M_i^{-1}\,(U_i^\top R_i^{-1} r_i), \qquad
\mathrm{Var}(b_i\mid y) \;=\; M_i^{-1}.
Let e_i=r_i-U_i b_i. Expected squares then yield closed-form updates:
\sigma_A^{2\,(new)} \;=\; \frac{1}{m}\sum_i \Big( b_{i,0}^2 +
(M_i^{-1})_{00} \Big),
\sigma_{A\times M}^{2\,(new)} \;=\; \frac{1}{m\,nm}
\sum_i \sum_{\ell=1}^{nm}\!\Big( b_{i,\ell}^2 +
(M_i^{-1})_{\ell\ell} \Big)
\quad (\text{if } nm>0),
\sigma_{A\times T}^{2\,(new)} \;=\; \frac{1}{m\,nt}
\sum_i \sum_{t=1}^{nt}
\Big( b_{i,t}^2 + (M_i^{-1})_{tt} \Big)
\quad (\text{if } nt>0),
\sigma_E^{2\,(new)} \;=\; \frac{1}{n} \sum_i
\Big( e_i^\top C_i(\rho)^{-1} e_i \;+\;
\mathrm{tr}\!\big(M_i^{-1}U_i^\top C_i(\rho)^{-1} U_i\big) \Big),
together with the per-column update for \sigma_{Z,j}^2 given above.
Iterate until the \ell_1 change across components is <tol
or max_iter is reached.
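The update cycle can be illustrated for the simplest case, a single random intercept with iid residuals and known \beta = 0, where M_i = 1/\sigma_A^2 + n_i/\sigma_E^2 is scalar. This is a minimal sketch of the same EM logic, not the package internals:

```r
## One-component EM sketch: random intercept only, iid residuals, beta = 0
set.seed(42)
m <- 200; n_i <- 5
u  <- rnorm(m, 0, sqrt(0.6))                              # true sigma_A^2 = 0.6
y  <- rep(u, each = n_i) + rnorm(m * n_i, 0, sqrt(0.4))   # true sigma_E^2 = 0.4
id <- rep(seq_len(m), each = n_i)
sigA2 <- 1; sigE2 <- 1                                    # crude starting values
for (it in 1:50) {
  Minv <- 1 / (1 / sigA2 + n_i / sigE2)                   # scalar M_i^{-1}
  b    <- Minv * tapply(y, id, sum) / sigE2               # BLUP b_i
  e2   <- tapply((y - b[id])^2, id, sum)                  # sum of squared e_i
  sigA2 <- mean(b^2 + Minv)                               # intercept update
  sigE2 <- sum(e2 + Minv * n_i) / (m * n_i)               # residual update
}
c(sigA2 = sigA2, sigE2 = sigE2)   # estimates near the simulated values
```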
Fixed-effect dispersion S_B: choosing the time-kernel D_m.
Let d = L^\top \hat\beta stack the within-time, pairwise method differences,
grouped by time as d=(d_1^\top,\ldots,d_{n_t}^\top)^\top with
d_t \in \mathbb{R}^{P} and P = n_m(n_m-1). The symmetric
positive semidefinite kernel D_m \succeq 0 selects which functional of the
bias profile t \mapsto d_t is targeted by S_B. Internally, the code
rescales any supplied/built D_m to satisfy 1^\top D_m 1 = n_t for
stability and comparability.
Dmat_type = "time-avg" (square of the time-averaged bias).
Let
w \;=\; \frac{1}{n_t}\,\mathbf{1}_{n_t}, \qquad
D_m \;\propto\; I_P \otimes (w\,w^\top),
so that
d^\top D_m d \;\propto\; \sum_{p=1}^{P}
\left( \frac{1}{n_t}\sum_{t=1}^{n_t} d_{t,p} \right)^{\!2}.
Methods have equal \textit{time-averaged}
means within subject, i.e. \sum_{t=1}^{n_t} d_{t,p}/n_t = 0 for all
p. Appropriate when decisions depend on an average over time and
opposite-signed biases are allowed to cancel.
Dmat_type = "typical-visit" (average of squared per-time biases).
With equal visit probability, take
D_m \;\propto\; I_P \otimes
\mathrm{diag}\!\Big(\tfrac{1}{n_t},\ldots,\tfrac{1}{n_t}\Big),
yielding
d^\top D_m d \;\propto\; \frac{1}{n_t}
\sum_{t=1}^{n_t}\sum_{p=1}^{P} d_{t,p}^{\,2}.
Methods agree on a \textit{typical}
occasion drawn uniformly from the visit set. Use when each visit matters
on its own; alternating signs d_{t,p} do not cancel.
Dmat_type = "weighted-avg" (square of a weighted time average).
For user weights a=(a_1,\ldots,a_{n_t})^\top with a_t \ge 0, set
w \;=\; \frac{a}{\sum_{t=1}^{n_t} a_t}, \qquad
D_m \;\propto\; I_P \otimes (w\,w^\top),
so that
d^\top D_m d \;\propto\; \sum_{p=1}^{P}
\left( \sum_{t=1}^{n_t} w_t\, d_{t,p} \right)^{\!2}.
Methods have equal \textit{weighted
time-averaged} means, i.e. \sum_{t=1}^{n_t} w_t\, d_{t,p} = 0 for all
p. Use when some visits (e.g., baseline/harvest) are a priori more
influential; opposite-signed biases may cancel according to w.
Dmat_type = "weighted-sq" (weighted average of squared per-time biases).
With the same weights w, take
D_m \;\propto\; I_P \otimes \mathrm{diag}(w_1,\ldots,w_{n_t}),
giving
d^\top D_m d \;\propto\; \sum_{t=1}^{n_t} w_t
\sum_{p=1}^{P} d_{t,p}^{\,2}.
Methods agree at visits sampled with
probabilities \{w_t\}, counting each visit's discrepancy on its own.
Use when per-visit agreement is required but some visits should be
emphasised more than others.
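The four kernels are easy to build explicitly for one method pair. The sketch below (plain R; the weights a are hypothetical) shows the rescaling 1' D_m 1 = n_t and how an alternating bias cancels only under the averaging kernels:

```r
## The four time kernels for nt = 4 visits, each rescaled so sum(Dm) == nt
nt <- 4
a  <- c(2, 1, 1, 2)                      # hypothetical visit weights
w_eq <- rep(1 / nt, nt); w_a <- a / sum(a)
rescale <- function(D) D * nt / sum(D)
Dm <- list(
  "time-avg"      = rescale(tcrossprod(w_eq)),  # (mean over time)^2
  "typical-visit" = rescale(diag(w_eq)),        # mean of squared per-time biases
  "weighted-avg"  = rescale(tcrossprod(w_a)),   # (weighted mean)^2
  "weighted-sq"   = rescale(diag(w_a))          # weighted mean of squares
)
sapply(Dm, sum)                          # all equal nt = 4
## A constant bias d_t = 0.2 gives the same quadratic form under all four
## kernels; an alternating bias cancels only under the averaging kernels:
d_alt <- c(0.2, -0.2, 0.2, -0.2)
sapply(Dm, function(D) drop(t(d_alt) %*% D %*% d_alt))
# time-avg and weighted-avg: 0; typical-visit and weighted-sq: 0.16
```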
Time-averaging for CCC (regular visits).
The reported CCC targets agreement of the time-averaged measurements
per method within subject by default (Dmat_type="time-avg"). Averaging over T
non-NA visits shrinks time-varying components by
\kappa_T^{(g)} \;=\; 1/T, \qquad
\kappa_T^{(e)} \;=\; \{T + 2\sum_{k=1}^{T-1}(T-k)\rho^k\}/T^2,
with \kappa_T^{(e)}=1/T when residuals are i.i.d. With unbalanced T, the
implementation averages the per-(subject,method) \kappa values across the
pairs contributing to CCC and then clamps \kappa_T^{(e)} to
[10^{-12},\,1] for numerical stability. Choosing
Dmat_type="typical-visit" makes S_B match the interpretation of a
randomly sampled occasion instead.
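The \kappa_T^{(e)} expression is simply the variance shrinkage of a T-visit average under AR(1) correlation, i.e. (1' C 1)/T^2 for C_{st} = \rho^{|s-t|}, which can be confirmed directly (standalone sketch):

```r
## kappa_T^{(e)} equals the shrinkage factor of a T-visit mean under AR(1)
kappaE <- function(Tn, rho) {
  k <- seq_len(max(Tn - 1, 0))
  (Tn + 2 * sum((Tn - k) * rho^k)) / Tn^2
}
Tn <- 6; rho <- 0.4
C <- rho^abs(outer(seq_len(Tn), seq_len(Tn), "-"))
all.equal(kappaE(Tn, rho), sum(C) / Tn^2)   # TRUE: matches (1' C 1)/T^2
kappaE(Tn, 0) == 1 / Tn                     # iid case collapses to 1/T
```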
Concordance correlation coefficient. The CCC used is
\mathrm{CCC} \;=\;
\frac{\sigma_A^2 + \kappa_T^{(g)}\,\sigma_{A\times T}^2}
{\sigma_A^2 + \sigma_{A\times M}^2 +
\kappa_T^{(g)}\,\sigma_{A\times T}^2 + S_B +
\kappa_T^{(e)}\,\sigma_E^2}.
Special cases: with no method factor, S_B=\sigma_{A\times M}^2=0; with
a single time level, \sigma_{A\times T}^2=0 and both \kappa-factors equal 1
(no shrinkage). When \rho=0, \kappa_T^{(e)}=\kappa_T^{(g)}=1/T. The extra
random-effect variances \{\sigma_{Z,j}^2\} (if used) are not included.
CIs / SEs (delta method for CCC). Let
\theta \;=\; \big(\sigma_A^2,\ \sigma_{A\times M}^2,\
\sigma_{A\times T}^2,\ \sigma_E^2,\ S_B\big)^\top,
and write \mathrm{CCC}(\theta)=N/D with
N=\sigma_A^2+\kappa_T^{(g)}\sigma_{A\times T}^2 and
D=\sigma_A^2+\sigma_{A\times M}^2+\kappa_T^{(g)}\sigma_{A\times T}^2+S_B+\kappa_T^{(e)}\sigma_E^2.
The gradient components are
\frac{\partial\,\mathrm{CCC}}{\partial \sigma_A^2}
\;=\; \frac{\sigma_{A\times M}^2 + S_B + \kappa_T^{(e)}\sigma_E^2}{D^2},
\frac{\partial\,\mathrm{CCC}}{\partial \sigma_{A\times M}^2}
\;=\; -\,\frac{N}{D^2}, \qquad
\frac{\partial\,\mathrm{CCC}}{\partial \sigma_{A\times T}^2}
\;=\; \frac{\kappa_T^{(g)}\big(\sigma_{A\times M}^2 + S_B +
\kappa_T^{(e)}\sigma_E^2\big)}{D^2},
\frac{\partial\,\mathrm{CCC}}{\partial \sigma_E^2}
\;=\; -\,\frac{\kappa_T^{(e)}\,N}{D^2}, \qquad
\frac{\partial\,\mathrm{CCC}}{\partial S_B}
\;=\; -\,\frac{N}{D^2}.
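These gradient expressions can be checked against central finite differences; the sketch below uses arbitrary \theta and \kappa values:

```r
## Delta-method gradient of CCC(theta) vs. central finite differences
## theta = (sigma_A^2, sigma_AxM^2, sigma_AxT^2, sigma_E^2, S_B)
ccc <- function(th, kg, ke) {
  N <- th[1] + kg * th[3]
  D <- th[1] + th[2] + kg * th[3] + th[5] + ke * th[4]
  N / D
}
grad <- function(th, kg, ke) {
  N <- th[1] + kg * th[3]
  D <- th[1] + th[2] + kg * th[3] + th[5] + ke * th[4]
  c((th[2] + th[5] + ke * th[4]) / D^2,        # d/d sigma_A^2
    -N / D^2,                                  # d/d sigma_AxM^2
    kg * (th[2] + th[5] + ke * th[4]) / D^2,   # d/d sigma_AxT^2
    -ke * N / D^2,                             # d/d sigma_E^2
    -N / D^2)                                  # d/d S_B
}
th <- c(0.8, 0.3, 0.4, 0.5, 0.04); kg <- 0.25; ke <- 0.4
num <- sapply(seq_along(th), function(j) {
  h <- 1e-6; e <- replace(numeric(5), j, h)
  (ccc(th + e, kg, ke) - ccc(th - e, kg, ke)) / (2 * h)
})
max(abs(num - grad(th, kg, ke)))   # near zero
```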
Estimating \mathrm{Var}(\hat\theta).
The EM updates write each variance component as an average of per-subject
quantities. For subject i,
t_{A,i} \;=\; b_{i,0}^2 + (M_i^{-1})_{00},\qquad
t_{M,i} \;=\; \frac{1}{nm}\sum_{\ell=1}^{nm}
\Big(b_{i,\ell}^2 + (M_i^{-1})_{\ell\ell}\Big),
t_{T,i} \;=\; \frac{1}{nt}\sum_{j=1}^{nt}
\Big(b_{i,j}^2 + (M_i^{-1})_{jj}\Big),\qquad
s_i \;=\; \frac{e_i^\top C_i(\rho)^{-1} e_i +
\mathrm{tr}\!\big(M_i^{-1}U_i^\top C_i(\rho)^{-1} U_i\big)}{n_i},
where b_i = M_i^{-1}(U_i^\top R_i^{-1} r_i) and
M_i = G^{-1} + U_i^\top R_i^{-1} U_i.
With m subjects, form the empirical covariance of the stacked
subject vectors and scale by m to approximate the covariance of the
means:
\widehat{\mathrm{Cov}}\!\left(
\begin{bmatrix} t_{A,\cdot} \\ t_{M,\cdot} \\ t_{T,\cdot} \end{bmatrix}
\right)
\;\approx\; \frac{1}{m}\,
\mathrm{Cov}_i\!\left(
\begin{bmatrix} t_{A,i} \\ t_{M,i} \\ t_{T,i} \end{bmatrix}\right).
(Drop rows/columns as needed when nm==0 or nt==0.)
The residual variance estimator is a weighted mean
\hat\sigma_E^2=\sum_i w_i s_i with w_i=n_i/n. Its variance is
approximated by the variance of a weighted mean of independent terms,
\widehat{\mathrm{Var}}(\hat\sigma_E^2)
\;\approx\; \Big(\sum_i w_i^2\Big)\,\widehat{\mathrm{Var}}(s_i),
where \widehat{\mathrm{Var}}(s_i) is the sample variance across
subjects. The method-dispersion term uses the quadratic-form delta already
computed for S_B:
\widehat{\mathrm{Var}}(S_B)
\;=\; \frac{2\,\mathrm{tr}\!\big((A_{\!fix}\,\mathrm{Var}(\hat\beta))^2\big)
\;+\; 4\,\hat\beta^\top A_{\!fix}\,\mathrm{Var}(\hat\beta)\,
A_{\!fix}\,\hat\beta}
{\big[nm\,(nm-1)\,\max(nt,1)\big]^2},
with A_{\!fix}=L\,D_m\,L^\top.
Putting it together. Assemble
\widehat{\mathrm{Var}}(\hat\theta) by combining the
(\sigma_A^2,\sigma_{A\times M}^2,\sigma_{A\times T}^2) covariance
block from the subject-level empirical covariance, add the
\widehat{\mathrm{Var}}(\hat\sigma_E^2) and
\widehat{\mathrm{Var}}(S_B) terms on the diagonal,
and ignore cross-covariances across these blocks (a standard large-sample
simplification). Then
\widehat{\mathrm{se}}\{\widehat{\mathrm{CCC}}\}
\;=\; \sqrt{\,\nabla \mathrm{CCC}(\hat\theta)^\top\,
\widehat{\mathrm{Var}}(\hat\theta)\,
\nabla \mathrm{CCC}(\hat\theta)\,}.
A two-sided (1-\alpha) normal CI is
\widehat{\mathrm{CCC}} \;\pm\; z_{1-\alpha/2}\,
\widehat{\mathrm{se}}\{\widehat{\mathrm{CCC}}\},
truncated to [0,1] in the output for convenience. When S_B is
truncated at 0 or samples are very small/imbalanced, the normal CI can be
mildly anti-conservative near the boundary; a logit transform for CCC or a
subject-level (cluster) bootstrap can be used for sensitivity analysis.
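For the logit sensitivity analysis mentioned above, a generic sketch is as follows (the point estimate and standard error are hypothetical, and this is the textbook back-transform, not necessarily the exact ci_mode="logit" implementation):

```r
## Raw vs. logit-scale normal CIs from a CCC point estimate and SE
ccc_hat <- 0.85; se <- 0.06; z <- qnorm(0.975)
raw <- pmin(pmax(ccc_hat + c(-1, 1) * z * se, 0), 1)  # truncated to [0, 1]
lg    <- qlogis(ccc_hat)                              # logit transform
se_lg <- se / (ccc_hat * (1 - ccc_hat))               # delta method on logit scale
logit_ci <- plogis(lg + c(-1, 1) * z * se_lg)         # respects (0, 1) bounds
rbind(raw = raw, logit = logit_ci)
```

The logit interval is asymmetric around the estimate and cannot cross 0 or 1, which is why it is a useful check near the boundary.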
Choosing \rho for AR(1).
When ar="ar1" and ar_rho = NA, \rho is estimated by
profiling the REML log-likelihood at (\hat\beta,\hat G,\hat\sigma_E^2).
With very few visits per subject, \rho can be weakly identified; consider
sensitivity checks over a plausible range.
All per-subject solves are \,r\times r with r=1+nm+nt+q_Z, so cost
scales with the number of subjects and the fixed-effects dimension rather
than the total number of observations. Solvers use symmetric-PD paths with
a small diagonal ridge and pseudo-inverse,
which helps for very small/unbalanced subsets and near-boundary estimates.
For AR(1), observations are ordered by time within subject; NA time codes
break the run, and gaps between factor levels are treated as regular steps
(elapsed time is not used).
Heteroscedastic slopes across Z columns are supported.
Each Z column has its own variance component \sigma_{Z,j}^2, but
cross-covariances among Z columns are set to zero (diagonal block). Column
rescaling changes the implied prior on b_{i,\text{extra}} but does not
introduce correlations.
The C++ backend uses OpenMP loops while forcing vendor BLAS libraries to
run single-threaded so that overall CPU usage stays predictable. On OpenBLAS
and Apple's Accelerate this is handled automatically. On Intel MKL builds the
guard is disabled by default, and you can opt out manually by setting
MATRIXCORR_DISABLE_BLAS_GUARD=1 in the environment before loading
matrixCorr.
Internally, the call is routed to ccc_lmm_reml_pairwise(), which fits one
repeated-measures mixed model per pair of methods. Each model includes:
- subject random intercepts (always);
- optional subject-by-method (\sigma_{A\times M}^2) and
  subject-by-time (\sigma_{A\times T}^2) variance components;
- optional random slopes specified via slope/slope_var/slope_Z;
- residual structure ar = "none" (iid) or ar = "ar1".
The D-matrix options (Dmat_type, Dmat, Dmat_weights) control how time
averaging operates when translating variance components into CCC summaries.
Author(s): Thiago de Paula Oliveira
Lin L (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45: 255-268.
Lin L (2000). A note on the concordance correlation coefficient. Biometrics, 56: 324-325.
Carrasco, J. L. et al. (2013). Estimation of the concordance correlation coefficient for repeated measures using SAS and R. Computer Methods and Programs in Biomedicine, 109(3), 293-304. doi:10.1016/j.cmpb.2012.09.002
King et al. (2007). A class of repeated measures concordance correlation coefficients. Journal of Biopharmaceutical Statistics, 17(4). doi:10.1080/10543400701329455
See also: build_L_Dm_Z_cpp for constructing L/D_m/Z; ccc_rm_ustat
for a U-statistic alternative; and cccrm for a reference approach via
nlme.
# ====================================================================
# 1) Subject x METHOD and subject x TIME variances present
#    y_{i,m,t} = mu + b_m + u_i + w_{i,m} + g_{i,t} + e_{i,m,t}
#    with u_i ~ N(0, s_A^2), w_{i,m} ~ N(0, s_{AxM}^2), g_{i,t} ~ N(0, s_{AxT}^2)
# ====================================================================
set.seed(102)
n_subj <- 60
n_time <- 8
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
time <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
sigA <- 0.6 # subject
sigAM <- 0.3 # subject x method
sigAT <- 0.5 # subject x time
sigE <- 0.4 # residual
# Expected estimate S_B = 0.2^2 = 0.04
biasB <- 0.2 # fixed method bias
# random effects
u_i <- rnorm(n_subj, 0, sqrt(sigA))
u <- u_i[as.integer(id)]
sm <- interaction(id, method, drop = TRUE)
w_im_lv <- rnorm(nlevels(sm), 0, sqrt(sigAM))
w_im <- w_im_lv[as.integer(sm)]
st <- interaction(id, time, drop = TRUE)
g_it_lv <- rnorm(nlevels(st), 0, sqrt(sigAT))
g_it <- g_it_lv[as.integer(st)]
# residuals & response
e <- rnorm(length(id), 0, sqrt(sigE))
y <- (method == "B") * biasB + u + w_im + g_it + e
dat_both <- data.frame(y, id, method, time)
# Both sigma2_subject_method and sigma2_subject_time are identifiable here
fit_both <- ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
vc_select = "auto", verbose = TRUE)
summary(fit_both)
plot(fit_both)
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(fit_both)
}
# ====================================================================
# 2) Subject x TIME variance present (sag > 0) with two methods
# y_{i,m,t} = mu + b_m + u_i + g_{i,t} + e_{i,m,t}
# where g_{i,t} ~ N(0, s_{AxT}^2) shared across methods at time t
# ====================================================================
set.seed(202)
n_subj <- 60; n_time <- 14
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
time <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
sigA <- 0.7
sigAT <- 0.5
sigE <- 0.5
biasB <- 0.25
u <- rnorm(n_subj, 0, sqrt(sigA))[as.integer(id)]
gIT <- rnorm(n_subj * n_time, 0, sqrt(sigAT))
g <- gIT[ (as.integer(id) - 1L) * n_time + as.integer(time) ]
y <- (method == "B") * biasB + u + g + rnorm(length(id), 0, sqrt(sigE))
dat_sag <- data.frame(y, id, method, time)
# sigma_AT should be retained; sigma_AM may be dropped (since w_{i,m}=0)
fit_sag <- ccc_rm_reml(dat_sag, "y", "id", method = "method", time = "time",
vc_select = "auto", verbose = TRUE)
summary(fit_sag)
plot(fit_sag)
# ====================================================================
# 3) BOTH components present: sab > 0 and sag > 0 (2 methods x T times)
# y_{i,m,t} = mu + b_m + u_i + w_{i,m} + g_{i,t} + e_{i,m,t}
# ====================================================================
set.seed(303)
n_subj <- 60; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
time <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
sigA <- 0.8
sigAM <- 0.3
sigAT <- 0.4
sigE <- 0.5
biasB <- 0.2
u <- rnorm(n_subj, 0, sqrt(sigA))[as.integer(id)]
# (subject, method) random deviations: repeat per (i,m) across its times
wIM <- rnorm(n_subj * 2, 0, sqrt(sigAM))
w <- wIM[ (as.integer(id) - 1L) * 2 + as.integer(method) ]
# (subject, time) random deviations: shared across methods at time t
gIT <- rnorm(n_subj * n_time, 0, sqrt(sigAT))
g <- gIT[ (as.integer(id) - 1L) * n_time + as.integer(time) ]
y <- (method == "B") * biasB + u + w + g + rnorm(length(id), 0, sqrt(sigE))
dat_both <- data.frame(y, id, method, time)
fit_both <- ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
vc_select = "auto", verbose = TRUE, ci = TRUE)
summary(fit_both)
plot(fit_both)
# If you want to force-include both VCs (skip testing):
fit_both_forced <-
ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
vc_select = "none", include_subj_method = TRUE,
include_subj_time = TRUE, verbose = TRUE)
summary(fit_both_forced)
plot(fit_both_forced)
# ====================================================================
# 4) D_m choices: time-averaged (default) vs typical visit
# ====================================================================
# Time-average
ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
vc_select = "none", include_subj_method = TRUE,
include_subj_time = TRUE, Dmat_type = "time-avg")
# Typical visit
ccc_rm_reml(dat_both, "y", "id", method = "method", time = "time",
vc_select = "none", include_subj_method = TRUE,
include_subj_time = TRUE, Dmat_type = "typical-visit")
# ====================================================================
# 5) AR(1) residual correlation with fixed rho (larger example)
# ====================================================================
set.seed(10)
n_subj <- 40
n_time <- 10
methods <- c("A", "B", "C", "D")
nm <- length(methods)
id <- factor(rep(seq_len(n_subj), each = n_time * nm))
method <- factor(rep(rep(methods, each = n_time), times = n_subj),
levels = methods)
time <- factor(rep(rep(seq_len(n_time), times = nm), times = n_subj))
beta0 <- 0
beta_t <- 0.2
bias_met <- c(A = 0.00, B = 0.30, C = -0.15, D = 0.05)
sigA <- 1.0
rho_true <- 0.6
sigE <- 0.7
t_num <- as.integer(time)
t_c <- t_num - mean(seq_len(n_time))
mu <- beta0 + beta_t * t_c + bias_met[as.character(method)]
u_subj <- rnorm(n_subj, 0, sqrt(sigA))
u <- u_subj[as.integer(id)]
e <- numeric(length(id))
for (s in seq_len(n_subj)) {
for (m in methods) {
idx <- which(id == levels(id)[s] & method == m)
# sd is the innovation SD; the marginal SD is sigE / sqrt(1 - rho_true^2)
e[idx] <- stats::arima.sim(list(ar = rho_true), n = n_time, sd = sigE)
}
}
y <- mu + u + e
dat_ar4 <- data.frame(y = y, id = id, method = method, time = time)
ccc_rm_reml(dat_ar4,
response = "y", subject = "id", method = "method", time = "time",
ar = "ar1", ar_rho = 0.6, verbose = TRUE)
# ====================================================================
# 6) Random slope variants (subject, method, custom Z)
# ====================================================================
## By SUBJECT
set.seed(2)
n_subj <- 60; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
subj <- as.integer(id)
slope_i <- rnorm(n_subj, 0, 0.15)
slope_vec <- slope_i[subj]
base <- rnorm(n_subj, 0, 1.0)[subj]
tnum <- as.integer(tim)
y <- base + 0.3*(method=="B") + slope_vec*(tnum - mean(seq_len(n_time))) +
rnorm(length(id), 0, 0.5)
dat_s <- data.frame(y, id, method, time = tim)
dat_s$t_num <- as.integer(dat_s$time)
dat_s$t_c <- ave(dat_s$t_num, dat_s$id, FUN = function(v) v - mean(v))
ccc_rm_reml(dat_s, "y", "id", method = "method", time = "time",
slope = "subject", slope_var = "t_c", verbose = TRUE)
## By METHOD
set.seed(3)
n_subj <- 60; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
slope_m <- ifelse(method=="B", 0.25, 0.00)
base <- rnorm(n_subj, 0, 1.0)[as.integer(id)]
tnum <- as.integer(tim)
y <- base + 0.3*(method=="B") + slope_m*(tnum - mean(seq_len(n_time))) +
rnorm(length(id), 0, 0.5)
dat_m <- data.frame(y, id, method, time = tim)
dat_m$t_num <- as.integer(dat_m$time)
dat_m$t_c <- ave(dat_m$t_num, dat_m$id, FUN = function(v) v - mean(v))
ccc_rm_reml(dat_m, "y", "id", method = "method", time = "time",
slope = "method", slope_var = "t_c", verbose = TRUE)
## SUBJECT + METHOD random slopes (custom Z)
set.seed(4)
n_subj <- 50; n_time <- 4
id <- factor(rep(seq_len(n_subj), each = 2 * n_time))
tim <- factor(rep(rep(seq_len(n_time), times = 2), times = n_subj))
method <- factor(rep(rep(c("A","B"), each = n_time), times = n_subj))
subj <- as.integer(id)
slope_subj <- rnorm(n_subj, 0, 0.12)[subj]
slope_B <- ifelse(method=="B", 0.18, 0.00)
tnum <- as.integer(tim)
base <- rnorm(n_subj, 0, 1.0)[subj]
y <- base + 0.3*(method=="B") +
(slope_subj + slope_B) * (tnum - mean(seq_len(n_time))) +
rnorm(length(id), 0, 0.5)
dat_bothRS <- data.frame(y, id, method, time = tim)
dat_bothRS$t_num <- as.integer(dat_bothRS$time)
dat_bothRS$t_c <- ave(dat_bothRS$t_num, dat_bothRS$id, FUN = function(v) v - mean(v))
MM <- model.matrix(~ 0 + method, data = dat_bothRS)
Z_custom <- cbind(
subj_slope = dat_bothRS$t_c,
MM * dat_bothRS$t_c
)
ccc_rm_reml(dat_bothRS, "y", "id", method = "method", time = "time",
slope = "custom", slope_Z = Z_custom, verbose = TRUE)