| ba_rm | R Documentation |
Repeated-measures Bland-Altman (BA) analysis for method comparison based on a mixed-effects model fitted to subject-time matched paired differences. The fitted model includes a subject-specific random intercept and, optionally, an AR(1) residual correlation structure within subject.
The function accepts either exactly two methods or \ge 3 methods.
With exactly two methods it returns a single fitted BA object. With
\ge 3 methods it fits the same model to every unordered method pair and
returns pairwise matrices of results.
Required variables
response: numeric measurements.
subject: subject identifier.
method: method label with at least two distinct levels.
time: replicate/time key used to form within-subject pairs.
For any analysed pair of methods, only records where both methods are present
for the same subject and integer-coerced time contribute to the
fit. Rows with missing values in any required field are excluded for that
analysed pair.
ba_rm(
data = NULL,
response,
subject,
method,
time,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
verbose = FALSE,
loa_multiplier = 1.96,
include_slope = FALSE,
use_ar1 = FALSE,
ar1_rho = NA_real_,
max_iter = 200L,
tol = 1e-06
)
## S3 method for class 'ba_repeated'
print(
x,
digits = 3,
ci_digits = 3,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'ba_repeated_matrix'
print(
x,
digits = 3,
ci_digits = 3,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
style = c("pairs", "matrices"),
...
)
## S3 method for class 'ba_repeated'
plot(
x,
title = "Bland-Altman (repeated measurements)",
subtitle = NULL,
point_alpha = 0.7,
point_size = 2.2,
line_size = 0.8,
shade_ci = TRUE,
shade_alpha = 0.08,
smoother = c("none", "loess", "lm"),
symmetrize_y = TRUE,
show_points = TRUE,
show_value = TRUE,
...
)
## S3 method for class 'ba_repeated_matrix'
plot(
x,
pairs = NULL,
against = NULL,
facet_scales = c("free_y", "fixed"),
title = "Bland-Altman (repeated, pairwise)",
point_alpha = 0.6,
point_size = 1.8,
line_size = 0.7,
shade_ci = TRUE,
shade_alpha = 0.08,
smoother = c("none", "loess", "lm"),
show_points = TRUE,
show_value = TRUE,
...
)
data |
Optional data frame-like object. If supplied, |
response |
Numeric response vector, or a single character string naming
the response column in |
subject |
Subject identifier vector (integer, numeric, or factor), or a
single character string naming the subject column in |
method |
Method label vector (character, factor, integer, or numeric), or
a single character string naming the method column in |
time |
Replicate/time index vector (integer or numeric), or a single
character string naming the time column in |
conf_level |
Confidence level for Wald confidence intervals for the
reported bias and both LoA endpoints. Must lie in |
n_threads |
Integer |
verbose |
Logical. If |
loa_multiplier |
Positive scalar giving the SD multiplier used to form
the limits of agreement. Default is |
include_slope |
Logical. If |
use_ar1 |
Logical. If |
ar1_rho |
Optional AR(1) parameter. Must satisfy |
max_iter |
Maximum number of EM/GLS iterations used by the backend. |
tol |
Convergence tolerance for the backend EM/GLS iterations. |
x |
A |
digits |
Number of digits for estimates (default 3). |
ci_digits |
Number of digits for CI bounds (default 3). |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
show_ci |
One of |
... |
Additional theme adjustments passed to |
style |
Show as pairs or matrix format? |
title |
Plot title (character scalar). Defaults to
|
subtitle |
Optional subtitle (character scalar). If |
point_alpha |
Numeric in |
point_size |
Positive numeric. Size of scatter points; passed to
|
line_size |
Positive numeric. Line width for horizontal bands
(bias and both LoA) and, when requested, the proportional-bias line.
Passed to |
shade_ci |
Logical. If |
shade_alpha |
Numeric in |
smoother |
One of |
symmetrize_y |
Logical (two-method plot only). If |
show_points |
Logical. If |
show_value |
Logical; included for a consistent plotting interface. Repeated-measures Bland-Altman plots do not overlay numeric cell values, so this argument currently has no effect. |
pairs |
(Faceted pairwise plot only.) Optional character vector of
labels specifying which method contrasts to display. Labels must match the
"row - column" convention used by |
against |
(Faceted pairwise plot only.) Optional single method name.
If supplied, facets are restricted to contrasts of the chosen method
against all others. Ignored when |
facet_scales |
(Faceted pairwise plot only.) Either |
For a selected pair of methods (a, b), the backend first forms complete
within-subject pairs at matched subject and integer-coerced
time. Let
d_{it} = y_{itb} - y_{ita},
\qquad
m_{it} = \frac{y_{ita} + y_{itb}}{2},
where d_{it} is the paired difference and m_{it} is the paired
mean for subject i at time/replicate t. Only complete
subject-time matches contribute to that pairwise fit.
If multiple rows are present for the same subject-time-method
combination within an analysed pair, the backend keeps the last encountered
value for that combination when forming the pair. The function therefore
implicitly assumes at most one observation per subject-time-method
cell for each analysed contrast.
The fitted model for each analysed pair is
d_{it} = \beta_0 + \beta_1 x_{it} + u_i + \varepsilon_{it},
where x_{it} = m_{it} if include_slope = TRUE and the term is omitted
otherwise; u_i \sim \mathcal{N}(0, \sigma_u^2) is a subject-specific
random intercept; and the within-subject residual vector satisfies
\mathrm{Cov}(\varepsilon_i) = \sigma_e^2 R_i.
When use_ar1 = FALSE, R_i = I. When use_ar1 = TRUE, the backend
works with the residual precision matrix C_i = R_i^{-1} over
contiguous time blocks within subject and uses \sigma_e^2 C_i^{-1} as
the residual covariance.
Within each subject, paired observations are ordered by integer-coerced
time. AR(1) correlation is applied only over strictly contiguous runs
satisfying t_{k+1} = t_k + 1. Gaps break the run. Negative times, and
any isolated positions not belonging to a contiguous run, are treated as
independent singletons.
For a contiguous run of length L and correlation parameter \rho,
the block precision matrix is
C
=
\frac{1}{1-\rho^2}
\begin{bmatrix}
1 & -\rho & & & \\
-\rho & 1+\rho^2 & -\rho & & \\
& \ddots & \ddots & \ddots & \\
& & -\rho & 1+\rho^2 & -\rho \\
& & & -\rho & 1
\end{bmatrix},
with a very small ridge added to the diagonal for numerical stability.
If use_ar1 = TRUE and ar1_rho is supplied, that value is used after
validation and clipping to the admissible numerical range handled by the
backend.
If use_ar1 = TRUE and ar1_rho = NA, the backend estimates rho
separately for each analysed pair by:
fitting the corresponding iid model;
computing a moments-based lag-1 estimate from detrended residuals within contiguous blocks, used only as a seed; and
refining that seed by a short profile search over rho using the
profiled REML log-likelihood.
In the exported ba_rm() wrapper, if an AR(1) fit for a given analysed pair
fails specifically because the backend EM/GLS routine did not converge to
admissible finite variance-component estimates, the wrapper retries that pair
with iid residuals. If the iid refit succeeds, the final reported residual
model for that pair is "iid" and a warning is issued. Other AR(1) failures
are not simplified and are propagated as errors.
When include_slope = TRUE, the paired mean regressor is centred and scaled
internally before fitting. Let \bar m be the mean of the observed paired
means. The backend chooses a scaling denominator from:
the sample SD;
the IQR-based scale \mathrm{IQR}(m)/1.349;
the MAD-based scale 1.4826\,\mathrm{MAD}(m).
It uses the first of these that is not judged near-zero relative to the
largest finite positive candidate scale, under a threshold proportional to
\sqrt{\epsilon_{\mathrm{mach}}}. If all candidate scales are treated as
near-zero, the fit stops with an error because the proportional-bias slope is
not estimable on the observed paired-mean scale.
The returned beta_slope is back-transformed to the original paired-mean
scale. The returned BA centre is the fitted mean difference at the centred
reference paired mean \bar m, not the original-scale intercept
coefficient.
The backend uses a stabilised EM/GLS scheme.
Conditional on current variance components, the fixed effects are updated by GLS using the marginal precision of the paired differences after integrating out the random subject intercept. The resulting fixed-effect covariance used in the confidence-interval calculations is the GLS covariance
\mathrm{Var}(\hat\beta \mid \hat\theta)
=
\left( \sum_i X_i^\top V_i^{-1} X_i \right)^{-1}.
Given updated fixed effects, the variance components are refreshed by EM using the conditional moments of the subject random intercept and the residual quadratic forms. Variance updates are ratio-damped and clipped to admissible ranges for numerical stability.
The reported BA centre is always model-based.
When include_slope = FALSE, it is the fitted intercept of the paired-
difference mixed model.
When include_slope = TRUE, it is the fitted mean difference at the centred
reference paired mean used internally by the backend.
The reported limits of agreement are
\mu_0 \pm \texttt{loa\_multiplier}\sqrt{\sigma_u^2 + \sigma_e^2},
where \mu_0 is the reported model-based BA centre. These LoA are for a
single new paired difference from a random subject under the fitted model.
Under the implemented parameterisation, AR(1) correlation affects the
off-diagonal within-subject covariance structure and therefore the estimation
of the model parameters and their uncertainty, but not the marginal variance
of a single paired difference. Consequently rho does not appear explicitly
in the LoA point-estimate formula.
The backend returns Wald confidence intervals for the reported BA centre and for both LoA endpoints.
These intervals combine:
the conditional GLS uncertainty in the fixed effects at the fitted covariance parameters; and
a delta-method propagation of covariance-parameter uncertainty from the observed information matrix of the profiled REML log-likelihood.
The covariance-parameter vector is profiled on transformed scales:
log-variances for \sigma_u^2 and \sigma_e^2, and, when rho is
estimated internally under AR(1), a transformed correlation parameter
mapped back by \rho = 0.95\tanh(z).
Numerical central finite differences are used to approximate both the
observed Hessian of the profiled REML log-likelihood and the gradients of the
reported derived quantities. The resulting variances are combined and the
final intervals are formed with the normal quantile corresponding to
conf_level.
\ge 3 methodsWith exactly two methods, at least two complete subject-time pairs are required; otherwise the function errors.
With \ge 3 methods, the function analyses every unordered pair of method
levels. For a given pair with fewer than two complete subject-time matches,
that contrast is skipped and the corresponding matrix entries remain NA.
For a fitted contrast between methods in matrix positions (j, k) with
j < k, the stored orientation is:
\texttt{bias}[j,k] \approx \mathrm{method}_k - \mathrm{method}_j.
Hence the transposed entry changes sign, while sd_loa and width are
symmetric.
Separate estimation of the residual and subject-level variance components requires sufficient complete within-subject replication after pairing. If the paired data are not adequate to separate these components, the fit stops with an identifiability error.
If the model is conceptually estimable but no finite positive pooled
within-subject variance can be formed during initialisation, the backend uses
0.5 \times v_{\mathrm{ref}} only as a temporary positive starting value
for the EM routine and records a warning string in the backend output. The
exported wrapper does not otherwise modify the final estimates.
If the EM/GLS routine fails to reach admissible finite variance-component estimates, the backend throws an explicit convergence error rather than returning fallback estimates.
Either a "ba_repeated" object (exactly two methods) or a
"ba_repeated_matrix" object (\ge 3 methods).
If exactly two methods are supplied, the returned
"ba_repeated" object is a list with components:
means: numeric vector of paired means
(y_1 + y_2)/2 used for plotting helpers.
diffs: numeric vector of paired differences
y_2 - y_1 used for plotting helpers.
n_obs: integer number of complete subject-time pairs used.
based.on: compatibility alias for n_obs.
mean.diffs: scalar model-based BA centre. When
include_slope = FALSE, this is the fitted intercept of the paired-
difference model. When include_slope = TRUE, this is the fitted mean
difference at the centred reference paired mean used internally by the
backend.
lower.limit, upper.limit: scalar limits of agreement,
computed as
\mu_0 \pm \texttt{loa\_multiplier}\sqrt{\sigma_u^2 + \sigma_e^2}.
lines: named numeric vector with entries lower, mean, and
upper.
CI.lines: named numeric vector containing Wald confidence
interval bounds for the bias and both LoA endpoints:
mean.diff.ci.lower, mean.diff.ci.upper,
lower.limit.ci.lower, lower.limit.ci.upper,
upper.limit.ci.lower, upper.limit.ci.upper.
loa_multiplier: scalar LoA multiplier actually used.
critical.diff: scalar LoA half-width
\texttt{loa\_multiplier} \times \texttt{sd\_loa}.
include_slope: logical, copied from the call.
beta_slope: proportional-bias slope on the original paired-mean
scale when include_slope = TRUE; otherwise NA.
sigma2_subject: estimated variance of the subject-level random
intercept on paired differences.
sigma2_resid: estimated residual variance on paired differences.
use_ar1: logical, copied from the call.
residual_model: either "ar1" or "iid", indicating the final
residual structure actually used.
ar1_rho: AR(1) correlation actually used in the final fit when
residual_model == "ar1"; otherwise NA.
ar1_estimated: logical indicating whether ar1_rho was
estimated internally (TRUE) or supplied by the user (FALSE) when the
final residual model is AR(1); otherwise NA.
The confidence level is stored as attr(x, "conf.level").
If \ge 3 methods are supplied, the returned
"ba_repeated_matrix" object is a list with components:
bias: numeric m \times m matrix of model-based BA centres.
For indices (j, k) with j < k, bias[j, k] estimates
\mathrm{method}_k - \mathrm{method}_j. Thus the matrix orientation is
column minus row, not row minus column. The diagonal is NA.
sd_loa: numeric m \times m matrix of LoA SDs,
\sqrt{\sigma_u^2 + \sigma_e^2}. This matrix is symmetric.
loa_lower, loa_upper: numeric m \times m matrices
of LoA endpoints corresponding to bias. These satisfy
\texttt{loa\_lower}[j,k] = -\texttt{loa\_upper}[k,j] and
\texttt{loa\_upper}[j,k] = -\texttt{loa\_lower}[k,j].
width: numeric m \times m matrix of LoA widths,
loa_upper - loa_lower. This matrix is symmetric.
n: integer m \times m matrix giving the number of complete
subject-time pairs used for each analysed contrast. Pairs with fewer than
two complete matches are left as NA in the estimate matrices.
mean_ci_low, mean_ci_high: numeric m \times m
matrices of Wald confidence interval bounds for bias.
loa_lower_ci_low, loa_lower_ci_high: numeric
m \times m matrices of Wald confidence interval bounds for the lower
LoA.
loa_upper_ci_low, loa_upper_ci_high: numeric
m \times m matrices of Wald confidence interval bounds for the upper
LoA.
slope: optional numeric m \times m matrix of
proportional-bias slopes on the original paired-mean scale when
include_slope = TRUE; otherwise NULL. This matrix is antisymmetric in
sign because each fitted contrast is reversed across the transpose.
methods: character vector of method levels defining matrix row
and column order.
loa_multiplier: scalar LoA multiplier actually used.
conf_level: scalar confidence level used for the reported Wald
intervals.
use_ar1: logical, copied from the call.
ar1_rho: scalar equal to the user-supplied common ar1_rho
when use_ar1 = TRUE and a value was supplied; otherwise NA. This field
does not store the per-pair estimated AR(1) parameters.
residual_model: character m \times m matrix whose entries
are "ar1", "iid", or NA, indicating the final residual structure used
for each pair.
sigma2_subject: numeric m \times m matrix of estimated
subject-level random-intercept variances.
sigma2_resid: numeric m \times m matrix of estimated
residual variances.
ar1_rho_pair: optional numeric m \times m matrix giving
the AR(1) correlation actually used for each pair when the final residual
model is "ar1"; otherwise NA for that entry. Present only when
use_ar1 = TRUE.
ar1_estimated: optional logical m \times m matrix
indicating whether the pair-specific ar1_rho_pair was estimated internally
(TRUE) or supplied by the user (FALSE) for entries whose final residual
model is "ar1"; otherwise NA. Present only when use_ar1 = TRUE.
data_long: canonical long-form data frame with columns
.response, .subject, .method, and .time
retained for pairwise plotting helpers.
mapping: named list mapping response, subject,
method, and time to the canonical stored columns in
data_long.
Thiago de Paula Oliveira
# -------- Simulate repeated-measures data --------
set.seed(1)
# design (no AR)
# subjects
S <- 30L
# replicates per subject
Tm <- 15L
subj <- rep(seq_len(S), each = Tm)
time <- rep(seq_len(Tm), times = S)
# subject signal centered at 0 so BA "bias" won't be driven by the mean level
mu_s <- rnorm(S, mean = 0, sd = 8)
# constant within subject across replicates
true <- mu_s[subj]
# common noise (no AR, i.i.d.)
sd_e <- 2
e0 <- rnorm(length(true), 0, sd_e)
# --- Methods ---
# M1: signal + noise
y1 <- true + e0
# M2: same precision as M1; here identical so M3 can be
# almost perfectly the inverse of both M1 and M2
y2 <- y1 + rnorm(length(true), 0, 0.01)
# M3: perfect inverse of M1 and M2
y3 <- -y1 # = -(true + e0)
# M4: unrelated to all others (pure noise, different scale)
y4 <- rnorm(length(true), 3, 6)
data <- rbind(
data.frame(y = y1, subject = subj, method = "M1", time = time),
data.frame(y = y2, subject = subj, method = "M2", time = time),
data.frame(y = y3, subject = subj, method = "M3", time = time),
data.frame(y = y4, subject = subj, method = "M4", time = time)
)
data$method <- factor(data$method, levels = c("M1","M2","M3","M4"))
# quick sanity checks
with(data, {
Y <- split(y, method)
round(cor(cbind(M1 = Y$M1, M2 = Y$M2, M3 = Y$M3, M4 = Y$M4)), 3)
})
# Run BA (no AR)
ba4 <- ba_rm(
data = data,
response = "y", subject = "subject", method = "method", time = "time",
loa_multiplier = 1.96, conf_level = 0.95,
include_slope = FALSE, use_ar1 = FALSE
)
summary(ba4)
plot(ba4)
# -------- Simulate repeated-measures with AR(1) data --------
set.seed(123)
S <- 40L # subjects
Tm <- 50L # replicates per subject
methods <- c("A","B","C") # N = 3 methods
rho <- 0.4 # AR(1) within-subject across time
ar1_sim <- function(n, rho, sd = 1) {
z <- rnorm(n)
e <- numeric(n)
e[1] <- z[1] * sd
if (n > 1) for (t in 2:n) e[t] <- rho * e[t-1] + sqrt(1 - rho^2) * z[t] * sd
e
}
# Subject baseline + time trend (latent "true" signal)
subj <- rep(seq_len(S), each = Tm)
time <- rep(seq_len(Tm), times = S)
# subject effects
mu_s <- rnorm(S, 50, 7)
trend <- rep(seq_len(Tm) - mean(seq_len(Tm)), times = S) * 0.8
true <- mu_s[subj] + trend
# Method-specific biases (B has +1.5 constant; C has slight proportional bias)
bias <- c(A = 0, B = 1.5, C = -0.5)
# proportional component on "true"
prop <- c(A = 0.00, B = 0.00, C = 0.10)
# Build long data: for each method, add AR(1) noise within subject over time
make_method <- function(meth, sd = 3) {
e <- unlist(lapply(split(seq_along(time), subj),
function(ix) ar1_sim(length(ix), rho, sd)))
y <- true * (1 + prop[meth]) + bias[meth] + e
data.frame(y = y, subject = subj, method = meth, time = time,
check.names = FALSE)
}
data <- do.call(rbind, lapply(methods, make_method))
data$method <- factor(data$method, levels = methods)
# -------- Repeated BA (pairwise matrix) ---------------------
baN <- ba_rm(
response = data$y, subject = data$subject, method = data$method, time = data$time,
loa_multiplier = 1.96, conf_level = 0.95,
include_slope = FALSE, # estimate proportional bias per pair
use_ar1 = TRUE, ar1_rho = rho
)
# Matrices (row - column orientation)
print(baN)
summary(baN)
# Faceted BA scatter by pair
plot(baN, smoother = "lm", facet_scales = "free_y")
# -------- Two-method AR(1) path (A vs B only) ------------------------------
data_AB <- subset(data, method %in% c("A","B"))
baAB <- ba_rm(
response = data_AB$y, subject = data_AB$subject,
method = droplevels(data_AB$method), time = data_AB$time,
include_slope = FALSE, use_ar1 = TRUE, ar1_rho = 0.4
)
print(baAB)
plot(baAB)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.