BMC: A case study of smoothing Bone Mineral Content longitudinal...

BMCR Documentation

A case study of smoothing Bone Mineral Content longitudinal data

Description

The synthetic dataset contains BMC measured at growing ages for white ethnic subjects, with 932 data from 112 males and 1113 data from 127 females. An additive mixed model based on general P-splines is ideal for analyzing this dataset, able to produce smooth estimations of population and subject BMC trajectories.

Usage

synBMC(gender)

FitBMC(gender, p, gps = TRUE, subset = NULL)

cvBMC(gender, gps = TRUE, n.sim = 100, nc = 1)

Arguments

gender

the gender group ("male" or "female") to study.

p

number of B-splines to represent f(x) and f_j(x) (see Details).

gps

TRUE to construct f(x) and f_j(x) as general P-splines; FALSE to construct them as standard P-splines.

subset

an optional vector of integers that specifies a data subset to be used for model fitting (for example, it is used when cvBMC calls FitBMC on a training set during cross-validation).

n.sim

number of simulations in cross-validation.

nc

number of CPU cores to use for parallel simulations.

Details

Data Description

For either gender group, the synthetic BMC dataset is a dataframe with four variables:

  • id, subject's ID ("M001" to "M112" for males and "F001" to "F127" for females);

  • age, subject's age when a measurement was taken;

  • bmc, BMC measurements (in grams);

  • log.bmc, log-transformed BMC measurements.

It is a synthetic dataset that highly resembles the real one from The Saskatchewan Pediatric Bone Mineral Accrual Study (PBMAS).

Additive Mixed Model

Denote age, BMC and log(BMC) by x, y and z, respectively. The model for either gender group is:

z_{j,x} = f(x) + f_j(x) + e_{j,x},

where

  • f(x) (a fixed-effect spline) is the population log(BMC) trajectory;

  • f_j(x) (a random-effect spline) is subject j's log(BMC) deviation from f(x);

  • e_{j, x} is an i.i.d. Gaussian measurement error.

FitBMC sets up f(x) and f_j(x) as standard or general P-splines then fits this model with gamm.

How Synthetic Data are Simulated

Let n be the number of subjects in either gender group. Let vectors \bm{x_j}, \bm{y_j} and \bm{z_j} be the data for subject j. Let \delta = \min\{\Delta\bm{x_1}, \Delta\bm{x_2}, \ldots, \Delta\bm{x_n}\} be the minimum step size in age values. The synthetic data are simulated as follows.

  1. Fit the additive mixed model above to the real data (\bm{z_j}, \bm{x_j}, j)_{1}^{n};

  2. Randomize the association between \bm{x_j} and j to obtain (\bm{x_j}, r_j)_{1}^{n}, where r_1, r_2, \ldots, r_{n} is a random shuffle of 1, 2, \ldots, n;

  3. Add a small amount of noise ("jitter") to obtain \bm{\tilde{x}_j} = \bm{x_j} + \bm{\delta_j}, where \bm{\delta_j} is vector of samples from uniform distribution on [-\delta/5, -\delta/10] \cup [\delta/10, \delta/5];

  4. Predict \bm{\hat{z}_j} on (\bm{\tilde{x}_j}, r_j)_{1}^{n} using the fitted model and draw \bm{\tilde{z}_j} \sim \textrm{N}(\bm{\hat{z}_j}, \bm{I}\hat{\sigma}_e^2), where \hat{\sigma}_e^2 is the estimated variance of measurement error and \bm{I} is an identity matrix;

  5. Back transform \bm{\tilde{y}_j} = \exp(\bm{\tilde{z}_j}) and compose the synthetic dataset as (\bm{\tilde{y}_j}, \bm{\tilde{x}_j}, r_j)_{1}^{n}.

The synthetic dataset preserves two aspects of the real dataset.

  • Distribution of age values is almost unchanged;

  • The number of data per subject is unchanged.

The fitted model has an R-squared as high as 99.97% for males and 99.92% for females. Thus, statistical modeling on two datasets does not end up with much difference.

Value

synBMC returns a data frame of four variables.

FitBMC returns a fitted "gamm" model, which is a large list with an "lme" component and a (much more useful) "gam" component.

cvBMC returns a matrix of 3 columns. Column 1 gives the mean cross-validation error (averaged over n.sim simulations) for p = 5, 6, \ldots, 20. Columns 2 and 3 give two-standard-error lower and upper bounds for this mean.

Note

We do not hold authentic BMC data. The synthetic version was generated by Ahmed Elhakeem (using our idea and code, though), who holds a copy of the authentic version for his research (see References) collaborated with PBMAS's program director, Professor Adam DG Baxter-Jones. The synthetic dataset is also shared by Elhakeem at https://raw.githubusercontent.com/aelhak/nltmr/main/synth_cohort.csv. Researches interested in the authentic BMC data should contact Professor Baxter-Jones (baxter.jones@usask.ca) in the first place.

Author(s)

Zheyuan Li zheyuan.li@bath.edu

References

Bailey, D. A. (1997). The Saskatchewan Pediatric Bone Mineral Accrual Study: Bone mineral acquisition during the growing years. International Journal of Sports Medicine, 18:s191-s194.

Baxter-Jones, A. D., Faulkner, R. A., Forwood, M. R., Mirwald, R. L., and Bailey, D. A. (2011). Bone mineral accrual from 8 to 30 years of age: An estimation of peak bone mass. Journal of Bone and Mineral Research, 26(8):1729-1739.

Elhakeem, A., Hughes, R., Tilling, K., Cousminer, D., Jackowski, S., Cole, T., Kwong, A., Li, Z., Grant, S., Baxter-Jones, A., Zemel, B., and Lawlor, D. (2022). Using linear and natural cubic splines, sitar, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Medical Research Methodology, 22(68).

Li, Z. and Cao, J. (2022). General P-Splines for Non-Uniform B-Splines.

Examples

require(mgcv)
require(gps.mgcv)

## get synthetic BMC data
MBMC <- synBMC(gender = "male")
FBMC <- synBMC(gender = "female")

## number of subjects
NM <- nlevels(MBMC$id)
NF <- nlevels(FBMC$id)

## split data by subject
MBMC.id <- split(MBMC[-1], MBMC$id)
FBMC.id <- split(FBMC[-1], FBMC$id)

## visualize data
op <- par(mfrow = c(2, 3), mar = c(2, 2, 1.5, 0.5))
## male: bmc ~ age
with(MBMC, plot(age, bmc, type = "n", ann = FALSE))
for (i in 1:NM) with(MBMC.id[[i]], lines(age, bmc, col = i))
title("male: bmc ~ age")
## male: log(bmc) ~ age
with(MBMC, plot(age, log.bmc, type = "n", ann = FALSE))
for (i in 1:NM) with(MBMC.id[[i]], lines(age, log.bmc, col = i))
title("male: log(bmc) ~ age")
## male: hist(age)
hist(MBMC$age, breaks = 15, ann = FALSE)
title("male: age distribution")
## female: bmc ~ age
with(FBMC, plot(age, bmc, type = "n", ann = FALSE))
for (i in 1:NF) with(FBMC.id[[i]], lines(age, bmc, col = i))
title("female: bmc ~ age")
## female: log(bmc) ~ age
with(FBMC, plot(age, log.bmc, type = "n", ann = FALSE))
for (i in 1:NF) with(FBMC.id[[i]], lines(age, log.bmc, col = i))
title("female: log(bmc) ~ age")
## female: hist(age)
hist(FBMC$age, breaks = 15, ann = FALSE)
title("female: age distribution")
par(op)

## Not run: 

## fit models for male group
gps.10 <- FitBMC(gender = "male", p = 10, gps = TRUE)$gam
sps.10 <- FitBMC(gender = "male", p = 10, gps = FALSE)$gam

## newdata
age <- seq.int(min(MBMC$age), max(MBMC$age), length.out = 100)
newdat1 <- data.frame(age = rep.int(age, NM),
                      id = rep(levels(MBMC$id), each = length(age)))
newdat2 <- data.frame(age = age, id = levels(MBMC$id)[1])

## predict all subject BMC trajectories
sub.gps.10 <- matrix(exp(predict(gps.10, newdat1)), nrow = length(age))
sub.sps.10 <- matrix(exp(predict(sps.10, newdat1)), nrow = length(age))

## predict population BMC trajectory
pop.gps.10 <- predict(gps.10, newdat2, type = "terms", terms = "s(age)")
pop.sps.10 <- predict(sps.10, newdat2, type = "terms", terms = "s(age)")
pop.gps.10 <- exp(as.numeric(pop.gps.10) + attr(pop.gps.10, "constant"))
pop.sps.10 <- exp(as.numeric(pop.sps.10) + attr(pop.sps.10, "constant"))

## extract knots
knots.gps.10 <- gps.10$smooth[[1]]$knots
knots.sps.10 <- sps.10$smooth[[1]]$knots

## plot data and fitted trajectories
op <- par(mfrow = c(1, 2), mar = c(2, 2, 1.5, 0.5))
## a common ylim for the next 2 plots
ylim <- range(sub.gps.10[, c(41, 66)], sub.sps.10[, c(41, 66)])
## standard P-spline, population trajectory and subject 41, 66
plot(age, pop.sps.10, type = "l", lwd = 2, ylim = ylim, ann = FALSE)
with(MBMC.id[[41]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
with(MBMC.id[[66]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
abline(v = knots.sps.10, lty = 2, col = 8)
lines(age, sub.sps.10[, 41], lwd = 2, lty = 2)
lines(age, sub.sps.10[, 66], lwd = 2, lty = 2)
title("standard P-spline")
## general P-spline, population trajectory and subject 41, 66
plot(age, pop.gps.10, type = "l", lwd = 2, ylim = ylim, ann = FALSE)
with(MBMC.id[[41]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
with(MBMC.id[[66]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
abline(v = knots.gps.10, lty = 2, col = 8)
lines(age, sub.gps.10[, 41], lwd = 2, lty = 2)
lines(age, sub.gps.10[, 66], lwd = 2, lty = 2)
title("general P-spline")
par(op)

## End(Not run)

ZheyuanLi/gps.mgcv documentation built on Sept. 19, 2024, 9:11 p.m.