BMC | R Documentation |
The synthetic dataset contains BMC measured at growing ages for white ethnic subjects, with 932 data from 112 males and 1113 data from 127 females. An additive mixed model based on general P-splines is ideal for analyzing this dataset, able to produce smooth estimations of population and subject BMC trajectories.
synBMC(gender)
FitBMC(gender, p, gps = TRUE, subset = NULL)
cvBMC(gender, gps = TRUE, n.sim = 100, nc = 1)
gender |
the gender group ("male" or "female") to study. |
p |
number of B-splines to represent |
gps |
TRUE to construct |
subset |
an optional vector of integers that specifies a data subset to be used for model fitting (for example, it is used when |
n.sim |
number of simulations in cross-validation. |
nc |
number of CPU cores to use for parallel simulations. |
For either gender group, the synthetic BMC dataset is a dataframe with four variables:
id
, subject's ID ("M001" to "M112" for males and "F001" to "F127" for females);
age
, subject's age when a measurement was taken;
bmc
, BMC measurements (in grams);
log.bmc
, log-transformed BMC measurements.
It is a synthetic dataset that highly resembles the real one from The Saskatchewan Pediatric Bone Mineral Accrual Study (PBMAS).
Denote age, BMC and log(BMC) by x
, y
and z
, respectively. The model for either gender group is:
z_{j,x} = f(x) + f_j(x) + e_{j,x},
where
f(x)
(a fixed-effect spline) is the population log(BMC) trajectory;
f_j(x)
(a random-effect spline) is subject j
's log(BMC) deviation from f(x)
;
e_{j, x}
is an i.i.d. Gaussian measurement error.
FitBMC
sets up f(x)
and f_j(x)
as standard or general P-splines then fits this model with gamm
.
Let n
be the number of subjects in either gender group. Let vectors \bm{x_j}
, \bm{y_j}
and \bm{z_j}
be the data for subject j
. Let \delta = \min\{\Delta\bm{x_1}, \Delta\bm{x_2}, \ldots, \Delta\bm{x_n}\}
be the minimum step size in age values. The synthetic data are simulated as follows.
Fit the additive mixed model above to the real data (\bm{z_j}, \bm{x_j}, j)_{1}^{n}
;
Randomize the association between \bm{x_j}
and j
to obtain (\bm{x_j}, r_j)_{1}^{n}
, where r_1, r_2, \ldots, r_{n}
is a random shuffle of 1, 2, \ldots, n
;
Add a small amount of noise ("jitter") to obtain \bm{\tilde{x}_j} = \bm{x_j} + \bm{\delta_j}
, where \bm{\delta_j}
is vector of samples from uniform distribution on [-\delta/5, -\delta/10] \cup [\delta/10, \delta/5]
;
Predict \bm{\hat{z}_j}
on (\bm{\tilde{x}_j}, r_j)_{1}^{n}
using the fitted model and draw \bm{\tilde{z}_j} \sim \textrm{N}(\bm{\hat{z}_j}, \bm{I}\hat{\sigma}_e^2)
, where \hat{\sigma}_e^2
is the estimated variance of measurement error and \bm{I}
is an identity matrix;
Back transform \bm{\tilde{y}_j} = \exp(\bm{\tilde{z}_j})
and compose the synthetic dataset as (\bm{\tilde{y}_j}, \bm{\tilde{x}_j}, r_j)_{1}^{n}
.
The synthetic dataset preserves two aspects of the real dataset.
Distribution of age values is almost unchanged;
The number of data per subject is unchanged.
The fitted model has an R-squared as high as 99.97% for males and 99.92% for females. Thus, statistical modeling on two datasets does not end up with much difference.
synBMC
returns a data frame of four variables.
FitBMC
returns a fitted "gamm" model, which is a large list with an "lme" component and a (much more useful) "gam" component.
cvBMC
returns a matrix of 3 columns. Column 1 gives the mean cross-validation error (averaged over n.sim
simulations) for p = 5, 6, \ldots, 20
. Columns 2 and 3 give two-standard-error lower and upper bounds for this mean.
We do not hold authentic BMC data. The synthetic version was generated by Ahmed Elhakeem (using our idea and code, though), who holds a copy of the authentic version for his research (see References) collaborated with PBMAS's program director, Professor Adam DG Baxter-Jones. The synthetic dataset is also shared by Elhakeem at https://raw.githubusercontent.com/aelhak/nltmr/main/synth_cohort.csv. Researches interested in the authentic BMC data should contact Professor Baxter-Jones (baxter.jones@usask.ca) in the first place.
Zheyuan Li zheyuan.li@bath.edu
Bailey, D. A. (1997). The Saskatchewan Pediatric Bone Mineral Accrual Study: Bone mineral acquisition during the growing years. International Journal of Sports Medicine, 18:s191-s194.
Baxter-Jones, A. D., Faulkner, R. A., Forwood, M. R., Mirwald, R. L., and Bailey, D. A. (2011). Bone mineral accrual from 8 to 30 years of age: An estimation of peak bone mass. Journal of Bone and Mineral Research, 26(8):1729-1739.
Elhakeem, A., Hughes, R., Tilling, K., Cousminer, D., Jackowski, S., Cole, T., Kwong, A., Li, Z., Grant, S., Baxter-Jones, A., Zemel, B., and Lawlor, D. (2022). Using linear and natural cubic splines, sitar, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Medical Research Methodology, 22(68).
Li, Z. and Cao, J. (2022). General P-Splines for Non-Uniform B-Splines.
require(mgcv)
require(gps.mgcv)
## get synthetic BMC data
MBMC <- synBMC(gender = "male")
FBMC <- synBMC(gender = "female")
## number of subjects
NM <- nlevels(MBMC$id)
NF <- nlevels(FBMC$id)
## split data by subject
MBMC.id <- split(MBMC[-1], MBMC$id)
FBMC.id <- split(FBMC[-1], FBMC$id)
## visualize data
op <- par(mfrow = c(2, 3), mar = c(2, 2, 1.5, 0.5))
## male: bmc ~ age
with(MBMC, plot(age, bmc, type = "n", ann = FALSE))
for (i in 1:NM) with(MBMC.id[[i]], lines(age, bmc, col = i))
title("male: bmc ~ age")
## male: log(bmc) ~ age
with(MBMC, plot(age, log.bmc, type = "n", ann = FALSE))
for (i in 1:NM) with(MBMC.id[[i]], lines(age, log.bmc, col = i))
title("male: log(bmc) ~ age")
## male: hist(age)
hist(MBMC$age, breaks = 15, ann = FALSE)
title("male: age distribution")
## female: bmc ~ age
with(FBMC, plot(age, bmc, type = "n", ann = FALSE))
for (i in 1:NF) with(FBMC.id[[i]], lines(age, bmc, col = i))
title("female: bmc ~ age")
## female: log(bmc) ~ age
with(FBMC, plot(age, log.bmc, type = "n", ann = FALSE))
for (i in 1:NF) with(FBMC.id[[i]], lines(age, log.bmc, col = i))
title("female: log(bmc) ~ age")
## female: hist(age)
hist(FBMC$age, breaks = 15, ann = FALSE)
title("female: age distribution")
par(op)
## Not run:
## fit models for male group
gps.10 <- FitBMC(gender = "male", p = 10, gps = TRUE)$gam
sps.10 <- FitBMC(gender = "male", p = 10, gps = FALSE)$gam
## newdata
age <- seq.int(min(MBMC$age), max(MBMC$age), length.out = 100)
newdat1 <- data.frame(age = rep.int(age, NM),
id = rep(levels(MBMC$id), each = length(age)))
newdat2 <- data.frame(age = age, id = levels(MBMC$id)[1])
## predict all subject BMC trajectories
sub.gps.10 <- matrix(exp(predict(gps.10, newdat1)), nrow = length(age))
sub.sps.10 <- matrix(exp(predict(sps.10, newdat1)), nrow = length(age))
## predict population BMC trajectory
pop.gps.10 <- predict(gps.10, newdat2, type = "terms", terms = "s(age)")
pop.sps.10 <- predict(sps.10, newdat2, type = "terms", terms = "s(age)")
pop.gps.10 <- exp(as.numeric(pop.gps.10) + attr(pop.gps.10, "constant"))
pop.sps.10 <- exp(as.numeric(pop.sps.10) + attr(pop.sps.10, "constant"))
## extract knots
knots.gps.10 <- gps.10$smooth[[1]]$knots
knots.sps.10 <- sps.10$smooth[[1]]$knots
## plot data and fitted trajectories
op <- par(mfrow = c(1, 2), mar = c(2, 2, 1.5, 0.5))
## a common ylim for the next 2 plots
ylim <- range(sub.gps.10[, c(41, 66)], sub.sps.10[, c(41, 66)])
## standard P-spline, population trajectory and subject 41, 66
plot(age, pop.sps.10, type = "l", lwd = 2, ylim = ylim, ann = FALSE)
with(MBMC.id[[41]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
with(MBMC.id[[66]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
abline(v = knots.sps.10, lty = 2, col = 8)
lines(age, sub.sps.10[, 41], lwd = 2, lty = 2)
lines(age, sub.sps.10[, 66], lwd = 2, lty = 2)
title("standard P-spline")
## general P-spline, population trajectory and subject 41, 66
plot(age, pop.gps.10, type = "l", lwd = 2, ylim = ylim, ann = FALSE)
with(MBMC.id[[41]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
with(MBMC.id[[66]], points(age, bmc, pch = 19, cex = 1.5, col = 8))
abline(v = knots.gps.10, lty = 2, col = 8)
lines(age, sub.gps.10[, 41], lwd = 2, lty = 2)
lines(age, sub.gps.10[, 66], lwd = 2, lty = 2)
title("general P-spline")
par(op)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.