This guide introduces tools used for longitudinal function-on-scalar regression in the BayesFoSR package. We simulate and plot a dataset, fit a model using variational Bayes, and plot the result.
We simulate 100 subjects with 3 functional responses each under the function-on-scalar regression model with correlated errors. Three coefficient functions are used (a constant intercept, a sine curve, and a cosine curve), and scalar predictors are generated from a standard normal distribution. Subject random effects are generated from a basis consisting of a sine and cosine curve with loadings generated from mean-zero independent normal distributions with standard deviation 2. Correlated errors are introduced using functional principal components with two basis functions (a sine and cosine of different periods than the coefficient functions) and subject/visit-specific loadings are generated from mean-zero normal distributions with variance 3 and 1. Finally, uncorrelated measurement error is added to all curve. The figure below shows this observed data.
library(BayesFoSR) library(ggplot2) library(MASS) ############################################################### ## set simulation design elements ############################################################### I = 100 J = 3 p = 3 D = 50 grid = seq(0, 1, length = D) beta.true = matrix(0, p, D) beta.true[1,] = sin(2*grid*pi) beta.true[2,] = cos(2*grid*pi) beta.true[3,] = 2 b.basis = matrix(0, 2, D) b.basis[1,] = sin(2*grid*pi) b.basis[2,] = cos(2*grid*pi) subj.ranef = mvrnorm(I, mu = rep(0, 2), Sigma = diag(c(1,1))) %*% b.basis psi.true = matrix(NA, 2, D) psi.true[1,] = sin(4*grid*pi) psi.true[2,] = cos(4*grid*pi) lambda = c(3,1) ## seed for reproducibility set.seed(100) data = data.frame(x1 = rep(rnorm(I), each = J), x2 = rep(rnorm(I), each = J), id = rep(1:I, each = J)) data$id = factor(data$id) X = model.matrix( ~ 1 + x1 + x2, data = data) Z = model.matrix( ~ 0 + id + (-1):id, data = data) C = mvrnorm(I*J, mu = rep(0, 2), diag(lambda)) fixef = X %*% beta.true ranef = Z %*% subj.ranef pcaef = C %*% psi.true error = matrix(rnorm(I*J*D), I*J, D) Yi.obs = fixef + ranef + pcaef + error
## observed data obs.data = cbind(as.vector(t(Yi.obs)), rep(1:(I*J), each = D), rep(grid, I*J)) obs.data = as.data.frame(obs.data) colnames(obs.data) = c("y", "curve", "grid") ggplot(obs.data, aes(x = grid, y = y, group = curve)) + geom_path(alpha = .3) + theme_bw() + labs(x = "Grid", y = "Observed Data")
Several options are available for fitting the function-on-scalar regression model:
Variational Bayes is a fast approximation to the true posterior, and often provides reasonable point estimates. However, the approximation requires assumptions regarding the posterior independence of model parameters and can lead to poor inference. The Gibbs sampler produces samples from the posterior, but is more computationally demanding; also, if parameters are correlated, mixing can be slow.
The Bayesian FPCA estimates a pre-specified number of components using a spline expansion for each, as well as the associated scores; this provides a parsimonious approach to estimating the covariance surface, but requires that the number of components is known in advance or explored as part of the modeling process. The Wishart prior is potentially more flexible and requires less user input, but is much less parsimonious particularly for dense grids.
Below, we fit the function on scalar regression model using VB and FPCA for the covariance structure. Code for the Gibbs sampler is commented to save computation time and memory space.
fit.vb = bayes_fosr(Yi.obs~ x1 + x2 + re(id), data = data, Kt = 10, Kp = 2, est.method = "VB", cov.method = "FPCA") # fit.gibbs = bayes_fosr(Yi.obs~ x1 + x2 + re(id), data = data, # Kt = 10, Kp = 2, est.method = "Gibbs", cov.method = "FPCA")
Below we plot the estimated coefficient functions from both estimation procedures and compare to the true coefficients.
plot.dat = data.frame(est = c(as.vector(t(beta.true)), as.vector(t(fit.vb$beta.pm))), grid = rep(grid, p*2), curve = rep(1:(2*p), each = D), Method = rep(c("Truth", "VarBayes"), each = p*D)) ggplot(plot.dat, aes(x = grid, y = est, group = curve, color = Method)) + geom_path() + theme_bw() + labs(x = "Grid", y = "Beta")
Next we plot the estimated FPCA basis functions from the VB algorithm and compare to the true basis functions.
plot.dat = data.frame(est = c(as.vector(t(psi.true)), as.vector(t(fit.vb$psi.pm))), grid = rep(grid, 2*2), curve = rep(1:(2*2), each = D), Method = rep(c("Truth", "VarBayes"), each = 2*D)) ggplot(plot.dat, aes(x = grid, y = est, group = curve, color = Method)) + geom_path() + theme_bw() + labs(x = "Grid", y = "Beta")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.