fastFMM Vignette: Fast Functional Mixed Models using Fast Univariate Inference

knitr::opts_chunk$set(comment = "#>", warning=FALSE, message=FALSE)
library(fastFMM)
output: pdf_document
# output: rmarkdown::html_vignette
# Thanks to Yihui Xie for providing this code
# #  %\VignetteEngine{knitr::rmarkdown} 
# %\VignetteEngine{rmarkdown::render}
library(knitr)
hook_output <- knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
   lines <- options$output.lines
   if (is.null(lines)) {
     return(hook_output(x, options))  # pass to default hook
   }
   x <- unlist(strsplit(x, "\n"))
   more <- "..."
   if (length(lines)==1) {        # first n lines
     if (length(x) > lines) {
       # truncate the output, but add ....
       x <- c(head(x, lines), more)
     }
   } else {
     x <- c(more, x[lines], more)
   }
   # paste these lines together
   x <- paste(c(x, ""), collapse = "\n")
   hook_output(x, options)
 })

Introduction

fastFMM is a fast toolkit for fitting functional mixed models using the fast univariate inference (FUI) method proposed in Cui et al. (2022). Assume the multi-level/longitudinal functional data are of the type $Y_{ij}(s)$ on a grid ${s_1, ..., s_L}$ of the compact functional domain $\mathcal{S}$. Denote by $i = 1, 2, ..., I$ the index of subject, and $j = 1, 2, ..., J_i$ the index of longitudinal visit at time $t_{ij}$. In addition to $Y_{ij}(s)$, for each visit of each subject, we also observe $\boldsymbol{X}{ij} = [X{ij1}, X_{ij2}, ..., X_{ijp}]^T \in \mathbb{R}^p$ as the fixed and $\boldsymbol{Z}{ij} = [Z{ij1}, Z_{ij2}, ..., Z_{ijq}]^T \in \mathbb{R}^q$ as the random effects variables. Denote by $g(\cdot)$ a pre-specified link function and $EF(\cdot)$ the exponential family distribution. A longitudinal function-on-scalar regression model has the following form:

$$ \begin{aligned} & Y_{ij}(s) \sim EF{\mu_{ij}(s)}, \ & g{\mu_{ij}(s)} = \eta_{ij}(s) = \boldsymbol{X}{ij}^T\boldsymbol{\beta}(s) + \boldsymbol{Z}{ij}^T\boldsymbol{u}_i(s). \end{aligned} $$ The model above is also called Functional Mixed Model (FMM) in the Functional Data Analysis (FDA) literature. Many statistical methods have been proposed to fit this model, which in general fall into two categories: joint methods and marginal methods. FUI is a marginal method that is computationally fast and achieves similar estimation accuracy compared with the state-of-the-art joint method, such as the refund::pffr() function.

FUI consists of the following three steps:

  1. At each location $s_l \in \mathcal{S}, l = 1, 2, ..., L$, fit a pointwise generalized linear mixed model (GLMM):

$$ \begin{aligned} & Y_{ij}(s_l) \sim EF{\mu_{ij}(s_l)}, \ & g{\mu_{ij}(s_l)} = \eta_{ij}(s_l) = \boldsymbol{X}{ij}^T\boldsymbol{\beta}(s_l) + \boldsymbol{Z}{ij}^T\boldsymbol{u}_i(s_l). \end{aligned} $$ Here $\boldsymbol{\beta}(s_l)$ is a $p \times 1$ dimensional vector of fixed effects and $\boldsymbol{u}_i(s_l)$ is a $q \times 1$ dimensional vector of random effects. Denote the estimates of fixed effects from $L$ separate univariate GLMMs as $\boldsymbol{\tilde{\beta}}(s_1), \boldsymbol{\tilde{\beta}}(s_2), ..., \boldsymbol{\tilde{\beta}}(s_L)$.

  1. Smooth the estimated fixed effects $\boldsymbol{\tilde{\beta}}(s_1), \boldsymbol{\tilde{\beta}}(s_2), ..., \boldsymbol{\tilde{\beta}}(s_L)$ along the functional domain. The smoothed estimators are denoted as ${\boldsymbol{\hat{\beta}}(s), s \in \mathcal{S}}$.

  2. Obtain the pointwise and joint confidence bands for fixed effects parameters using analytic approaches for Gaussian data or a bootstrap of subjects (cluster bootstrap) for Gaussian and non-Gaussian data.

The key insight of the FUI approach is to decompose the complex correlation structure into longitudinal and functional directions. For more details on the analytic and bootstrap approach, please refer to the paper.

In this vignette, we provide a tutorial on how to use the fui function to fit FMM. The toolkit is currently implemented in R, but may incorporate C++ to speed up the code in the future development. Notice that the model function assumes that the data are stored in a $\texttt{data.frame}$, where the predictors are stored as separate columns and the outcome is stored as a matrix; see examples for the required format.

Installation

The development version of fastFMM can be downloaded from github by executing:

library(devtools)
install_github("gloewing/fastFMM")

Package Setup

Downloading the development version of the package requires one to install package dependencies. We have included the code below to install any $\texttt{CRAN}$ packages not already downloaded on your computer.

# install packages (will only install them for the first time)
list.of.packages = c("lme4", "parallel", "cAIC4", "magrittr","dplyr",
                      "mgcv", "MASS", "lsei", "refund","stringr", "Matrix", "mvtnorm", 
                      "arrangements", "progress", "ggplot2", "gridExtra", "here")
new.packages = list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)){
  chooseCRANmirror(ind=75)
  install.packages(new.packages, dependencies = TRUE)
}else{
  # load packages if already installed
  library(lme4)
  library(parallel)
  library(cAIC4)
  library(magrittr)
  library(dplyr)
  library(mgcv) 
  library(MASS)
  library(lsei) 
  library(refund)
  library(stringr) 
  library(Matrix) 
  library(mvtnorm) 
  library(arrangements) 
  library(progress) 
  library(ggplot2)
  library(gridExtra)
}
library(fastFMM) # load our package

Tutorial Guide

To illustrate the main functions in our fastFMM package, we start by analyzing synthetic data in which the functional domain is defined by time: each functional observation is a short time-series ``trial''.

Data Formating

Let's load the data and take a look at it's structure.

dat <- read.csv("time_series.csv") # read in data
head(dat[,1:6])

This dataset has $N = \sum_{i = 1}^I n_i$ rows, where $n_i$ is the number of repeated measures observations of subject/cluster $i$. The first 3 columns of the dataset include the subject ID, and the covariates $\texttt{trial}$ and $\texttt{treatment}$. Each of the remaining columns are observations of our functional outcome named $[\texttt{Y.1}, ~ \texttt{Y.2}, ~ ...~\texttt{Y.L}]$ where $L$ is the number of points we observe the function on.

When we specify a model below using the fui() function, it will take the columns starting with the string 'Y.' and assume they are ordered from left to right. For that reason, make sure no other variables begin with the same substring.

Alternatively, instead of relying on the column names, we can save the matrix $N \times L$ matrix associated with our functional outcomes as a ``column'' of a dataframe.

Y_mat <- dat[,-seq(1,3)]
head(Y_mat[,1:5])

Now we save the Y_mat matrix as a ``column'' in the dataframe:

dat <- data.frame(Y = Y_mat, dat[,seq(1,3)])

The fui() function accepts either format style.

Model Fitting and Function Arguments

Let's start with a simple model not too unlike the functional version of a paired-samples t-test: we include the binary $\texttt{treatment}$ covariate with a random slope. Observe that the syntax for the model formula is based off of the lmer() function in the lme4 package. The model is then:

$$ \begin{aligned} & Y_{ij}(s) = \beta_0(s) + {X}{ij}{\beta}_1(s) + {u}_i(s) + \epsilon{ij}(s) \notag \end{aligned} $$ We start with implementation of the model with analytic inference.

mod <- fui(Y ~ treatment + # main effect of cue
                  (1 | id),  # random intercept
                data = dat) 

Unless specified otherwise, fui() assumes a Gaussian family and provides inference based on an analytic form. Below we fit a similar model but include a random subject-specific random slope for treatment.

Y_mat <- dat[,-seq(1,3)]
L <- ncol(Y_mat) # number of columns of functional outcome

mod <- fui(Y ~ treatment + # main effect of cue
                  (treatment | id),  # random intercept
                  data = dat,
                  argvals = seq(from = 1, to = L, by = 3) # every 3rd data point
                  ) 
Y_mat <- dat[,-seq(1,3)]
L <- ncol(Y_mat) # number of columns of functional outcome

# model 1: random slope model
mod1 <- fui(Y ~ treatment + # main effect of cue
                  (treatment | id),  # random intercept
                  data = dat,
                  var = FALSE) 

# model 2: random intercept model
mod2 <- fui(Y ~ treatment + # main effect of cue
                  (1 | id),  # random intercept
                  data = dat,
                  var = FALSE) 

# compare model fits
colMeans(mod1$aic)
colMeans(mod2$aic)

Since model 2 has a lower average AIC, one might wish to proceed with inference using that model.

mod <- fui(Y ~ treatment + # main effect of cue
                  (treatment | id/trial),  # random intercept
                  data = dat,
                  subj_ID = "id") 

Model Fit Plots

After fitting a model, one can quickly visualize estimates and 95% confidence intervals with out plot function.

mod <- fui(Y ~ treatment + # main effect of cue
                  (1 | id),  # random intercept
                  data = dat) 

fui_plot <- plot_fui(mod)

In some cases, especially if the functional domain is time, one may wish to rescale the x-axis (using the x_rescale argument) according to the sampling rate and align the plot to a certain time-point (using the align_x argument). We also allow the user to adjust how many rows the plots are plotted on (through the nrow argument). Each coefficient's plot can be given a plot title through the vector argument title_names and the x-axis can be named through the xlab argument.

align_time <- 1 # cue onset is at 2 seconds
sampling_Hz <- 15 # sampling rate
# plot titles: interpretation of beta coefficients
plot_names <- c("Intercept", "Mean Signal Difference: Cue1 - Cue0") 
fui_plot <- plot_fui(mod, # model fit object
                     x_rescale = sampling_Hz, # rescale x-axis to sampling rate
                     align_x = align_time, # align to cue onset
                     title_names = plot_names,
                     xlab = "Time (s)",
                     num_row = 2) 

One can access the regression coefficient estimates and associated pointwise and joint 95% confidence intervals by looking at the object outputted from the plot_fui() function. This is stored in the object which is a list that has $p$ elements (one for each fixed effect). Each element in that list (named according to the corresponding fixed effects covariate) is a data-frame that has $L$ rows ($L$ is number of points observed along the functional domain) and columns for coefficient estimates and 95% CIs.

Advanced Options

Additional fui() arguments

Additional plot_fui() arguments

Troubleshooting

References

Doug Bates. lme4: Mixed-effects modeling with R (2022).

Erjia Cui, Andrew Leroux, Ekaterina Smirnova, and Ciprian Crainiceanu. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics (2022).



Try the fastFMM package in your browser

Any scripts or data that you put into this service are public.

fastFMM documentation built on April 12, 2025, 2:26 a.m.