TemporalForest: A Quick Start Guide

old_ops <- options()
suppressPackageStartupMessages(library(TemporalForest))
knitr::opts_chunk$set(
  collapse   = TRUE,  
  comment    = "#>",   
  fig.width  = 7,
  fig.height = 5,
  message    = FALSE, 
  warning    = FALSE  
)
options(stringsAsFactors = FALSE)
suppressPackageStartupMessages({
  ok_wgcna <- requireNamespace("WGCNA", quietly = TRUE)
})
if (ok_wgcna && "disableWGCNAThreads" %in% getNamespaceExports("WGCNA")) {
  suppressMessages(WGCNA::disableWGCNAThreads())
}

Abstract

The TemporalForest package provides a reproducible method for feature selection in high-dimensional longitudinal data. It combines network analysis, mixed-effects models, and stability selection to identify robust predictors over time. This vignette offers a quick start guide to using the package.

1. Introduction

Longitudinal 'omics studies, in which subjects are measured repeatedly over time, pose particular challenges for feature selection: high dimensionality, temporal dependence, and complex correlation structure. TemporalForest addresses these with a multi-stage pipeline that identifies features that are both predictive and stable across resamples.

2. Installation

Since the package is not yet on CRAN, you can install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("SisiShao/TemporalForest")

3. Quick Start: Primary Example

This example walks you through a complete analysis with a small, simulated dataset.

Simulate a Longitudinal Dataset

This small demo is designed to run in a few seconds (roughly 1–3 s) and to recover all true signals. We simulate a dataset with 60 subjects, 2 time points, and 20 candidate predictors, and inject 3 true signals into the outcome (Y) through predictors V1, V2, and V3. To keep the example fast and reliable on CRAN, we pass a precomputed dissimilarity matrix, which skips Stage 1 (WGCNA/TOM).

set.seed(11) # For reproducibility
n_subjects <- 60; n_timepoints <- 2; p <- 20

# Build X (two time points) with matching colnames
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)

# Long view and IDs
X_long <- do.call(rbind, X)
id     <- rep(seq_len(n_subjects), each = n_timepoints)
time   <- rep(seq_len(n_timepoints), times = n_subjects)

# Strong signal on V1, V2, V3 + modest subject random effect + small noise
u_subj <- rnorm(n_subjects, 0, 0.7)
eps    <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
     rep(u_subj, each = n_timepoints) + eps

# Lightweight dissimilarity to skip Stage 1 (fast on CRAN)
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
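
Before handing a custom dissimilarity matrix to the pipeline, it may be worth confirming that it has the properties a dissimilarity is usually expected to have (square, symmetric, zero diagonal, values in [0, 1], matching row and column names). The sketch below uses only base R. The helper name `check_dissimilarity` is hypothetical and not part of TemporalForest, and the exact requirements of `temporal_forest()` may differ.

```r
# Hypothetical helper (not part of TemporalForest): basic sanity checks
# for a correlation-based dissimilarity matrix.
check_dissimilarity <- function(D, tol = 1e-8) {
  stopifnot(is.matrix(D), is.numeric(D), nrow(D) == ncol(D))
  c(
    symmetric     = isSymmetric(unname(D), tol = tol),
    zero_diagonal = all(abs(diag(D)) <= tol),
    in_unit_range = all(D >= -tol & D <= 1 + tol),
    named         = !is.null(rownames(D)) && identical(rownames(D), colnames(D))
  )
}

# Build a small matrix the same way as above and check it
set.seed(1)
X_demo <- matrix(rnorm(30 * 5), 30, 5, dimnames = list(NULL, paste0("V", 1:5)))
D_demo <- 1 - abs(stats::cor(X_demo)); diag(D_demo) <- 0
checks <- check_dissimilarity(D_demo)
print(checks)
```

In the example above, `check_dissimilarity(A)` would apply the same checks to the matrix passed to the main function.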

Run TemporalForest

We call the main function, passing our precomputed dissimilarity_matrix = A and asking for 3 features.

# Run TemporalForest with minimal settings for vignette
tf_result <- temporal_forest(
  X = X, Y = Y, id = id, time = time,
  dissimilarity_matrix = A,       # skip WGCNA/TOM (Stage 1)
  n_features_to_select = 3,
  n_boot_screen = 4,              # Very low for quick demo
  n_boot_select = 8,              # Very low for quick demo
  keep_fraction_screen = 1,       # Permissive screening
  min_module_size = 2,
  alpha_screen = 0.5,             # Permissive screening
  alpha_select = 0.6
)

Interpret the Results

Examine the selected features and check if the true predictors were found.

print(tf_result)
# Validate against ground truth
true_predictors <- c("V1", "V2", "V3")
cat("True predictors found:", sum(true_predictors %in% tf_result$top_features), 
    "out of", length(true_predictors), "\n")

The algorithm successfully identified all three true predictors in this high signal-to-noise example.
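
Counting hits is the simplest check. When the number of selected features differs from the number of true signals, precision and recall may be more informative. Below is a minimal base-R sketch; `selection_metrics` and the example vector `selected` are hypothetical stand-ins (in the example above, `tf_result$top_features` would play the role of `selected`).

```r
# Hypothetical helper: compare a selected feature set with the known truth
selection_metrics <- function(selected, truth) {
  tp <- length(intersect(selected, truth))  # true positives
  c(
    precision = if (length(selected) > 0) tp / length(selected) else NA_real_,
    recall    = tp / length(truth)
  )
}

truth    <- c("V1", "V2", "V3")
selected <- c("V1", "V3", "V7")  # made-up selection, for illustration only
m <- selection_metrics(selected, truth)
print(m)
```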

4. How TemporalForest Works

TemporalForest operates in three stages:

  1. Time-Aware Module Construction: Groups correlated features into modules that are stable across time points using a consensus topological overlap matrix (TOM).
  2. Within-Module Screening: Uses mixed-effects model trees to select the most important predictor from each module while accounting for within-subject correlations.
  3. Stability Selection: Applies bootstrapping to calculate selection probabilities, ensuring only the most reproducible features are included in the final set.
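
To make the idea behind the third stage concrete, here is a minimal base-R sketch of bootstrap selection frequencies. It is not the package's implementation: the selector (top-k absolute correlation with the outcome) is a deliberately simple stand-in for the mixed-effects trees, and the names `select_top_k` and `selection_frequency` are hypothetical. Whole subjects, rather than individual rows, are resampled so that each subject's repeated measures stay together.

```r
# Stand-in selector (an assumption for illustration): top-k features by
# absolute marginal correlation with the outcome.
select_top_k <- function(X, y, k) {
  score <- abs(stats::cor(X, y))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(k)]
}

# Proportion of bootstrap resamples in which each feature is selected.
# Subjects are resampled as units to respect within-subject correlation.
selection_frequency <- function(X, y, id, k, n_boot = 50) {
  subjects <- unique(id)
  rows_by_subject <- split(seq_along(id), id)
  counts <- setNames(numeric(ncol(X)), colnames(X))
  for (b in seq_len(n_boot)) {
    drawn  <- sample(subjects, length(subjects), replace = TRUE)
    rows   <- unlist(rows_by_subject[as.character(drawn)], use.names = FALSE)
    picked <- select_top_k(X[rows, , drop = FALSE], y[rows], k)
    counts[picked] <- counts[picked] + 1
  }
  counts / n_boot
}

# Toy data: 40 subjects x 2 visits, 10 features, signal on V1 and V2
set.seed(42)
n_toy <- 40; p_toy <- 10
id_toy <- rep(seq_len(n_toy), each = 2)
X_toy  <- matrix(rnorm(2 * n_toy * p_toy), 2 * n_toy, p_toy,
                 dimnames = list(NULL, paste0("V", 1:p_toy)))
y_toy  <- 3 * X_toy[, "V1"] + 3 * X_toy[, "V2"] +
          rep(rnorm(n_toy, 0, 0.5), each = 2) + rnorm(2 * n_toy, 0, 0.1)

freq <- selection_frequency(X_toy, y_toy, id_toy, k = 2, n_boot = 50)
print(round(sort(freq, decreasing = TRUE), 2))
```

Features whose frequency stays high across resamples are the ones a stability-based rule would keep; a threshold on these frequencies then defines the final set.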

5. Key Parameters Guide

6. Troubleshooting

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| No features selected | Screening too strict | Increase keep_fraction_screen or alpha_screen |
| Too many features selected | Selection too liberal | Decrease keep_fraction_screen or alpha_select |
| Long computation time | Data too large | Reduce bootstrap numbers or pre-filter features |

7. Input Data Validation

The package includes checks for proper data formatting. Here's an example of the error message for inconsistent inputs:

# This will produce a clear error message
mat1 <- matrix(1:4, nrow=2, dimnames=list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow=2, dimnames=list(NULL, c("A", "C")))
bad_X <- list(mat1, mat2)

TemporalForest::check_temporal_consistency(bad_X)
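
For intuition, a check of this kind can be sketched in a few lines of base R. This is an illustrative assumption about what such a validation might look like, not the package's actual implementation; the function name `same_columns_everywhere` is hypothetical.

```r
# Hypothetical sketch: verify that every time point has the same
# column names, in the same order, as the first one.
same_columns_everywhere <- function(X_list) {
  stopifnot(is.list(X_list), length(X_list) >= 1)
  ref <- colnames(X_list[[1]])
  if (is.null(ref)) stop("Matrices must have column names.")
  for (tp in seq_along(X_list)) {
    if (!identical(colnames(X_list[[tp]]), ref)) {
      stop(sprintf("Column names at time point %d differ from time point 1.", tp))
    }
  }
  invisible(TRUE)
}

m1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "B")))
m2 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "C")))

ok_result  <- same_columns_everywhere(list(m1, m1))  # passes silently
bad_result <- tryCatch(same_columns_everywhere(list(m1, m2)),
                       error = function(e) conditionMessage(e))
print(bad_result)
```

Failing early with a message that names the offending time point tends to be easier to debug than a downstream error from a model fit.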

8. Conclusion

TemporalForest provides an end-to-end solution for reproducible feature selection in longitudinal high-dimensional data. For detailed information on all function parameters and advanced usage, see the package documentation (?TemporalForest).

9. Citation

To cite TemporalForest in publications, please use:

citation("TemporalForest")

Session Info

sessionInfo()
options(old_ops)



TemporalForest documentation built on Dec. 23, 2025, 1:06 a.m.