TemporalForest: A Quick Start Guide
In TemporalForest: Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data

old_ops <- options()
suppressPackageStartupMessages(library(TemporalForest))
knitr::opts_chunk$set(
  collapse   = TRUE,  
  comment    = "#>",   
  fig.width  = 7,
  fig.height = 5,
  message    = FALSE, 
  warning    = FALSE  
)
options(stringsAsFactors = FALSE)
suppressPackageStartupMessages({
  ok_wgcna <- requireNamespace("WGCNA", quietly = TRUE)
})
if (ok_wgcna && "disableWGCNAThreads" %in% getNamespaceExports("WGCNA")) {
  suppressMessages(WGCNA::disableWGCNAThreads())
}

Abstract

The TemporalForest package provides a reproducible method for feature selection in high-dimensional longitudinal data. It combines network analysis, mixed-effects models, and stability selection to identify robust predictors over time. This vignette offers a quick start guide to using the package.

1. Introduction

Longitudinal 'omics studies, in which subjects are measured repeatedly over time, pose particular challenges for feature selection: high dimensionality, temporal dependence, and complex correlations among features. The TemporalForest algorithm addresses these with a multi-stage pipeline that identifies features that are both predictive and stable across resamples.

2. Installation

Since the package is not yet on CRAN, you can install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("SisiShao/TemporalForest")

3. Quick Start: Primary Example

This example walks you through a complete analysis with a small, simulated dataset.

Simulate a Longitudinal Dataset

This small demo is designed to recover all true signals quickly (about 1–3 s). We simulate a dataset with 60 subjects, 2 time points, and 20 candidate predictors, and inject 3 true signals into the outcome (Y) from predictors V1, V2, and V3. To keep the example fast and reliable for CRAN, we pass a precomputed dissimilarity matrix, which skips Stage 1 (WGCNA/TOM).

set.seed(11) # For reproducibility
n_subjects <- 60; n_timepoints <- 2; p <- 20

# Build X (two time points) with matching colnames
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)

# Long view and IDs
X_long <- do.call(rbind, X)
id     <- rep(seq_len(n_subjects), each = n_timepoints)
time   <- rep(seq_len(n_timepoints), times = n_subjects)

# Strong signal on V1, V2, V3 + modest subject random effect + small noise
u_subj <- rnorm(n_subjects, 0, 0.7)
eps    <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
     rep(u_subj, each = n_timepoints) + eps

# Lightweight dissimilarity to skip Stage 1 (fast on CRAN)
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
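Before passing a precomputed matrix, it can help to confirm that it looks like a dissimilarity matrix. The sketch below uses a hypothetical helper, `check_dissimilarity()`, which is not part of the package. The exact requirements that `temporal_forest()` places on `dissimilarity_matrix` are not stated in this guide, so the checks are generic assumptions about a `1 - |cor|` matrix.

```r
# Hypothetical helper (not part of TemporalForest): generic sanity checks
# for a correlation-based dissimilarity matrix.
check_dissimilarity <- function(A, feature_names) {
  stopifnot(
    is.matrix(A), is.numeric(A),
    nrow(A) == ncol(A),                       # square
    isSymmetric(unname(A)),                   # symmetric
    all(diag(A) == 0),                        # zero self-dissimilarity
    all(A >= 0 & A <= 1),                     # 1 - |cor| lies in [0, 1]
    identical(rownames(A), feature_names),    # names match the predictors
    identical(colnames(A), feature_names)
  )
  invisible(TRUE)
}

# Small self-contained demo, built the same way as the matrix A above
set.seed(1)
X_demo <- matrix(rnorm(40 * 5), 40, 5,
                 dimnames = list(NULL, paste0("V", 1:5)))
A_demo <- 1 - abs(stats::cor(X_demo))
diag(A_demo) <- 0
check_dissimilarity(A_demo, colnames(X_demo))
```

The helper returns `TRUE` invisibly when every check passes and stops with an error otherwise.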

Run TemporalForest

We call the main function, passing our precomputed dissimilarity_matrix = A and asking for 3 features.

# Run TemporalForest with minimal settings for the vignette
tf_result <- temporal_forest(
  X = X, Y = Y, id = id, time = time,
  dissimilarity_matrix = A,    # skip WGCNA/TOM (Stage 1)
  n_features_to_select = 3,
  n_boot_screen = 4,           # very low for a quick demo
  n_boot_select = 8,           # very low for a quick demo
  keep_fraction_screen = 1,    # permissive screening
  min_module_size = 2,
  alpha_screen = 0.5,          # permissive screening
  alpha_select = 0.6
)

Interpret the Results

Examine the selected features and check if the true predictors were found.

print(tf_result)

# Validate against ground truth
true_predictors <- c("V1", "V2", "V3")
cat("True predictors found:", sum(true_predictors %in% tf_result$top_features), 
    "out of", length(true_predictors), "\n")

The algorithm successfully identified all three true predictors in this high signal-to-noise example.
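When the ground truth is known, as in a simulation, recovery can also be summarised as precision and recall. The following is a minimal sketch: `recovery_metrics()` is a hypothetical helper, and the `selected` vector is a placeholder chosen for illustration, not output from the run above.

```r
# Hypothetical helper (not part of TemporalForest): compare a selection
# against known true predictors.
recovery_metrics <- function(selected, truth) {
  tp <- length(intersect(selected, truth))
  c(
    true_positives = tp,
    precision = if (length(selected) > 0) tp / length(selected) else NA_real_,
    recall    = tp / length(truth)
  )
}

true_predictors <- c("V1", "V2", "V3")

# Placeholder selection for illustration only; in practice use
# selected <- tf_result$top_features
selected <- c("V1", "V2", "V7")

recovery_metrics(selected, true_predictors)
```

Precision is the share of selected features that are true signals, and recall is the share of true signals that were selected.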

4. How TemporalForest Works

TemporalForest operates in three stages:

Stage 1, Time-Aware Module Construction: Groups correlated features into modules that are stable across time points, using a consensus topological overlap matrix (TOM).
Stage 2, Within-Module Screening: Uses mixed-effects model trees to select the most important predictors within each module while accounting for within-subject correlation.
Stage 3, Stability Selection: Applies bootstrapping to calculate selection probabilities, so that only the most reproducible features are included in the final set.
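The idea behind bootstrap selection probabilities can be illustrated in base R. The sketch below is not the package's implementation: it replaces the mixed-effects model trees with a deliberately simple stand-in selector (top-k features by absolute marginal correlation), and the function name `selection_frequency()` is hypothetical. It resamples whole subjects so that repeated measures stay together, then records how often each feature is chosen.

```r
# Sketch of bootstrap selection frequencies (NOT the package's implementation).
# Stand-in selector: top-k features by absolute marginal correlation with y.
selection_frequency <- function(x, y, id, k = 3, n_boot = 50) {
  subjects <- unique(id)
  counts <- setNames(numeric(ncol(x)), colnames(x))
  for (b in seq_len(n_boot)) {
    # Resample whole subjects so repeated measures stay together
    boot_subj <- sample(subjects, length(subjects), replace = TRUE)
    rows <- unlist(lapply(boot_subj, function(s) which(id == s)))
    score <- abs(stats::cor(x[rows, , drop = FALSE], y[rows]))[, 1]
    top <- names(sort(score, decreasing = TRUE))[seq_len(k)]
    counts[top] <- counts[top] + 1
  }
  # Fraction of bootstrap samples in which each feature was selected
  sort(counts / n_boot, decreasing = TRUE)
}

# Simulated long-format data: 40 subjects, 2 time points, 10 features,
# with signal on V1 and V2 plus a subject-level random intercept
set.seed(42)
n_subj <- 40; n_time <- 2; p <- 10
id <- rep(seq_len(n_subj), each = n_time)
x <- matrix(rnorm(n_subj * n_time * p), ncol = p,
            dimnames = list(NULL, paste0("V", 1:p)))
y <- 4 * x[, "V1"] + 3 * x[, "V2"] +
  rnorm(n_subj)[id] + rnorm(length(id), sd = 0.5)

freq <- selection_frequency(x, y, id, k = 2, n_boot = 50)
head(freq, 4)
```

Features with frequencies near 1 are selected in almost every resample, while noise features are chosen only occasionally. Ranking by this frequency is the core of a stability-based selection.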

5. Key Parameters Guide

n_features_to_select: Final number of features to return (default: 10)
n_boot_screen, n_boot_select: Number of bootstrap samples for screening and selection stages. Increase for more stable results (defaults: 50, 100).
keep_fraction_screen: Proportion of features from each module passed to final selection (default: 0.25). Increase if too few features are selected.
min_module_size: Minimum size for network modules (default: 4).
alpha_screen, alpha_select: Significance levels for splitting in screening and selection trees (defaults: 0.2, 0.05).
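To see how far the quick-start call departs from the defaults listed above, the two sets of values can be placed side by side. This sketch only tabulates numbers already given in this guide and does not call the package.

```r
# Documented defaults (from the list above) next to the quick-demo values
settings <- data.frame(
  parameter = c("n_features_to_select", "n_boot_screen", "n_boot_select",
                "keep_fraction_screen", "min_module_size",
                "alpha_screen", "alpha_select"),
  default   = c(10, 50, 100, 0.25, 4, 0.2, 0.05),
  demo      = c(3, 4, 8, 1, 2, 0.5, 0.6)
)
settings$changed <- settings$default != settings$demo
print(settings)

# Collect the defaults as a named list. With the package installed, they
# could be passed along with the data, for example:
# do.call(temporal_forest,
#         c(list(X = X, Y = Y, id = id, time = time), default_args))
default_args <- setNames(as.list(settings$default), settings$parameter)
```

Every demo value differs from its default, which is why the quick-start settings should be treated as a speed-oriented illustration rather than as recommended values for real analyses.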

6. Troubleshooting

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| No features selected | Screening too strict | Increase keep_fraction_screen or alpha_screen |
| Too many features selected | Selection too liberal | Decrease keep_fraction_screen or alpha_select |
| Long computation time | Data too large | Reduce bootstrap numbers or pre-filter features |
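One way to pre-filter features is to keep only those with the largest variance pooled over all time points. The sketch below assumes X is a list of matrices with identical column names, as in the quick-start example. `prefilter_by_variance()` is a hypothetical helper, not a package function, and whether an unsupervised variance filter suits a given dataset is a judgment call.

```r
# Hypothetical helper (not part of TemporalForest): keep the `n_keep` features
# with the largest variance pooled over all time points.
prefilter_by_variance <- function(X_list, n_keep) {
  pooled <- do.call(rbind, X_list)              # stack time points
  v <- apply(pooled, 2, stats::var)             # per-feature variance
  keep <- names(sort(v, decreasing = TRUE))[seq_len(min(n_keep, length(v)))]
  keep <- colnames(pooled)[colnames(pooled) %in% keep]  # restore column order
  lapply(X_list, function(m) m[, keep, drop = FALSE])
}

# Demo: five features with very different spreads at two time points
set.seed(3)
sds <- c(3, 0.1, 2, 0.1, 1)   # V1, V3, V5 vary most
X_demo <- replicate(2, {
  m <- sapply(sds, function(s) rnorm(30, sd = s))
  colnames(m) <- paste0("V", 1:5)
  m
}, simplify = FALSE)

X_small <- prefilter_by_variance(X_demo, n_keep = 3)
colnames(X_small[[1]])
```

If a precomputed dissimilarity matrix is supplied, it would need to be subset to the same retained features so that its row and column names still match the predictors.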

7. Input Data Validation

The package includes checks for proper data formatting. Here's an example of the error message for inconsistent inputs:

# This will produce a clear error message
mat1 <- matrix(1:4, nrow=2, dimnames=list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow=2, dimnames=list(NULL, c("A", "C")))
bad_X <- list(mat1, mat2)

TemporalForest::check_temporal_consistency(bad_X)
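To make the kind of check involved concrete, here is a minimal base-R sketch of a column-name consistency test. It is a hypothetical re-implementation for illustration only: the package's own `check_temporal_consistency()` may test more conditions and word its error differently.

```r
# Hypothetical re-implementation for illustration; the package's own
# check_temporal_consistency() may test more and word its error differently.
check_colnames_consistent <- function(X_list) {
  ref <- colnames(X_list[[1]])
  for (k in seq_along(X_list)) {
    if (!identical(colnames(X_list[[k]]), ref)) {
      stop(sprintf("Column names at time point %d differ from time point 1.", k),
           call. = FALSE)
    }
  }
  invisible(TRUE)
}

mat1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "C")))

# Capture the error message instead of stopping the script
msg <- tryCatch(check_colnames_consistent(list(mat1, mat2)),
                error = function(e) conditionMessage(e))
msg
```

Wrapping the call in `tryCatch()` is useful in scripts and vignettes, where an uncaught error would otherwise halt execution.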

8. Conclusion

TemporalForest provides an end-to-end solution for reproducible feature selection in longitudinal high-dimensional data. For detailed information on all function parameters and advanced usage, see the package documentation (?TemporalForest).

9. Citation

To cite TemporalForest in publications, please use:

citation("TemporalForest")

Session Info

sessionInfo()
options(old_ops)

TemporalForest documentation built on Dec. 23, 2025, 1:06 a.m.
