```r
old_ops <- options()
suppressPackageStartupMessages(library(TemporalForest))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  message = FALSE,
  warning = FALSE
)
options(stringsAsFactors = FALSE)
suppressPackageStartupMessages({
  ok_wgcna <- requireNamespace("WGCNA", quietly = TRUE)
})
if (ok_wgcna && "disableWGCNAThreads" %in% getNamespaceExports("WGCNA")) {
  suppressMessages(WGCNA::disableWGCNAThreads())
}
```
The TemporalForest package provides a reproducible method for feature selection in high-dimensional longitudinal data. It combines network analysis, mixed-effects models, and stability selection to identify robust predictors over time. This vignette offers a quick start guide to using the package.
Longitudinal 'omics studies, where subjects are measured repeatedly over time, present unique challenges for feature selection: high dimensionality, temporal dependence, and complex correlations. The TemporalForest algorithm addresses these by creating a robust, multi-stage pipeline that identifies features which are both predictive and stable across resamples.
Since the package is not yet on CRAN, you can install the development version from GitHub:
```r
# install.packages("remotes")
remotes::install_github("SisiShao/TemporalForest")
```
This example walks you through a complete analysis with a small, simulated dataset.
This tiny demo is designed to run in about 1–3 seconds and to recover every true signal. We simulate a dataset with 60 subjects, 2 time points, and 20 candidate predictors, and inject 3 true signals into the outcome (Y) from predictors V1, V2, and V3. To keep the example fast and reliable on CRAN, we pass a precomputed dissimilarity matrix, which skips Stage 1 (WGCNA/TOM).
```r
set.seed(11)  # For reproducibility
n_subjects <- 60
n_timepoints <- 2
p <- 20

# Build X (two time points) with matching colnames
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)

# Long view and IDs
X_long <- do.call(rbind, X)
id <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)

# Strong signal on V1, V2, V3 + modest subject random effect + small noise
u_subj <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
  rep(u_subj, each = n_timepoints) + eps

# Lightweight dissimilarity to skip Stage 1 (fast on CRAN)
A <- 1 - abs(stats::cor(X_long))
diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
```
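Before handing a precomputed matrix to the pipeline, it can help to confirm that it has the shape a dissimilarity matrix should have. The checks below are a minimal base-R sketch on a freshly simulated toy matrix (`X_toy`, `A_toy`, and the sizes are illustrative assumptions). They are not part of TemporalForest, and this vignette does not state which of these properties the package itself validates.

```r
# Toy predictors: 2 time points, 30 rows each, 5 named columns (illustrative sizes)
set.seed(1)
p <- 5
X_toy <- replicate(
  2,
  matrix(rnorm(30 * p), 30, p, dimnames = list(NULL, paste0("V", 1:p))),
  simplify = FALSE
)

# Same construction as in the example: 1 - |correlation|, zero diagonal
X_toy_long <- do.call(rbind, X_toy)
A_toy <- 1 - abs(stats::cor(X_toy_long))
diag(A_toy) <- 0

# Properties one would expect of a dissimilarity matrix
checks <- c(
  square        = nrow(A_toy) == ncol(A_toy),
  symmetric     = isSymmetric(A_toy),
  zero_diagonal = all(diag(A_toy) == 0),
  in_unit_range = all(A_toy >= 0 & A_toy <= 1),
  names_match   = identical(rownames(A_toy), colnames(X_toy[[1]]))
)
print(checks)
```

If any entry prints `FALSE`, the matrix is worth inspecting before it is passed on.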
We call the main function, passing our precomputed dissimilarity_matrix = A and asking for 3 features.
```r
# Run TemporalForest with minimal settings for vignette
tf_result <- temporal_forest(
  X = X,
  Y = Y,
  id = id,
  time = time,
  dissimilarity_matrix = A,   # skip WGCNA/TOM (Stage 1)
  n_features_to_select = 3,
  n_boot_screen = 4,          # Very low for quick demo
  n_boot_select = 8,          # Very low for quick demo
  keep_fraction_screen = 1,   # Permissive screening
  min_module_size = 2,
  alpha_screen = 0.5,         # Permissive screening
  alpha_select = 0.6
)
```
Examine the selected features and check if the true predictors were found.
```r
print(tf_result)

# Validate against ground truth
true_predictors <- c("V1", "V2", "V3")
cat("True predictors found:", sum(true_predictors %in% tf_result$top_features),
    "out of", length(true_predictors), "\n")
```
The algorithm successfully identified all three true predictors in this high signal-to-noise example.
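When the ground truth is known, as in a simulation, recovery can also be summarized with precision and recall rather than a raw count. The sketch below uses a hypothetical selection vector (`selected`) standing in for `tf_result$top_features`, so that it runs without the package. The values in it are made up for illustration and are not output from TemporalForest.

```r
true_predictors <- c("V1", "V2", "V3")

# Hypothetical selection, standing in for tf_result$top_features
selected <- c("V1", "V2", "V7")

hits <- intersect(selected, true_predictors)
recall    <- length(hits) / length(true_predictors)  # share of true signals recovered
precision <- length(hits) / length(selected)         # share of selections that are true

cat(sprintf("recall = %.2f, precision = %.2f\n", recall, precision))
```

With two of the three true predictors among three selections, both quantities equal 2/3.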
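TemporalForest operates in three stages: network construction (Stage 1, WGCNA/TOM), which groups features into modules; screening within each module; and final selection among the features that pass screening. The screening and selection stages are both run on bootstrap samples.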
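To make the first stage more concrete, here is a rough base-R illustration of how a dissimilarity matrix can be turned into feature modules. It uses average-linkage hierarchical clustering cut into a fixed number of groups, a simplification chosen for this sketch. It is not the package's Stage 1, which this vignette describes as WGCNA/TOM, and all variable names are hypothetical.

```r
set.seed(42)
n <- 100

# Two latent factors drive two groups of correlated features
f1 <- rnorm(n)
f2 <- rnorm(n)
X_demo <- cbind(
  g1a = f1 + rnorm(n, sd = 0.3),
  g1b = f1 + rnorm(n, sd = 0.3),
  g1c = f1 + rnorm(n, sd = 0.3),
  g2a = f2 + rnorm(n, sd = 0.3),
  g2b = f2 + rnorm(n, sd = 0.3),
  g2c = f2 + rnorm(n, sd = 0.3)
)

# Same style of dissimilarity as in the quick-start example
D <- 1 - abs(stats::cor(X_demo))
diag(D) <- 0

# Cluster features and cut the tree into two modules (k = 2 is an assumption)
tree <- stats::hclust(stats::as.dist(D), method = "average")
modules <- stats::cutree(tree, k = 2)
print(split(names(modules), modules))
```

Features driven by the same latent factor land in the same module, which is the kind of grouping later stages can then screen within.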
- n_features_to_select: Final number of features to return (default: 10).
- n_boot_screen, n_boot_select: Number of bootstrap samples for the screening and selection stages (defaults: 50, 100). Increase for more stable results.
- keep_fraction_screen: Proportion of features from each module passed to final selection (default: 0.25). Increase if too few features are selected.
- min_module_size: Minimum size for network modules (default: 4).
- alpha_screen, alpha_select: Significance levels for splitting in the screening and selection trees (defaults: 0.2, 0.05).

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| No features selected | Screening too strict | Increase keep_fraction_screen or alpha_screen |
| Too many features selected | Selection too liberal | Decrease keep_fraction_screen or alpha_select |
| Long computation time | Data too large | Reduce bootstrap numbers or pre-filter features |
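
One way to pre-filter features before a long run is to keep only the most variable columns, applying the same subset to every time point so that column names stay aligned. The helper `prefilter_by_variance()` below is hypothetical and written for this sketch. It is not a TemporalForest function, and variance is only one of several possible filtering criteria.

```r
prefilter_by_variance <- function(X_list, n_keep) {
  # Pool rows across time points so one variance is computed per feature
  pooled <- do.call(rbind, X_list)
  v <- apply(pooled, 2, stats::var)
  top <- names(sort(v, decreasing = TRUE))[seq_len(min(n_keep, length(v)))]
  # Keep the original column order and subset every time point identically
  keep <- colnames(X_list[[1]])[colnames(X_list[[1]]) %in% top]
  lapply(X_list, function(m) m[, keep, drop = FALSE])
}

set.seed(7)
p <- 10
sds <- c(rep(3, 4), rep(0.1, 6))  # the first four features vary most
X_big <- replicate(2, {
  m <- sapply(sds, function(s) rnorm(50, sd = s))
  colnames(m) <- paste0("V", 1:p)
  m
}, simplify = FALSE)

X_small <- prefilter_by_variance(X_big, n_keep = 4)
print(colnames(X_small[[1]]))
```

If a precomputed dissimilarity matrix is also supplied, it would need to be subset to the same feature names so that its dimensions still match the filtered predictors.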
The package includes checks for proper data formatting. Here's an example of the error message for inconsistent inputs:
```r
# This will produce a clear error message
mat1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "C")))
bad_X <- list(mat1, mat2)
TemporalForest::check_temporal_consistency(bad_X)
```
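For readers curious what a check of this kind involves, the sketch below compares column names across time points in base R and captures the resulting error instead of halting. The function `same_colnames()` is hypothetical and written for illustration. It is not the package's `check_temporal_consistency()`, whose exact behavior and error text are not reproduced here.

```r
same_colnames <- function(X_list) {
  ref <- colnames(X_list[[1]])
  for (t in seq_along(X_list)) {
    if (!identical(colnames(X_list[[t]]), ref)) {
      stop(sprintf("Column names at time point %d differ from time point 1.", t),
           call. = FALSE)
    }
  }
  invisible(TRUE)
}

mat1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "C")))

# Capture the error message rather than stopping the script
msg <- tryCatch(
  {
    same_colnames(list(mat1, mat2))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Wrapping the call in `tryCatch()` is convenient in scripts that should report a formatting problem and continue.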
TemporalForest provides an end-to-end solution for reproducible feature selection in longitudinal high-dimensional data. For detailed information on all function parameters and advanced usage, see the package documentation (?TemporalForest).
To cite TemporalForest in publications, please use:
```r
citation("TemporalForest")
```

```r
sessionInfo()
options(old_ops)
```