temporal_forest: Temporal Forest for Longitudinal Feature Selection
In TemporalForest: Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data

Description

The main user-facing function for the TemporalForest package. It performs the complete three-stage algorithm to select a top set of features from high-dimensional longitudinal data.

Usage

temporal_forest(
  X = NULL,
  Y,
  id,
  time,
  dissimilarity_matrix = NULL,
  n_features_to_select = 10,
  min_module_size = 4,
  n_boot_screen = 50,
  keep_fraction_screen = 0.25,
  n_boot_select = 100,
  alpha_screen = 0.2,
  alpha_select = 0.05
)

Arguments

`X`	A list of numeric matrices, one for each time point. The rows of each matrix should be subjects and columns should be predictors. Required unless `dissimilarity_matrix` is provided.
`Y`	A numeric vector for the longitudinal outcome.
`id`	A vector of subject identifiers.
`time`	A vector of time point indicators.
`dissimilarity_matrix`	An optional pre-computed dissimilarity matrix (e.g., `1 - TOM`). If provided, the network construction step (Stage 1) is skipped. The matrix must be square with predictor names as rownames and colnames. Defaults to `NULL`.
`n_features_to_select`	The number of top features to return in the final selection. This is passed to the `number_selected_final` argument of the internal function. Defaults to 10.
`min_module_size`	The minimum number of features in a module. Passed to the `minClusterSize` argument of the internal function. Defaults to 4.
`n_boot_screen`	The number of bootstrap repetitions for the initial screening stage within modules. Defaults to 50.
`keep_fraction_screen`	The proportion of features to keep from each module during the screening stage. Defaults to 0.25.
`n_boot_select`	The number of bootstrap repetitions for the final stability selection stage. Defaults to 100.
`alpha_screen`	The significance level for splitting in the screening stage trees. Defaults to 0.2.
`alpha_select`	The significance level for splitting in the selection stage trees. Defaults to 0.05.
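
The requirements on `dissimilarity_matrix` (square, with predictor names as both rownames and colnames) can be checked before calling the function. The sketch below is illustrative only: `check_dissimilarity` is a hypothetical helper, not part of TemporalForest, and the correlation-based matrix merely stands in for `1 - TOM`.

```r
# Hypothetical helper (not part of TemporalForest): basic checks on a
# user-supplied dissimilarity matrix before passing it to temporal_forest().
check_dissimilarity <- function(D, predictor_names) {
  stopifnot(is.matrix(D), is.numeric(D), nrow(D) == ncol(D))
  if (is.null(rownames(D)) || is.null(colnames(D))) {
    stop("D needs predictor names as both rownames and colnames")
  }
  if (!identical(rownames(D), colnames(D))) {
    stop("rownames and colnames of D differ")
  }
  if (!setequal(rownames(D), predictor_names)) {
    stop("names of D do not match the predictors")
  }
  invisible(TRUE)
}

set.seed(1)
X1 <- matrix(rnorm(30 * 5), 30, 5, dimnames = list(NULL, paste0("V", 1:5)))
D <- 1 - abs(cor(X1))   # correlation-based stand-in for 1 - TOM
ok <- check_dissimilarity(D, colnames(X1))
```

A matrix without dimnames (for example `unname(D)`) fails the check with an error instead of being passed on silently.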

Details

The function executes a three-stage process:

1. Time-Aware Module Construction: Builds a consensus network across time points to identify modules of stably co-correlated features.
2. Within-Module Screening: Uses bootstrapped mixed-effects model trees (glmertree) to screen for important predictors within each module.
3. Stability Selection: Performs a final stability selection step on the surviving features to yield a reproducible final set.

Unbalanced Panels: The algorithm accommodates unbalanced panel data (i.e., subjects with missing time points), provided the inputs are already aligned as described under Input contract. The consensus TOM is constructed from the time points available, and the mixed-effects models do not require every subject to be observed at every time point.

Outcome Family: The current version is designed for Gaussian (continuous) outcomes, as it relies on glmertree::lmertree. Support for other outcome families is not yet implemented.

Reproducibility (Determinism): For reproducible results, it is recommended to set a seed using set.seed() before running. The algorithm has both stochastic and deterministic components:

Stochastic (depends on set.seed()): The bootstrap resampling of subjects in both the screening and selection stages, and the tree partitioning performed on each resample.
Deterministic (does not depend on set.seed()): The network construction process (correlation, adjacency, and TOM calculation).

Value

An object of class TemporalForest with:

top_features (character): the K selected features in descending stability order.
candidate_features (character): all features that survived within-module screening and entered the final stability selection stage.

Input contract

X: list of numeric matrices, one per time point; columns (names and order) must be identical across all time points. The function does not reorder or reconcile columns.
Row order / binding rule: when rows from X are stacked internally, they are assumed to already be in subject-major × time-minor order in the user's data. The function does not re-order subjects or time.
Y, id, time: vectors of equal length. id and time may be integer/character/factor; time is coerced to a numeric sequence via as.numeric(as.factor(time)).
Missing values: this function does not perform NA filtering or imputation. Users should pre-clean the data before calling it (e.g., keep <- complete.cases(Y, id, time), then subset Y, id, time, and the matching rows of X with keep).
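
The pre-cleaning step and the time coercion described above can be tried on toy vectors. The values below are made up for illustration. One consequence of `as.numeric(as.factor(time))` worth checking in your own data: factor levels of a character vector are sorted, so labels such as "wk12" receive a lower code than "wk4".

```r
# Toy long-format vectors (illustrative values only).
Y    <- c(1.2, NA, 0.7, 2.1, 1.5, 0.9)
id   <- c("a", "a", "a", "b", "b", "b")
time <- c("wk0", "wk4", "wk12", "wk0", "wk4", "wk12")

# Drop rows with a missing outcome, id, or time. The matching rows of X
# would have to be dropped by the user as well.
keep <- complete.cases(Y, id, time)
Y <- Y[keep]; id <- id[keep]; time <- time[keep]

# The documented coercion: factor levels are sorted, so "wk12" sorts
# before "wk4" and the codes do not follow chronological order.
time_default <- as.numeric(as.factor(time))

# Fixing the level order up front keeps the codes chronological.
time_ordered <- as.numeric(factor(time, levels = c("wk0", "wk4", "wk12")))
```

Passing an already numeric or explicitly ordered `time` avoids the surprise; whether the resulting codes matter downstream depends on how the function uses `time`, which this page does not spell out.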

Unbalanced panels

Missing time points per subject are allowed provided the user supplies X, Y, id, time that already align under the binding rule above. Stage 1 builds a TOM at the feature level for each available time-point matrix; the consensus TOM is the element-wise minimum across time points. Subject-level missingness at a given time does not prevent feature-wise similarity from being computed at other times. This function does not perform any subject-level alignment across time.
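
The element-wise-minimum consensus described above can be sketched in base R. This is a minimal illustration, not the package's code: it uses the standard unsigned TOM formula, an unsigned adjacency `abs(cor(x))^beta`, and an assumed soft-threshold power `beta = 6`; the helper name `tom_from_adjacency` is hypothetical. The two time points deliberately have different numbers of subjects to show that feature-level similarity does not need matching rows.

```r
# Unsigned topological overlap from an adjacency matrix (hypothetical helper):
# TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), diag = 1.
tom_from_adjacency <- function(adj) {
  diag(adj) <- 0
  shared <- adj %*% adj                  # shared-neighbour term
  k <- rowSums(adj)                      # connectivity of each feature
  denom <- outer(k, k, pmin) + 1 - adj
  tom <- (shared + adj) / denom
  diag(tom) <- 1
  tom
}

set.seed(2)
p <- 6
# Two time points with different numbers of subjects (unbalanced panel).
X <- list(
  matrix(rnorm(40 * p), 40, p, dimnames = list(NULL, paste0("V", 1:p))),
  matrix(rnorm(35 * p), 35, p, dimnames = list(NULL, paste0("V", 1:p)))
)

beta <- 6  # assumed soft-threshold power, chosen only for this sketch
tom_list <- lapply(X, function(x) tom_from_adjacency(abs(cor(x))^beta))

# Consensus TOM: element-wise minimum across time points.
consensus_tom <- Reduce(pmin, tom_list)

# Dissimilarity in the form expected by `dissimilarity_matrix`.
diss <- 1 - consensus_tom
```

Taking the minimum is conservative: two features count as similar only if they overlap at every available time point.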

Outcome family

The current version targets Gaussian outcomes via glmertree::lmertree. Other families (e.g., binomial/Poisson) are not supported in this version.

Stability selection and thresholds

Final selection is top-K by bootstrap frequency (K = n_features_to_select). A probability cutoff (e.g., pi_thr) is not used and selection probabilities are not returned in the current API.
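
Top-K selection by bootstrap frequency can be illustrated as follows. The per-bootstrap selections here are simulated with made-up inclusion probabilities; in the package they would come from the fitted trees, and the object names (`boot_selected`, `freq`) are hypothetical.

```r
set.seed(3)
features <- paste0("V", 1:6)

# Hypothetical per-bootstrap selections: each list element is the set of
# features chosen in one bootstrap repetition (simulated, not from glmertree).
inclusion_prob <- c(0.9, 0.8, 0.7, 0.2, 0.1, 0.05)
boot_selected <- replicate(
  200,
  features[runif(length(features)) < inclusion_prob],
  simplify = FALSE
)

# Count how often each feature was selected across repetitions.
freq <- table(factor(unlist(boot_selected), levels = features))

# Keep the K most frequently selected features; no probability cutoff is used.
K <- 3
top_features <- names(sort(freq, decreasing = TRUE))[seq_len(K)]
```

Because the rule is rank-based, exactly K features are always returned, even when the K-th and (K+1)-th frequencies are close.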

Reproducibility (determinism)

Stochastic (affected by set.seed()): bootstrap resampling and tree partitioning.
Deterministic: correlation/adjacency/TOM and consensus-TOM given fixed inputs.
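
The effect of set.seed() on the stochastic part can be seen with a small subject-level bootstrap. The helper `boot_rows_by_subject` is hypothetical and only mimics the idea of resampling subjects; it is not the package's resampling code.

```r
# Hypothetical helper: resample subjects (not rows) with replacement and
# return the row indices of every sampled subject's observations.
boot_rows_by_subject <- function(id) {
  subjects <- unique(id)
  drawn <- subjects[sample.int(length(subjects), replace = TRUE)]
  unlist(lapply(drawn, function(s) which(id == s)))
}

id <- rep(1:5, each = 2)   # 5 subjects, 2 time points each

# Same seed, same resample.
set.seed(11); rows_a <- boot_rows_by_subject(id)
set.seed(11); rows_b <- boot_rows_by_subject(id)

# No seed reset: the draw continues the random stream and generally differs.
rows_c <- boot_rows_by_subject(id)

same_with_seed <- identical(rows_a, rows_b)
```

Setting the seed once, immediately before the call to temporal_forest(), is enough to fix all bootstrap draws made within that call.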

Internal validation

An internal helper check_temporal_consistency is called automatically at the start (whenever dissimilarity_matrix is NULL). It throws an error if column names across time points are not identical (names and order).
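
A check of this kind might look like the sketch below. It is a hypothetical re-implementation for illustration (`check_columns_match` is not a package function); the internal check_temporal_consistency may differ in details and error messages.

```r
# Hypothetical re-implementation for illustration only.
check_columns_match <- function(X) {
  ref <- colnames(X[[1]])
  if (is.null(ref)) stop("matrices in X need column names")
  for (t in seq_along(X)) {
    # identical() requires the same names in the same order.
    if (!identical(colnames(X[[t]]), ref)) {
      stop(sprintf("column names at time point %d differ from time point 1", t))
    }
  }
  invisible(TRUE)
}

m <- matrix(0, 4, 3, dimnames = list(NULL, c("V1", "V2", "V3")))
X_ok  <- list(m, m)
X_bad <- list(m, m[, c(2, 1, 3)])   # same names, different order

passed <- check_columns_match(X_ok)
msg <- tryCatch(check_columns_match(X_bad), error = function(e) conditionMessage(e))
```

Reordering columns is enough to trigger the error, which matches the "names and order" requirement stated above.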

Note

The current API does not expose selection probabilities, module labels, or a parameter snapshot; these may be added in a future version.

Author(s)

Sisi Shao, Jason H. Moore, Christina M. Ramirez

References

Shao, S., Moore, J.H., Ramirez, C.M. (2025). Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data. Journal of Statistical Software.

Examples


# Tiny demo: selects V1, V2, V3 quickly (skips Stage 1 via precomputed A)
set.seed(11)
n_subjects <- 60; n_timepoints <- 2; p <- 20
# One n_subjects x p predictor matrix per time point, with identical column names
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)
X_long <- do.call(rbind, X)
id   <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)
# Outcome: three true signals (V1, V2, V3) plus a subject-level random intercept and noise
u <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[,"V1"] + 3.5*X_long[,"V2"] + 3.2*X_long[,"V3"] + rep(u, each = n_timepoints) + eps
# Correlation-based dissimilarity supplied in place of 1 - TOM
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
# Small bootstrap counts and lenient alpha levels keep the demo fast
fit <- temporal_forest(
  X, Y, id, time,
  dissimilarity_matrix = A,
  n_features_to_select = 3,
  n_boot_screen = 6, n_boot_select = 18,
  keep_fraction_screen = 1, min_module_size = 2,
  alpha_screen = 0.5, alpha_select = 0.6
)
print(fit$top_features)

TemporalForest documentation built on Dec. 23, 2025, 1:06 a.m.
