| temporal_forest | R Documentation |
The main user-facing function for the TemporalForest package. It performs the
complete three-stage algorithm to select a top set of features from
high-dimensional longitudinal data.
temporal_forest(
  X = NULL,
  Y,
  id,
  time,
  dissimilarity_matrix = NULL,
  n_features_to_select = 10,
  min_module_size = 4,
  n_boot_screen = 50,
  keep_fraction_screen = 0.25,
  n_boot_select = 100,
  alpha_screen = 0.2,
  alpha_select = 0.05
)
X |
A list of numeric matrices, one for each time point. The rows of each
matrix should be subjects and columns should be predictors. Required unless
|
Y |
A numeric vector for the longitudinal outcome. |
id |
A vector of subject identifiers. |
time |
A vector of time point indicators. |
dissimilarity_matrix |
An optional pre-computed dissimilarity matrix (e.g., |
n_features_to_select |
The number of top features to return in the final selection.
This is passed to the |
min_module_size |
The minimum number of features in a module. Passed to the
|
n_boot_screen |
The number of bootstrap repetitions for the initial screening stage within modules. Defaults to 50. |
keep_fraction_screen |
The proportion of features to keep from each module during the screening stage. Defaults to 0.25. |
n_boot_select |
The number of bootstrap repetitions for the final stability selection stage. Defaults to 100. |
alpha_screen |
The significance level for splitting in the screening stage trees. Defaults to 0.2. |
alpha_select |
The significance level for splitting in the selection stage trees. Defaults to 0.05. |
The function executes a three-stage process:
1. Time-Aware Module Construction: Builds a consensus network across time points to identify modules of stably co-correlated features.
2. Within-Module Screening: Uses bootstrapped mixed-effects model trees (glmertree) to screen for important predictors within each module.
3. Stability Selection: Performs a final stability selection step on the surviving features to yield a reproducible final set.
Unbalanced Panels: The algorithm accepts unbalanced panel data (i.e., subjects with missing time points). The consensus TOM is constructed from the time points available, and the mixed-effects models accommodate subjects with differing numbers of observations.
Outcome Family: The current version is designed for Gaussian (continuous) outcomes, as it relies on glmertree::lmertree. Support for other outcome families is not yet implemented.
Reproducibility (Determinism): For reproducible results, call set.seed() before running the function. The algorithm has both stochastic and deterministic components:
Stochastic (depends on set.seed()): The bootstrap resampling of subjects in both the screening and selection stages.
Deterministic (does not depend on set.seed()): The network construction process (correlation, adjacency, and TOM calculation).
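The split between stochastic and deterministic components can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: `boot_subjects()` is a hypothetical helper standing in for the bootstrap resampling of subjects, and `stats::cor()` stands in for the network construction.

```r
# Hypothetical helper (not part of TemporalForest): draw one bootstrap
# sample of subject identifiers, with replacement.
boot_subjects <- function(id) {
  subjects <- unique(id)
  sample(subjects, size = length(subjects), replace = TRUE)
}

id <- rep(1:10, each = 3)

# Stochastic part: the same seed gives the same resample.
set.seed(42)
draw_a <- boot_subjects(id)
set.seed(42)
draw_b <- boot_subjects(id)
same_seed_matches <- identical(draw_a, draw_b)

# Deterministic part: a correlation matrix depends only on its input,
# whatever the state of the random number generator.
X <- matrix(seq_len(40)^2 %% 7, nrow = 10, ncol = 4)
set.seed(1)
cor_a <- stats::cor(X)
set.seed(2)
cor_b <- stats::cor(X)
cor_is_deterministic <- identical(cor_a, cor_b)
```

Both flags evaluate to TRUE: the seed governs only the resampling step, while the correlation step is unaffected by it.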
An object of class TemporalForest with:
top_features (character): the K selected features in
descending stability order.
candidate_features (character): all features that
entered the final (second-stage) selection.
X: list of numeric matrices, one per time point; columns (names and order) must be identical across all time points. The function does not reorder or reconcile columns.
Row order / binding rule: when rows from X are stacked internally,
they are assumed to already be in subject-major × time-minor order in
the user's data. The function does not re-order subjects or time.
Y, id, time: vectors of equal length. id and time may be
integer/character/factor; time is coerced to a numeric sequence
via as.numeric(as.factor(time)).
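Because the coercion goes through a factor, the resulting index follows the sort order of the factor levels, which matters for character time labels. The sketch below applies the documented expression directly; `to_time_index()` is a hypothetical wrapper introduced only for this illustration.

```r
# Hypothetical wrapper around the coercion stated above.
to_time_index <- function(time) as.numeric(as.factor(time))

# Numeric time stamps: levels are sorted numerically.
idx_numeric <- to_time_index(c(0, 6, 12, 6))
# 1 2 3 2

# Character labels: levels are sorted as strings, so "10" sorts before "2".
idx_character <- to_time_index(c("2", "10", "2", "10"))
# 2 1 2 1

# Converting to numeric first restores the intended ordering.
idx_fixed <- to_time_index(as.numeric(c("2", "10", "2", "10")))
# 1 2 1 2
```

If time is stored as text, converting it to numeric (or to a factor with explicitly ordered levels) before the call avoids an unintended ordering.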
Missing values: this function does not perform NA filtering or
imputation. Users should pre-clean the data before calling it (e.g., keep <- complete.cases(Y, id, time)).
Missing time points per subject are allowed provided the user supplies
X, Y, id, time that already align under the binding rule above.
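The pre-cleaning step mentioned above might look like the following on the long-format vectors. The data are made up for illustration, and the sketch covers only Y, id, and time; dropping the matching rows from X is left to the user, since the alignment depends on how the data were assembled.

```r
# Toy long-format data with two incomplete rows.
Y    <- c(1.2, NA, 0.7, 2.1, 1.9, 0.4)
id   <- c(1, 1, 2, 2, 3, 3)
time <- c(1, 2, 1, 2, 1, NA)

# TRUE for rows where Y, id and time are all observed.
keep <- complete.cases(Y, id, time)

Y_clean    <- Y[keep]
id_clean   <- id[keep]
time_clean <- time[keep]

# Subjects 1 and 3 now contribute one time point each (an unbalanced panel).
obs_per_subject <- table(id_clean)
```

After filtering, the three vectors still have equal length, which is the condition the input contract requires.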
Stage 1 builds a TOM at the feature level for each available time-point
matrix; the consensus TOM is the element-wise minimum across time points.
Subject-level missingness at a given time does not prevent feature-wise
similarity from being computed at other times. This function does not perform
any subject-level alignment across time.
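The element-wise minimum can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: absolute correlation is used as a simple stand-in for a per-time-point TOM, and the data are simulated with a different number of subjects at each time point.

```r
set.seed(1)
p <- 5
feature_names <- paste0("V", seq_len(p))

# Two time points with different numbers of subjects (unbalanced panel).
X_t1 <- matrix(rnorm(30 * p), 30, p, dimnames = list(NULL, feature_names))
X_t2 <- matrix(rnorm(22 * p), 22, p, dimnames = list(NULL, feature_names))

# Stand-in for a per-time-point TOM: absolute correlation between features.
# (Assumption: a real TOM would be derived from an adjacency matrix.)
similarity_by_time <- lapply(list(X_t1, X_t2), function(x) abs(stats::cor(x)))

# Consensus similarity: element-wise minimum across time points.
consensus <- Reduce(pmin, similarity_by_time)

# A corresponding feature-level dissimilarity.
dissimilarity <- 1 - consensus
```

Each per-time-point matrix is p by p regardless of how many subjects were observed, which is why differing subject counts across time points do not block the consensus step. Taking the minimum means two features count as similar only if they are similar at every time point.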
Current version targets Gaussian outcomes via glmertree::lmertree.
Other families (e.g., binomial/Poisson) are not supported in this version.
Final selection is top-K by bootstrap frequency (K = n_features_to_select).
A probability cutoff (e.g., pi_thr) is not used and selection
probabilities are not returned in the current API.
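Top-K selection by bootstrap frequency can be sketched as below. The per-bootstrap selections are invented for illustration, and how the package breaks ties is not documented here, so the example avoids ties among the top K.

```r
# Hypothetical record of which features were picked in each of 5 bootstrap fits.
selected_per_boot <- list(
  c("V1", "V2", "V7"),
  c("V1", "V3"),
  c("V1", "V2", "V3"),
  c("V2", "V1"),
  c("V1", "V9", "V2")
)

K <- 3

# Count how often each feature was selected across bootstrap fits.
freq <- table(unlist(selected_per_boot))

# Rank by descending frequency and keep the first K names.
ranked <- names(sort(freq, decreasing = TRUE))
top_k <- head(ranked, K)
# "V1" "V2" "V3"
```

Unlike a probability cutoff, this rule always returns exactly K features (when at least K candidates exist), however weak the K-th feature's frequency is.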
Stochastic (affected by set.seed()): bootstrap resampling and tree
partitioning.
Deterministic: correlation/adjacency/TOM and consensus-TOM given fixed inputs.
An internal helper check_temporal_consistency is called
automatically at the start (whenever dissimilarity_matrix is NULL).
It throws an error if column names across time points are not identical
(names and order).
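A check of this kind could be written as follows. `check_columns_match()` is a hypothetical re-implementation for illustration only; the package's own check_temporal_consistency may differ in its details and error messages.

```r
# Hypothetical helper: stop if any time point's column names differ
# (in name or order) from those of the first time point.
check_columns_match <- function(X) {
  reference <- colnames(X[[1]])
  for (t in seq_along(X)) {
    if (!identical(colnames(X[[t]]), reference)) {
      stop("Column names at time point ", t, " differ from time point 1.")
    }
  }
  invisible(TRUE)
}

X_ok <- list(
  matrix(0, 2, 3, dimnames = list(NULL, c("V1", "V2", "V3"))),
  matrix(0, 2, 3, dimnames = list(NULL, c("V1", "V2", "V3")))
)

# Same names, different order: this should be rejected.
X_bad <- X_ok
colnames(X_bad[[2]]) <- c("V2", "V1", "V3")

ok_result <- check_columns_match(X_ok)
bad_result <- tryCatch(
  check_columns_match(X_bad),
  error = function(e) conditionMessage(e)
)
```

Using `identical()` on the column-name vectors enforces both names and order in one comparison, so a reordered matrix fails even though it contains the same features.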
The current API does not expose selection probabilities, module labels, or a parameter snapshot; these may be added in a future version.
Sisi Shao, Jason H. Moore, Christina M. Ramirez
Shao, S., Moore, J.H., Ramirez, C.M. (2025). Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data. Journal of Statistical Software.
select_soft_power, calculate_fs_metrics_cv,
calculate_pred_metrics_cv, check_temporal_consistency
# Tiny demo: selects V1, V2, V3 quickly (skips Stage 1 via precomputed A)
set.seed(11)
n_subjects <- 60
n_timepoints <- 2
p <- 20

# One n_subjects x p predictor matrix per time point, with shared column names
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)

# Row-bind the per-time-point matrices into one long matrix
X_long <- do.call(rbind, X)
id <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)

# Outcome: three signal features plus a subject-level random intercept and noise
u <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4 * X_long[, "V1"] + 3.5 * X_long[, "V2"] + 3.2 * X_long[, "V3"] +
  rep(u, each = n_timepoints) + eps

# Precomputed feature dissimilarity (1 - |correlation|) with a zero diagonal
A <- 1 - abs(stats::cor(X_long))
diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))

# Small bootstrap counts and permissive alphas keep the demo fast
fit <- temporal_forest(
  X, Y, id, time,
  dissimilarity_matrix = A,
  n_features_to_select = 3,
  n_boot_screen = 6, n_boot_select = 18,
  keep_fraction_screen = 1, min_module_size = 2,
  alpha_screen = 0.5, alpha_select = 0.6
)
print(fit$top_features)