```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(FORD)
library(FOCI)
```
In this vignette, we demonstrate FORD, the forward stepwise variable selection algorithm introduced in *A New Measure Of Dependence: Integrated R2*. FORD is based on the integrated $R^2$ dependence measure and is designed to rank variables in both linear and nonlinear multivariate regression settings.
FORD closely follows the structure of FOCI, the algorithm introduced in *A Simple Measure Of Conditional Dependence* (Azadkia and Chatterjee, 2021), but replaces FOCI's core dependence measure with the integrated $R^2$ measure (irdc).
Let $Y$ be the response variable and $\mathbf{X} = (X_1, \dots, X_p)$ the vector of predictors, and write $\nu_n$ for the irdc estimate computed from $n$ i.i.d. samples of $(Y, \mathbf{X})$. FORD then proceeds as follows:
1. Select $j_1 = \arg\max_j \nu_n(Y, X_j)$.
2. If $\nu_n(Y, X_{j_1}) \leq 0$, return $\hat{V} = \emptyset$.
3. Otherwise, iteratively add the feature that gives the largest increase in irdc:
   $$ j_{k+1} = \arg\max_{j \notin \{j_1, \ldots, j_k\}} \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_j)) $$
4. Stop at the first $k$ for which irdc no longer increases,
   $$ \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_{j_{k+1}})) \leq \nu_n(Y, (X_{j_1}, \ldots, X_{j_k})), $$
   and return $\hat{V} = \{j_1, \ldots, j_k\}$.
5. If no such $k$ exists, select all variables.
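The greedy loop described above can be sketched in base R with the dependence measure passed in as a function. This is a minimal illustration under stated assumptions, not the package's implementation: `forward_select` and `adj_r2` are hypothetical names introduced here, and `adj_r2` (adjusted $R^2$ of a linear fit) is only a stand-in for $\nu_n$ chosen because it is easy to compute. It is not irdc and cannot detect nonlinear dependence.

```r
# Minimal sketch of the forward selection loop with a pluggable measure.
# `measure(Y, Xsub)` is a HYPOTHETICAL stand-in for nu_n: any function that
# returns one number for a response vector Y and a predictor matrix Xsub.
forward_select <- function(Y, X, measure) {
  selected <- integer(0)
  current <- 0  # the first candidate must beat 0, otherwise return nothing
  repeat {
    candidates <- setdiff(seq_len(ncol(X)), selected)
    if (length(candidates) == 0) break  # every variable has been selected
    # Score each remaining candidate together with the already-selected set
    scores <- vapply(candidates, function(j) {
      measure(Y, X[, c(selected, j), drop = FALSE])
    }, numeric(1))
    best <- which.max(scores)
    # Stop as soon as the best score does not exceed the current one
    if (scores[best] <= current) break
    selected <- c(selected, candidates[best])
    current <- scores[best]
  }
  selected
}

# Hypothetical stand-in measure (NOT irdc): adjusted R^2 of a linear fit
adj_r2 <- function(Y, Xsub) summary(lm(Y ~ Xsub))$adj.r.squared

set.seed(1)
X_toy <- matrix(rnorm(500 * 10), ncol = 10)
Y_toy <- 2 * X_toy[, 1] - X_toy[, 3] + rnorm(500)
sel <- forward_select(Y_toy, X_toy, adj_r2)
sel  # columns 1 and 3 should come first; noise columns may follow by chance
```

Replacing `adj_r2` with an estimator of $\nu_n$ gives a loop of the same shape; the toy names `X_toy` and `Y_toy` avoid overwriting the `X` and `Y` used in the rest of this vignette.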
In the following simulated example, $Y$ depends nonlinearly on only the first four of the $p = 100$ predictors: $Y = X_1 X_2 + \sin(X_1 X_3) + X_4^2$.
```r
set.seed(42)
n <- 2000
p <- 100
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("X", seq_len(p))
Y <- X[, 1] * X[, 2] + sin(X[, 1] * X[, 3]) + X[, 4]^2
```
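One way to see why a nonlinear dependence measure is needed here: in this model each of $X_1, \ldots, X_4$ has zero population correlation with $Y$ (for instance, $\operatorname{Cov}(X_4^2, X_4) = E[X_4^3] = 0$, and the other terms vanish by symmetry), so screening predictors by marginal Pearson correlation would not single them out. The base R check below regenerates the same data so that it runs on its own and compares the sample correlations of the relevant and irrelevant predictors; `cor_relevant` and `cor_noise` are names introduced only for this illustration.

```r
# Regenerate the simulated data from the example above
set.seed(42)
n <- 2000
p <- 100
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("X", seq_len(p))
Y <- X[, 1] * X[, 2] + sin(X[, 1] * X[, 3]) + X[, 4]^2

# Marginal Pearson correlations of Y with the four relevant predictors
cor_relevant <- drop(cor(Y, X[, 1:4]))
round(cor_relevant, 3)

# ...and with the 96 irrelevant predictors, for comparison
cor_noise <- drop(cor(Y, X[, 5:p]))
summary(abs(cor_noise))
```

The correlations for $X_1, \ldots, X_4$ should be of the same small magnitude as those of the noise columns, even though $Y$ is a deterministic function of those four predictors.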
```r
result_foci_1 <- foci(Y, X, numCores = 1)
result_foci_1
```
```r
result_ford_1 <- ford(Y, X, numCores = 1)
result_ford_1
```
Instead of relying on the automatic stopping rule, both FOCI and FORD can be forced to select a fixed number of variables by setting `num_features` together with `stop = FALSE`.
```r
result_foci_2 <- foci(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_foci_2
```
```r
result_ford_2 <- ford(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_ford_2
```
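Conceptually, fixing the number of variables removes the stopping check from the greedy loop: the procedure takes exactly `num_features` forward steps and returns the variables in the order they were added. The base R sketch below illustrates that idea with a generic, user-supplied `measure` function. `forward_select_k` and `adj_r2` are hypothetical helpers, not part of FORD or FOCI, `adj_r2` is a linear stand-in rather than irdc, and the sketch is not claimed to reproduce `ford(..., stop = FALSE)` exactly.

```r
# Hypothetical helper: greedy forward selection for a fixed number of steps.
# `measure(Y, Xsub)` is any function returning one number (a stand-in for nu_n).
forward_select_k <- function(Y, X, measure, num_features) {
  selected <- integer(0)
  n_steps <- min(num_features, ncol(X))  # cannot select more than p variables
  for (step in seq_len(n_steps)) {
    candidates <- setdiff(seq_len(ncol(X)), selected)
    scores <- vapply(candidates, function(j) {
      measure(Y, X[, c(selected, j), drop = FALSE])
    }, numeric(1))
    # No stopping rule: always add the best candidate, even if the score drops
    selected <- c(selected, candidates[which.max(scores)])
  }
  selected
}

# Usage with a hypothetical linear stand-in measure (NOT irdc)
adj_r2 <- function(Y, Xsub) summary(lm(Y ~ Xsub))$adj.r.squared
set.seed(1)
X_toy <- matrix(rnorm(300 * 8), ncol = 8)
Y_toy <- X_toy[, 2] + 0.5 * X_toy[, 5] + rnorm(300)
res <- forward_select_k(Y_toy, X_toy, adj_r2, num_features = 5)
res  # columns 2 and 5 first, then three further columns in greedy order
```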
FORD provides an irdc-based alternative to FOCI for variable selection in regression. It uses the same forward selection framework, can detect nonlinear relationships, and can either stop automatically or return a fixed number of variables.
For further theoretical details, see our paper:
Azadkia and Roudaki (2025). *A New Measure Of Dependence: Integrated R2*.