ford-demo

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(FORD)   
library(FOCI)

Introduction

In this vignette, we demonstrate the FORD algorithm from "A New Measure Of Dependence: Integrated R2", a forward stepwise variable selection algorithm based on the integrated $R^2$ dependence measure. FORD is designed for variable ranking in both linear and nonlinear multivariate regression settings.

FORD closely follows the structure of FOCI (see "A Simple Measure Of Conditional Dependence"), but replaces the core dependence measure with irdc.


Algorithm

Let $Y$ be the response variable and $\mathbf{X} = (X_1, \dots, X_p)$ the predictor variables, and let $\nu_n$ denote the sample estimate of irdc. Given $n$ i.i.d. samples of $(Y, \mathbf{X})$, FORD proceeds as follows:

  1. Select $j_1 = \arg\max_j \nu_n(Y, X_j)$
    If $\nu_n(Y, X_{j_1}) \leq 0$, return $\hat{V} = \emptyset$

  2. Iteratively add the feature that gives the maximum increase in irdc: $$ j_{k+1} = \arg\max_{j \notin \{j_1, \ldots, j_k\}} \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_j)) $$

  3. Stop at the first $k$ for which the irdc no longer increases: $$ \nu_n(Y, (X_{j_1}, \ldots, X_{j_k}, X_{j_{k+1}})) \leq \nu_n(Y, (X_{j_1}, \ldots, X_{j_k})) $$

If no such $k$ exists, select all variables.
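
The three steps above can be sketched as a generic forward-selection loop. This is a minimal illustration, not the implementation inside the FORD package: forward_select and adj_r2 are hypothetical names, and the adjusted $R^2$ of a linear fit is swapped in as a placeholder for $\nu_n$ so that the block runs without any package.

```r
# Generic forward stepwise selection following steps 1-3.
# `dep` is a hypothetical stand-in for the dependence estimate nu_n;
# it is NOT the irdc implementation from the FORD package.
forward_select <- function(Y, X, dep) {
  p <- ncol(X)
  selected <- integer(0)
  current <- 0  # step 1: the first score must be strictly positive
  while (length(selected) < p) {
    candidates <- setdiff(seq_len(p), selected)
    # Score every remaining column jointly with the already selected ones
    scores <- vapply(
      candidates,
      function(j) dep(Y, X[, c(selected, j), drop = FALSE]),
      numeric(1)
    )
    best <- which.max(scores)
    # Step 3: stop when the best candidate does not increase the score
    if (scores[best] <= current) break
    selected <- c(selected, candidates[best])
    current <- scores[best]
  }
  selected  # all p columns if the stopping rule never triggers
}

# Placeholder measure (assumption for illustration only):
# adjusted R^2 of a linear regression of Y on the chosen columns.
adj_r2 <- function(Y, X) summary(lm(Y ~ X))$adj.r.squared

set.seed(1)
X_toy <- matrix(rnorm(200 * 5), ncol = 5)
Y_toy <- 2 * X_toy[, 1] - X_toy[, 3] + rnorm(200, sd = 0.1)
forward_select(Y_toy, X_toy, adj_r2)
```

Passing the measure as a function argument keeps the loop independent of the measure itself, which mirrors how FORD and FOCI share one structure and differ only in the dependence estimate.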


Example 1 — Complex nonlinear function of first 4 features

Here, $Y$ depends only on the first 4 features of $X$ in a nonlinear way.

set.seed(42)
n <- 2000
p <- 100
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("X", seq_len(p))
Y <- X[, 1] * X[, 2] + sin(X[, 1] * X[, 3]) + X[, 4]^2

FOCI Result

result_foci_1 <- foci(Y, X, numCores = 1)
result_foci_1

FORD Result

result_ford_1 <- ford(Y, X, numCores = 1)
result_ford_1
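
Because the data-generating model is known in this example, one way to read either result is to compare the selected column indices with the true set 1:4. The helper below is a hypothetical sketch and is not part of FORD or FOCI. The index vector in the usage line is made up for illustration and is not output from either package.

```r
# Hypothetical helper: compare selected column indices with the known truth.
selection_summary <- function(selected, truth) {
  list(
    true_positives  = intersect(selected, truth),  # relevant and selected
    false_positives = setdiff(selected, truth),    # selected but irrelevant
    missed          = setdiff(truth, selected)     # relevant but not selected
  )
}

# Illustrative indices only (not real output from foci() or ford())
selection_summary(c(4, 1, 2, 7), truth = 1:4)
```

To apply it to a fitted result, the selected indices first have to be extracted from the returned object; how to do so depends on the structure of that object, which is not shown here.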

Example 2 — Selecting a fixed number of variables

Setting num_features = 5 together with stop = FALSE forces both FOCI and FORD to select exactly five variables instead of relying on the automatic stopping rule.

FOCI with 5 selected features

result_foci_2 <- foci(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_foci_2

FORD with 5 selected features

result_ford_2 <- ford(Y, X, num_features = 5, stop = FALSE, numCores = 1)
result_ford_2

Conclusion

FORD provides an interpretable, irdc-based alternative to FOCI for variable selection in regression tasks. It offers a principled forward selection framework that can detect complex nonlinear relationships and be adapted for fixed-size feature subsets.

For further theoretical details, see our paper:
Azadkia and Roudaki (2025), A New Measure Of Dependence: Integrated R2




FORD documentation built on June 8, 2025, 10:03 a.m.