experiment_cor_vs_vif    R Documentation

Dataframe with results of experiment comparing correlation and VIF thresholds

Description

A dataframe summarizing 10,000 experiments comparing the output of cor_select() and vif_select(). Each row records the input sampling parameters and the resulting feature-selection metrics.

Usage

data(experiment_cor_vs_vif)

Format

A dataframe with 10,000 rows and 6 variables:

input_rows

Number of rows in the input data subset.

input_predictors

Number of predictors in the input data subset.

output_predictors

Number of predictors selected by vif_select() at the best-matching max_vif.

max_cor

Maximum allowed pairwise correlation supplied to cor_select().

max_vif

VIF threshold at which vif_select() produced the highest Jaccard similarity with cor_select() for the given max_cor.

out_selection_jaccard

Jaccard similarity (size of the intersection divided by size of the union) between the sets of predictors selected by cor_select() at max_cor and by vif_select() at max_vif.
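As a minimal sketch of how a value like out_selection_jaccard can be computed, the base R helper below takes two character vectors of predictor names. The function name jaccard_similarity and the choice to return NA when both selections are empty are assumptions made for illustration; they are not taken from the package.

```r
# Hypothetical helper (not part of the collinear package):
# Jaccard similarity between two sets of predictor names.
jaccard_similarity <- function(a, b) {
  union_size <- length(union(a, b))
  # Assumption: similarity is undefined when both selections are empty.
  if (union_size == 0) {
    return(NA_real_)
  }
  length(intersect(a, b)) / union_size
}

# Two selections sharing 2 of 4 distinct predictors.
jaccard_similarity(c("a", "b", "c"), c("b", "c", "d"))
```

A value of 1 means both functions selected exactly the same predictors; a value of 0 means the selections share none.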

Details

The source data is a synthetic dataframe with 500 columns and 10,000 rows generated using distantia::zoo_simulate() with correlated time series (independent = FALSE).

Each iteration randomly subsets 10 to 50 predictors and 30 to 100 rows per predictor, applies cor_select() with a random max_cor threshold, then finds the max_vif value for which the selection returned by vif_select() has the highest Jaccard similarity with the selection returned by cor_select().
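The iteration described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used to build the dataset: simple_cor_select() and simple_vif_select() are simplified, hypothetical stand-ins for cor_select() and vif_select(), the source data is a small latent-factor simulation instead of distantia::zoo_simulate(), and the max_cor range and the max_vif candidate grid are arbitrary choices.

```r
set.seed(42)

# Jaccard similarity between two sets of predictor names.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Hypothetical stand-in for cor_select(): keep a predictor only if its
# absolute correlation with every predictor already kept is <= max_cor.
simple_cor_select <- function(df, max_cor) {
  m <- abs(cor(df))
  kept <- character(0)
  for (p in colnames(df)) {
    if (length(kept) == 0 || all(m[p, kept] <= max_cor)) kept <- c(kept, p)
  }
  kept
}

# Hypothetical stand-in for vif_select(): drop the predictor with the
# highest VIF until every remaining VIF is <= max_vif.
simple_vif_select <- function(df, max_vif) {
  kept <- colnames(df)
  while (length(kept) > 1) {
    # VIFs are the diagonal of the inverse correlation matrix.
    vif <- diag(solve(cor(df[, kept, drop = FALSE])))
    if (max(vif) <= max_vif) break
    kept <- kept[-which.max(vif)]
  }
  kept
}

# Assumed source data: 50 correlated columns driven by one latent signal.
source_rows <- 5000
latent <- rnorm(source_rows)
source_df <- as.data.frame(
  sapply(1:50, function(i) latent * runif(1, 0, 2) + rnorm(source_rows))
)

# Random subset: 10 to 50 predictors and 30 to 100 rows per predictor.
n_pred <- sample(10:50, 1)
n_rows <- n_pred * sample(30:100, 1)
x <- source_df[sample(nrow(source_df), n_rows), sample(ncol(source_df), n_pred)]

# Random correlation threshold (assumed range) and its selection.
max_cor <- runif(1, 0.1, 0.9)
cor_selection <- simple_cor_select(x, max_cor)

# Search an assumed grid of VIF thresholds for the best-matching one.
candidates <- seq(1.5, 10, by = 0.5)
scores <- sapply(candidates, function(v) jaccard(cor_selection, simple_vif_select(x, v)))
best_vif <- candidates[which.max(scores)]

# One row with the same columns as experiment_cor_vs_vif.
result <- data.frame(
  input_rows = n_rows,
  input_predictors = n_pred,
  output_predictors = length(simple_vif_select(x, best_vif)),
  max_cor = max_cor,
  max_vif = best_vif,
  out_selection_jaccard = max(scores)
)
result
```

Repeating this block many times with different seeds and binding the rows together would yield a table with the same shape as the dataset, although the values would differ from the packaged results.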

See Also

Other experiments: experiment_adaptive_thresholds, gam_cor_to_vif, prediction_cor_to_vif

Examples

data(experiment_cor_vs_vif)
str(experiment_cor_vs_vif)

collinear documentation built on Dec. 8, 2025, 5:06 p.m.