corrSelect | R Documentation |
Identifies combinations of numeric variables in a data frame such that all pairwise absolute correlations
fall below a specified threshold. This function is a wrapper around MatSelect()
and accepts data frames, tibbles, or data tables with automatic preprocessing.
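The selection criterion can be illustrated with a minimal base R sketch. `subset_is_valid()` and the `toy` data are hypothetical names used only for illustration and are not part of the package. The strict inequality is also an assumption about how "below" is interpreted.

```r
# Hypothetical helper (not part of the package): returns TRUE when every
# pairwise absolute correlation among `vars` is strictly below `threshold`.
subset_is_valid <- function(df, vars, threshold = 0.7) {
  cm <- abs(cor(df[, vars, drop = FALSE]))
  # Only the upper triangle is needed: the matrix is symmetric and the
  # diagonal is always 1.
  all(cm[upper.tri(cm)] < threshold)
}

set.seed(1)
x <- rnorm(200)
toy <- data.frame(
  a = x,
  b = x + rnorm(200, sd = 0.1),  # near-duplicate of a
  c = rnorm(200)                 # independent of a and b
)

ok_ac <- subset_is_valid(toy, c("a", "c"))  # a and c are unrelated
ok_ab <- subset_is_valid(toy, c("a", "b"))  # a and b are almost collinear
```

Under these assumptions, a subset containing both `a` and `b` fails the check, while `a` together with `c` passes.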
corrSelect(
df,
threshold = 0.7,
method = NULL,
force_in = NULL,
cor_method = c("pearson", "spearman", "kendall", "bicor", "distance", "maximal"),
...
)
df |
A data frame. Only numeric columns are used. |
threshold |
A numeric value in (0, 1). Maximum allowed absolute correlation. Defaults to 0.7. |
method |
Character. Selection algorithm to use. One of |
force_in |
Optional character vector or numeric indices of columns to force into all subsets. |
cor_method |
Character string indicating which correlation method to use.
One of |
... |
Additional arguments passed to |
Only numeric columns are used for correlation analysis. Non-numeric columns (factors, characters,
logicals, etc.) are ignored, and their names and types are printed to inform the user. These can
optionally be reattached later using corrSubset() with keepExtra = TRUE.
Rows with missing values are removed before computing correlations. A warning is issued if any rows are dropped.
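The preprocessing described above could look roughly like the following base R sketch. `prep_numeric()` is a hypothetical helper, not the package's internal code, and it assumes that only missing values in the numeric columns cause a row to be dropped.

```r
# Hypothetical helper (not the package's implementation): keep numeric
# columns, report the ignored ones, and drop incomplete rows with a warning.
prep_numeric <- function(df) {
  df <- as.data.frame(df)
  is_num <- vapply(df, is.numeric, logical(1))
  if (any(!is_num)) {
    skipped <- vapply(df[!is_num], function(col) class(col)[1], character(1))
    message("Ignoring non-numeric columns: ",
            paste0(names(skipped), " (", skipped, ")", collapse = ", "))
  }
  num <- df[, is_num, drop = FALSE]
  # Assumption: only NAs in the numeric columns matter here.
  keep <- complete.cases(num)
  if (!all(keep)) {
    warning(sum(!keep), " row(s) with missing values removed.")
  }
  num[keep, , drop = FALSE]
}

toy <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(2.5, 1, 3, 8),
  g = c("a", "b", "a", "b")
)
clean <- suppressWarnings(prep_numeric(toy))
dim(clean)
```

In this toy input, the character column `g` is reported and skipped, and the row where `x` is missing is dropped.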
The cor_method argument controls how the correlation matrix is computed:
"pearson": Standard linear correlation.
"spearman": Rank-based monotonic correlation.
"kendall": Kendall's tau.
"bicor": Biweight midcorrelation (WGCNA::bicor).
"distance": Distance correlation (energy::dcor).
"maximal": Maximal information coefficient (minerva::mine).
For "bicor", "distance", and "maximal", the corresponding package must be installed.
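The difference between the three base methods can be seen with stats::cor() directly, and the optional packages can be checked before one of the other methods is requested. This is an illustrative sketch only: the variable names are hypothetical, and it does not reproduce how corrSelect() dispatches internally.

```r
# A monotonic but strongly non-linear relationship.
x <- seq(-3, 3, length.out = 50)
num <- data.frame(x = x, y = exp(x))

m_pearson  <- cor(num, method = "pearson")   # penalised by the curvature
m_spearman <- cor(num, method = "spearman")  # rank-based: sees a perfect ordering
m_kendall  <- cor(num, method = "kendall")   # concordance-based: likewise

# Check which optional packages are installed before choosing
# "bicor", "distance", or "maximal".
optional <- c(bicor = "WGCNA", distance = "energy", maximal = "minerva")
available <- vapply(optional, requireNamespace, logical(1), quietly = TRUE)
```

Here the rank-based coefficients reach 1 because `y` increases strictly with `x`, while the Pearson coefficient stays below 1. This is why the choice of cor_method can change which variables exceed the threshold.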
An object of class CorrCombo, containing selected subsets and correlation statistics.
assocSelect(), MatSelect(), corrSubset()
set.seed(42)
n <- 100
# Create 20 variables in 5 blocks of 4: block2 is strongly correlated, the rest are independent noise
block1 <- matrix(rnorm(n * 4), ncol = 4)
block2 <- matrix(rnorm(n), ncol = 1)
block2 <- matrix(rep(block2, 4), ncol = 4) + matrix(rnorm(n * 4, sd = 0.1), ncol = 4)
block3 <- matrix(rnorm(n * 4), ncol = 4)
block4 <- matrix(rnorm(n * 4), ncol = 4)
block5 <- matrix(rnorm(n * 4), ncol = 4)
df <- as.data.frame(cbind(block1, block2, block3, block4, block5))
colnames(df) <- paste0("V", 1:20)
# Add a non-numeric column to be ignored
df$label <- factor(sample(c("A", "B"), n, replace = TRUE))
# Basic usage
corrSelect(df, threshold = 0.8)
# Try Bron–Kerbosch with pivoting
corrSelect(df, threshold = 0.6, method = "bron-kerbosch", use_pivot = TRUE)
# Force in a specific variable and use Spearman correlation
corrSelect(df, threshold = 0.6, force_in = "V10", cor_method = "spearman")