This function searches through a correlation matrix and returns a vector of integers corresponding to columns to remove to reduce pair-wise correlations.
findCorrelation(x, cutoff = 0.9, verbose = FALSE, names = FALSE, exact = ncol(x) < 100)
x: A correlation matrix
cutoff: A numeric value for the pair-wise absolute correlation cutoff
verbose: A boolean for printing the details
names: a logical; should the column names be returned (TRUE) or the column indices (FALSE)?
exact: a logical; should the average correlations be recomputed at each step? See Details below.
The absolute values of pair-wise correlations are considered. If two variables have a high correlation, the function looks at the mean absolute correlation of each variable and removes the variable with the largest mean absolute correlation.
exact = TRUE will cause the function to re-evaluate the average correlations at each step, while exact = FALSE uses all the correlations regardless of whether they have been eliminated or not. The exact calculations will remove a smaller number of predictors but can be much slower when the problem dimensions are "big".
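The pair-wise rule described above can be sketched in base R. This is only an illustrative approximation of the non-recomputing behaviour (exact = FALSE), not the code used by findCorrelation; the helper name flag_correlated is hypothetical, and the way it breaks ties between equal mean correlations is an assumption.

```r
# Hypothetical helper, not part of caret: flags one variable from each
# highly correlated pair. Mean absolute correlations are computed once
# from the full matrix and are never updated after a variable is flagged.
flag_correlated <- function(corr, cutoff = 0.9) {
  abs_corr <- abs(corr)
  mean_abs <- colMeans(abs_corr)  # note: includes the diagonal of 1s
  flagged <- integer(0)
  p <- ncol(abs_corr)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      if (abs_corr[i, j] > cutoff) {
        # Flag whichever member of the pair is more correlated overall;
        # on a tie this sketch arbitrarily flags the later column.
        loser <- if (mean_abs[i] > mean_abs[j]) i else j
        flagged <- union(flagged, loser)
      }
    }
  }
  sort(flagged)
}

# Small matrix in which column 3 is correlated with columns 2 and 5
R2 <- diag(rep(1, 5))
R2[2, 3] <- R2[3, 2] <- 0.7
R2[5, 3] <- R2[3, 5] <- -0.7

flag_correlated(R2, cutoff = 0.69)  # flags column 3
flag_correlated(R2, cutoff = 0.99)  # nothing exceeds the cutoff
```

Because the means are never recomputed, a variable can be flagged on the strength of correlations with variables that have themselves already been flagged. That is the situation exact = TRUE is described as avoiding.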
There are several functions in the subselect package (for example, anneal) that can also be used to accomplish the same goal but tend to retain more predictors.
A vector of indices denoting the columns to remove (when names = FALSE); otherwise a vector of column names. If no correlations meet the criteria, integer(0) is returned.
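When the returned vector is used to drop columns, the empty result needs some care: in base R, indexing with -integer(0) selects no columns at all rather than keeping all of them. A minimal sketch of a guard follows. The helper name drop_flagged and the data frame dat are hypothetical and are not part of caret.

```r
# Hypothetical helper, not part of caret: removes the flagged columns and
# returns the data unchanged when nothing was flagged.
drop_flagged <- function(data, flagged) {
  if (length(flagged) == 0) {
    return(data)
  }
  if (is.character(flagged)) {
    # column names, as returned when names = TRUE
    data[, !(colnames(data) %in% flagged), drop = FALSE]
  } else {
    # integer indices, as returned when names = FALSE
    data[, -flagged, drop = FALSE]
  }
}

dat <- data.frame(a = 1:3, b = 4:6, c = 7:9)

drop_flagged(dat, integer(0))  # all three columns are kept
drop_flagged(dat, 2L)          # drops column "b" by index
drop_flagged(dat, "b")         # drops column "b" by name

# The pitfall the guard avoids: this selects zero columns
ncol(dat[, -integer(0), drop = FALSE])
```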
Original R code by Dong Li, modified by Max Kuhn
library(caret)
library(lattice)  # provides levelplot()

# A 5 x 5 correlation matrix with several large pair-wise correlations
R1 <- structure(c(1, 0.86, 0.56, 0.32, 0.85,
                  0.86, 1, 0.01, 0.74, 0.32,
                  0.56, 0.01, 1, 0.65, 0.91,
                  0.32, 0.74, 0.65, 1, 0.36,
                  0.85, 0.32, 0.91, 0.36, 1),
                .Dim = c(5L, 5L))
colnames(R1) <- rownames(R1) <- paste0("x", 1:ncol(R1))
R1

# Compare the results with and without recomputing the average correlations
findCorrelation(R1, cutoff = .6, exact = FALSE)
findCorrelation(R1, cutoff = .6, exact = TRUE)
findCorrelation(R1, cutoff = .6, exact = TRUE, names = FALSE)

# A mostly uncorrelated matrix with three correlated pairs
R2 <- diag(rep(1, 5))
R2[2, 3] <- R2[3, 2] <- .7
R2[5, 3] <- R2[3, 5] <- -.7
R2[4, 1] <- R2[1, 4] <- -.67

# Plot the correlation matrix
corrDF <- expand.grid(row = 1:5, col = 1:5)
corrDF$correlation <- as.vector(R2)
levelplot(correlation ~ row + col, corrDF)

findCorrelation(R2, cutoff = .65, verbose = TRUE)
findCorrelation(R2, cutoff = .99, verbose = TRUE)