Perform agglomerative (hierarchical) clustering on a set of numeric predictor variables on the basis of absolute correlation distance. Create a summary variable for each cluster to use for dimension reduction.
varCluster(x, y, corr.min = sqrt(2)/2, clus.summary = "max.pw",
           corr.method = "pearson", corr.use = "complete.obs",
           clus.method = "complete")
x - (data frame) data frame containing the set of numeric predictors to be clustered
y - (numeric) the intended response variable (to generate x-y correlations)
corr.min - (numeric) minimum correlation required to join variables into clusters
clus.summary - (character) method used to extract a summary variable for each cluster (see details)
corr.method - (character) which correlation coefficient to use (default "pearson")
corr.use - (character) how to deal with missing values (default "complete.obs")
clus.method - (character) how to calculate linkages (default "complete")
Variables are agglomerated into clusters on the basis of absolute correlation distance (1 - abs(cor)) using a hierarchical clustering algorithm (hclust). The resulting dendrogram is cut based on the value of corr.min supplied by the user, such that all clusters contain variables with linkage correlations of at least corr.min. A single summary variable is created for each cluster based on the value of clus.summary supplied by the user, with behavior detailed below. For all variables not linked into a cluster, the original variable is returned unchanged as the "summary" measure.
max.pw - original variable with highest average pairwise correlation amongst cluster variables
max.y - original variable with highest correlation with the response (y)
avg.x - average of standardized variables within the cluster for each observation
pc.1 - first principal component of cluster variables
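The clustering and summary steps described above can be sketched in base R. This is an illustrative reconstruction under stated assumptions, not the package source: it uses the built-in mtcars data as a stand-in, assumes the dendrogram is cut at height 1 - corr.min, and assumes absolute correlations are used for max.pw and max.y.

```r
# Hypothetical re-creation of the described steps; not the varCluster source.
x <- mtcars[, c("disp", "hp", "wt", "qsec", "drat")]  # stand-in predictors
y <- mtcars$mpg                                       # stand-in response
corr.min <- 0.75

# Absolute correlation distance between predictors
cmat <- cor(x, method = "pearson", use = "complete.obs")
d <- as.dist(1 - abs(cmat))

# Agglomerate, then cut the dendrogram at height 1 - corr.min (assumed rule)
hc <- hclust(d, method = "complete")
groups <- cutree(hc, h = 1 - corr.min)

# Pick the first cluster that contains more than one variable
g <- which(tabulate(groups) > 1)[1]
vars <- names(groups)[groups == g]
xc <- x[, vars, drop = FALSE]
ac <- abs(cor(xc))

# max.pw: highest average absolute pairwise correlation (self excluded)
max.pw <- vars[which.max((rowSums(ac) - 1) / (length(vars) - 1))]
# max.y: highest absolute correlation with the response
max.y <- vars[which.max(abs(cor(xc, y)))]
# avg.x: per-observation mean of the standardized cluster variables
avg.x <- rowMeans(scale(xc))
# pc.1: scores on the first principal component of the cluster variables
pc.1 <- prcomp(xc, center = TRUE, scale. = TRUE)$x[, 1]
```

With complete linkage, cutting at 1 - corr.min means every pair of variables inside a cluster has an absolute correlation of at least corr.min; other linkage methods give a weaker guarantee.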
a list containing the following elements:
nvar - number of variables in x
nclust - number of created clusters
vif - VIF values for each original variable in x
clusters - a list for each cluster containing the number of variables in the cluster, the names of the variables in the cluster, all pairwise correlations, and predictor-response correlations
summaries - a data frame containing the summary variables (one for each cluster)
corrplot - a ggplot2 object visualizing the correlations amongst all predictor variables
params - a list of all parameter values passed in the original call
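The page does not say how the vif element is computed. As a rough guide, the textbook variance inflation factor for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below uses the built-in mtcars data as a stand-in and shows two equivalent ways to obtain it; whether varCluster uses either one is an assumption.

```r
# Illustrative VIF calculation; not necessarily how varCluster computes vif.
x <- mtcars[, c("disp", "hp", "wt", "qsec", "drat")]  # stand-in predictors

# Regress each predictor on the remaining ones and convert R^2 to a VIF
vif_lm <- sapply(names(x), function(v) {
  f <- reformulate(setdiff(names(x), v), response = v)
  r2 <- summary(lm(f, data = x))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_inv <- diag(solve(cor(x)))

# The two approaches should agree up to numerical tolerance
all.equal(unname(vif_lm), unname(vif_inv))
```

Large values flag predictors that are nearly linear combinations of the others, which are the ones most likely to end up grouped into a cluster.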
library(AppliedPredictiveModeling)  # provides the ChemicalManufacturingProcess data
data("ChemicalManufacturingProcess")
x <- ChemicalManufacturingProcess[, -1]  # predictors
y <- ChemicalManufacturingProcess[, 1]   # response (first column)
res <- varCluster(x, y, corr.min = 0.75, clus.summary = "pc.1",
                  corr.method = "spearman", corr.use = "complete.obs")
res$nvar
res$nclust
res$clusters
str(res$summaries)
res$corrplot
res$params