varCluster: Predictor Clustering and Cluster Summary Variable Extraction

Description

Perform agglomerative (hierarchical) clustering on a set of numeric predictor variables on the basis of absolute correlation distance. Create a summary variable for each cluster to use for dimension reduction.

Usage

varCluster(x, y, corr.min = sqrt(2)/2, clus.summary = "max.pw",
  corr.method = "pearson", corr.use = "complete.obs",
  clus.method = "complete")

Arguments

x

(data frame) data frame containing the set of numeric predictors to be clustered

y

(numeric) the intended response variable (to generate x-y correlations)

corr.min

(numeric) minimum correlation required to join variables into clusters

clus.summary

(character) method used to extract a summary variable for each cluster (see details)

corr.method

(character) which correlation coefficient to use: c('pearson', 'spearman')

corr.use

(character) how to deal with missing values: c('complete.obs', 'pairwise.complete.obs')

clus.method

(character) how to calculate linkages: c('complete', 'single', 'centroid')

Details

Variables are agglomerated into clusters on the basis of absolute correlation distance (1 - abs(cor)) using a hierarchical clustering algorithm (hclust). The resulting dendrogram is cut at the value of corr.min supplied by the user, such that all clusters contain variables with linkage correlations of at least corr.min. A single summary variable is created for each cluster according to the value of clus.summary supplied by the user, with behavior detailed below. Any variable not linked into a cluster is returned unchanged as its own "summary" measure.
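
The clustering step described above can be approximated in base R. The following is a minimal sketch, not the package's actual source: the simulated data frame and the names cmat, dmat, tree, groups and clusters are hypothetical, and the cut height 1 - corr.min is inferred from the distance definition. The final step assumes that clus.summary = "pc.1" denotes the first principal component of each cluster, which this page does not confirm.

```r
# Simulated predictors (hypothetical): two correlated pairs plus one
# independent variable.
set.seed(1)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
x <- data.frame(
  a = z1 + rnorm(n, sd = 0.1),
  b = z1 + rnorm(n, sd = 0.1),
  c = z2 + rnorm(n, sd = 0.1),
  d = z2 + rnorm(n, sd = 0.1),
  e = rnorm(n)
)
corr.min <- 0.75

# Absolute correlation distance: 1 - abs(cor)
cmat <- cor(x, method = "pearson", use = "complete.obs")
dmat <- as.dist(1 - abs(cmat))

# Agglomerate, then cut the dendrogram at the height matching corr.min.
# With complete linkage, every pair inside a cluster then has
# abs(cor) >= corr.min.
tree     <- hclust(dmat, method = "complete")
groups   <- cutree(tree, h = 1 - corr.min)
clusters <- split(names(groups), groups)

# Assumed reading of "pc.1": first principal component per cluster.
# Unclustered variables are passed through unchanged.
summaries <- lapply(clusters, function(vars) {
  if (length(vars) == 1) return(x[[vars]])
  prcomp(x[, vars], center = TRUE, scale. = TRUE)$x[, 1]
})
```

Cutting by height rather than by a fixed number of clusters lets corr.min determine how many clusters result. With "centroid" linkage the merge heights are not guaranteed to be monotone, so a height-based cut may behave differently there.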

Value

a list containing the following elements:

Examples

library(caret)
data("ChemicalManufacturingProcess")
x <- ChemicalManufacturingProcess[,-1]
y <- ChemicalManufacturingProcess[, 1]
res <- varCluster(x, y, corr.min = 0.75, clus.summary = 'pc.1', corr.method = 'spearman', corr.use = 'complete.obs')
res$nvar
res$nclust
res$clusters
str(res$summaries)
res$corrrplot
res$params

etlundquist/eRic documentation built on May 16, 2019, 9:07 a.m.