varCluster: Predictor Clustering and Cluster Summary Variable Extraction
In etlundquist/eRic: Eric's R functions developed while a summer analytics intern at Enova


Perform agglomerative (hierarchical) clustering on a set of numeric predictor variables on the basis of absolute correlation distance. Create a summary variable for each cluster to use for dimension reduction.

varCluster(x, y, corr.min = sqrt(2)/2, clus.summary = "max.pw",
  corr.method = "pearson", corr.use = "complete.obs",
  clus.method = "complete")

`x`	(data frame) data frame containing the set of numeric predictors to be clustered
`y`	(numeric) the intended response variable (to generate x-y correlations)
`corr.min`	(numeric) minimum correlation required to join variables into clusters
`clus.summary`	(character) method used to extract a summary variable for each cluster (see details)
`corr.method`	(character) which correlation coefficient to use: `c('pearson', 'spearman')`
`corr.use`	(character) how to deal with missing values: `c('complete.obs', 'pairwise.complete.obs')`
`clus.method`	(character) how to calculate linkages: `c('complete', 'single', 'centroid')`

Variables are agglomerated into clusters on the basis of absolute correlation distance (1 - abs(cor)) using a hierarchical clustering algorithm (hclust). The resulting dendrogram is cut based on the value of corr.min supplied by the user, such that all clusters contain variables with linkage correlations of at least corr.min. A single summary variable is created for each cluster according to the value of clus.summary supplied by the user, with the behavior detailed below. Any variable not linked into a cluster is returned unchanged as its own "summary" measure.

max.pw - original variable with highest average pairwise correlation amongst cluster variables
max.y - original variable with highest correlation with the response (y)
avg.x - average of standardized variables within the cluster for each observation
pc.1 - first principal component of cluster variables
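
The clustering and summary steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated data, the helper names (f1, f2, members, sub, pw), the cut height of 1 - corr.min, and the use of absolute correlations when ranking variables for max.pw and max.y are all assumptions introduced here.

```r
# Hypothetical simulated predictors: x1-x3 share one latent factor,
# x4-x5 share another (x5 with opposite sign), x6 is independent noise.
set.seed(1)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- data.frame(
  x1 = f1 + rnorm(n, sd = 0.3),
  x2 = f1 + rnorm(n, sd = 0.3),
  x3 = f1 + rnorm(n, sd = 0.3),
  x4 = f2 + rnorm(n, sd = 0.3),
  x5 = -f2 + rnorm(n, sd = 0.3),  # negative correlation still clusters via abs()
  x6 = rnorm(n)
)
y <- f1 + f2 + rnorm(n)
corr.min <- 0.75

# Absolute correlation distance: 1 - abs(cor)
cmat <- cor(x, method = "pearson", use = "complete.obs")
d    <- as.dist(1 - abs(cmat))

# Agglomerate, then cut the dendrogram (assumed cut height: 1 - corr.min)
hc     <- hclust(d, method = "complete")
groups <- cutree(hc, h = 1 - corr.min)

# Summaries for the cluster that contains x1
members <- names(groups)[groups == groups[["x1"]]]
sub     <- x[, members]

# max.pw: member with the highest average pairwise correlation
# (absolute values assumed; the diagonal is excluded from the average)
pw <- abs(cor(sub))
diag(pw) <- NA
max.pw <- names(which.max(rowMeans(pw, na.rm = TRUE)))

# max.y: member with the highest (absolute, assumed) correlation with y
max.y <- names(which.max(abs(cor(sub, y))[, 1]))

# avg.x: per-observation average of the standardized cluster variables
avg.x <- rowMeans(scale(sub))

# pc.1: first principal component of the cluster variables
pc.1 <- prcomp(sub, center = TRUE, scale. = TRUE)$x[, 1]
```

In this sketch x6 ends up alone in its group, which corresponds to the case where the original variable is passed through unchanged.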

a list containing the following elements:

nvar - number of variables in x
nclust - number of created clusters
vif - VIF values for each original variable in x
clusters - a list with one element per cluster, containing the number of variables in the cluster, the names of those variables, all pairwise correlations, and the predictor-response correlations
summaries - a data frame containing the summary variables (one for each cluster)
corrplot - a ggplot2 object visualizing the correlations amongst all predictor variables
params - a list of all parameter values passed in the original call
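
The page does not say how the vif element is computed. As a point of reference only, the textbook variance inflation factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors; equivalently, it is the j-th diagonal entry of the inverse predictor correlation matrix. The sketch below uses hypothetical simulated data and checks that equivalence; it is not taken from the package source.

```r
# Hypothetical predictors: b is strongly related to a, c is independent
set.seed(2)
n <- 100
a <- rnorm(n)
x <- data.frame(a = a, b = a + rnorm(n, sd = 0.5), c = rnorm(n))

# VIFs as the diagonal of the inverse correlation matrix
cmat <- cor(x)
vif  <- setNames(diag(solve(cmat)), colnames(cmat))

# Same quantity for predictor a via an auxiliary regression
r2_a       <- summary(lm(a ~ b + c, data = x))$r.squared
vif_a_regr <- 1 / (1 - r2_a)

# The two routes agree up to floating-point error
agree <- isTRUE(all.equal(unname(vif["a"]), vif_a_regr))
```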

library(caret)
library(AppliedPredictiveModeling)  # package that provides the ChemicalManufacturingProcess data
data("ChemicalManufacturingProcess")
x <- ChemicalManufacturingProcess[,-1]
y <- ChemicalManufacturingProcess[, 1]
res <- varCluster(x, y, corr.min = 0.75, clus.summary = 'pc.1', corr.method = 'spearman', corr.use = 'complete.obs')
res$nvar
res$nclust
res$clusters
str(res$summaries)
res$corrplot
res$params

etlundquist/eRic documentation built on May 16, 2019, 9:07 a.m.
