variable_cluster() performs non-hierarchical (disjoint) variable clustering on
the numeric variables in a specified data frame. This approach is similar to
that of the VARCLUS procedure in SAS. Like that procedure, the default behavior
is to remove records with any missing values (na.rm = TRUE). Unlike that
procedure, however, missing values can instead be replaced with their column
means (na.rm = FALSE). Note that imputation keeps every record, so it can
greatly increase the size of the data to be clustered and, with it, the run
time of the function. For example, on a data frame with about 230K observations
and 200 variables, the function took 10 minutes. Future efforts should be made
to optimize this or explore
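The two na.rm behaviors described above can be sketched in base R. prepare_numeric() is a hypothetical helper, not part of the package: it keeps only the numeric columns and, for na.rm = FALSE, assumes each missing value is replaced by the mean of the observed values in its column. The toy data show why imputation leaves more records to cluster than complete-case removal.

```r
# Hypothetical helper (not part of the package) mimicking the two na.rm options.
prepare_numeric <- function(data, na.rm = TRUE) {
  # Only numeric variables are clustered
  num <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
  if (na.rm) {
    # Remove every record with a missing value in any numeric column
    return(num[stats::complete.cases(num), , drop = FALSE])
  }
  # Otherwise replace each missing value with the mean of its column
  for (j in seq_along(num)) {
    miss <- is.na(num[[j]])
    num[[j]][miss] <- mean(num[[j]], na.rm = TRUE)
  }
  num
}

toy <- data.frame(a = c(1, NA, 3, 4), b = c(2, 4, NA, 8), g = letters[1:4])
nrow(prepare_numeric(toy, na.rm = TRUE))   # 2 complete records remain
nrow(prepare_numeric(toy, na.rm = FALSE))  # all 4 records are kept
```

With many variables, even a small fraction of missing values per column can eliminate most records under complete-case removal, which is why the imputed data can be so much larger.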
data frame; contains the variables to be clustered
integer value >= 2; number of desired clusters
logical value; should records with missing values be removed?
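For intuition about what disjoint clustering of variables into a fixed number of clusters involves, here is a minimal sketch. It is not the algorithm used by variable_cluster() or by SAS VARCLUS; it swaps in a simpler alternating reassignment scheme. Each cluster is summarized by the first principal component of its standardized variables, and each variable is moved to the cluster whose component it has the largest squared correlation with. The names sketch_variable_cluster(), cluster_component(), and max_iter are hypothetical.

```r
# Hypothetical sketch of disjoint variable clustering into k clusters.
# NOT the package's algorithm: an alternating reassignment scheme swapped in
# for illustration.
cluster_component <- function(x) {
  # Scores on the first principal component of the standardized variables
  stats::prcomp(x, center = TRUE, scale. = TRUE)$x[, 1]
}

sketch_variable_cluster <- function(x, k, max_iter = 50) {
  x <- as.matrix(x)
  stopifnot(k >= 2, k <= ncol(x))
  # Initial partition: cut an average-linkage tree built on 1 - |r|
  d <- stats::as.dist(1 - abs(stats::cor(x)))
  assign <- unname(stats::cutree(stats::hclust(d, method = "average"), k = k))
  for (iter in seq_len(max_iter)) {
    # One component score vector per cluster (n x k matrix)
    comps <- sapply(seq_len(k), function(cl) {
      cluster_component(x[, assign == cl, drop = FALSE])
    })
    # Squared correlation of each variable with each cluster component
    r2 <- stats::cor(x, comps)^2
    new_assign <- max.col(r2, ties.method = "first")
    # Stop at convergence, or if a reassignment would empty a cluster
    if (identical(new_assign, assign) || length(unique(new_assign)) < k) break
    assign <- new_assign
  }
  stats::setNames(assign, colnames(x))
}
```

Squared correlations are used because the sign of a principal component is arbitrary; a variable that is strongly negatively correlated with a component still belongs with it. Constant columns must be removed first, since they cannot be standardized.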
A data frame with class "mt_variable_cluster" containing the following columns:
VarName: variable name (character)
PrimaryCluster: assigned cluster (integer)
RsquaredToPrimaryCluster: R-squared to primary cluster (numeric)
NearestCluster: next nearest cluster (integer)
RsquaredToNearestCluster: R-squared to nearest cluster (numeric)
OneMinusRsquaredRatio: one minus R-squared ratio (numeric)
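In SAS VARCLUS, the 1 - R-squared ratio for a variable is (1 - R-squared to its own cluster) / (1 - R-squared to the next closest cluster), where each R-squared is measured against a cluster's first principal component; smaller values indicate a variable that fits its own cluster well and is distinct from the others. The sketch below assumes variable_cluster() uses the same definitions, which this page does not confirm. rsq_table() is a hypothetical helper that builds columns with the names listed above from a given assignment, assuming clusters are numbered 1 to k and none is empty.

```r
# Hypothetical helper: per-variable R-squared summary for a given partition.
# Assumes SAS VARCLUS-style definitions (first principal component per cluster,
# ratio = (1 - R^2 own) / (1 - R^2 nearest)); not confirmed for variable_cluster().
rsq_table <- function(x, assign) {
  x <- as.matrix(x)
  assign <- as.integer(assign)
  k <- max(assign)
  stopifnot(k >= 2)
  # First principal component scores of each cluster's standardized variables
  comps <- sapply(seq_len(k), function(cl) {
    stats::prcomp(x[, assign == cl, drop = FALSE], center = TRUE, scale. = TRUE)$x[, 1]
  })
  r2 <- stats::cor(x, comps)^2  # variables x clusters
  idx_own <- cbind(seq_len(ncol(x)), assign)
  own <- r2[idx_own]
  # Mask each variable's own cluster so the nearest other cluster is found
  other <- r2
  other[idx_own] <- -1
  nearest <- max.col(other, ties.method = "first")
  near <- other[cbind(seq_len(ncol(x)), nearest)]
  data.frame(
    VarName = colnames(x),
    PrimaryCluster = assign,
    RsquaredToPrimaryCluster = own,
    NearestCluster = nearest,
    RsquaredToNearestCluster = near,
    OneMinusRsquaredRatio = (1 - own) / (1 - near),
    stringsAsFactors = FALSE
  )
}
```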