View source: R/variable_cluster.R
variable_cluster() performs non-hierarchical (disjoint) variable clustering on the numeric variables in a specified data frame. This approach is similar to that of the varclus procedure in SAS. Like that procedure, the default behavior is to remove records with any missing values (na.rm = TRUE). Unlike that procedure, however, missing values can instead be replaced with their column means (na.rm = FALSE). Note that this can greatly increase the size of the data to be clustered, which will increase the run time of the function. In one timed example, a data frame with about 230K observations and 200 variables took 10 minutes. Future efforts should be made to optimize this or explore alternative approaches.
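The two missing-value strategies described above can be sketched in base R. This is only an illustration of the idea, not the function's internal code: the data frame `df` and the helper `impute_means()` are hypothetical names introduced for this example.

```r
# Hypothetical data frame with missing values (not part of the package).
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 40),
  c = c(5, 6, 7, 8)
)

# Comparable to na.rm = TRUE: drop every record with any missing value.
complete_rows <- df[stats::complete.cases(df), , drop = FALSE]

# Comparable to na.rm = FALSE: replace each NA with its column mean.
# impute_means() is an illustrative helper, not a package function.
impute_means <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.numeric(col)) {
      col[is.na(col)] <- mean(col, na.rm = TRUE)
    }
    col
  })
  data
}
imputed <- impute_means(df)

nrow(complete_rows)  # only records without missing values remain
nrow(imputed)        # every record is kept
```

Because imputation keeps every record, the data passed on to clustering can be much larger than in the row-removal case, which is consistent with the longer run times noted above.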
variable_cluster(x, n, na.rm = TRUE)
x | data frame; contains the variables to be clustered
n | integer value >= 2; number of desired clusters
na.rm | logical value; should records with missing values be removed?
A data frame with class "mt_variable_cluster" containing the following columns:
VarName: variable name (character)
PrimaryCluster: assigned cluster (integer)
RsquaredToPrimaryCluster: R-squared to primary cluster (numeric)
NearestCluster: next nearest cluster (integer)
RsquaredToNearestCluster: R-squared to nearest cluster (numeric)
OneMinusRsquaredRatio: one minus R-squared ratio (numeric)
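In the SAS varclus output, the 1 - R-squared ratio is (1 - R-squared with own cluster) divided by (1 - R-squared with next closest cluster), so smaller values indicate a variable that fits its own cluster well and is well separated from the others. Assuming OneMinusRsquaredRatio follows that same definition (the documentation above does not state the formula), the column could be reproduced as in this sketch. The helper name and the numeric values are hypothetical.

```r
# Assumed definition, borrowed from the SAS varclus convention:
# (1 - R-squared to own cluster) / (1 - R-squared to nearest cluster).
# one_minus_rsq_ratio() is an illustrative helper, not a package function.
one_minus_rsq_ratio <- function(rsq_own, rsq_nearest) {
  (1 - rsq_own) / (1 - rsq_nearest)
}

# Made-up values shaped like the returned data frame.
result <- data.frame(
  VarName = c("v1", "v2"),
  RsquaredToPrimaryCluster = c(0.9, 0.6),
  RsquaredToNearestCluster = c(0.2, 0.5)
)

result$OneMinusRsquaredRatio <- one_minus_rsq_ratio(
  result$RsquaredToPrimaryCluster,
  result$RsquaredToNearestCluster
)

result
```

Under this assumed definition, v1 has the lower ratio of the two, so it is the better-clustered variable.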