View source: R/variable_cluster.R
variable_cluster() performs non-hierarchical (disjoint)
variable clustering on the numeric variables in a specified data frame. The
approach is similar to that of the VARCLUS procedure in SAS. As in SAS,
the default behavior is to remove records with any missing
values (na.rm = TRUE). Unlike SAS, however, missing values
can instead be replaced with their column means (na.rm = FALSE). Note
that keeping these records can greatly increase the size of the data to be
clustered, which increases the run time of the function. In one timed example,
a data frame with about 230K observations and 200 variables took 10 minutes
to cluster. Future work should optimize this or explore
alternative approaches.
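The two missing-value strategies described above can be sketched in base R. This is only an illustration of the idea on made-up data, not the function's actual internals; the data frame `df` and the helper names are hypothetical.

```r
# Toy data frame with a few missing values (made-up data for illustration).
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 40),
  c = c(5, 6, 7, 8)
)

# Analogous to na.rm = TRUE: keep only records with no missing values.
complete_rows <- df[complete.cases(df), , drop = FALSE]

# Analogous to na.rm = FALSE: replace each NA with its column mean,
# computed from the observed values in that column.
imputed <- as.data.frame(lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))

nrow(complete_rows)  # number of records left after dropping incomplete ones
nrow(imputed)        # every record is retained after mean imputation
```

Because mean imputation keeps every record, the data passed to the clustering step can be much larger than under the default, which is why run time grows.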
variable_cluster(x, n, na.rm = TRUE)
x: data frame; contains the variables to be clustered
n: integer value >= 2; number of desired clusters
na.rm: logical value; should records with missing values be removed?
A data frame with class "mt_variable_cluster" containing the
following columns:
VarName: variable name (character)
PrimaryCluster: assigned cluster (integer)
RsquaredToPrimaryCluster: R-squared to primary cluster (numeric)
NearestCluster: next nearest cluster (integer)
RsquaredToNearestCluster: R-squared to nearest cluster (numeric)
OneMinusRsquaredRatio: one minus R-squared ratio (numeric)
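As a rough guide to reading the last column, the sketch below assumes OneMinusRsquaredRatio follows the definition used by the SAS VARCLUS procedure: one minus the R-squared to the variable's own cluster, divided by one minus the R-squared to the nearest cluster. The source does not spell out the formula, so treat this as an assumption; the helper name `one_minus_rsq_ratio` is hypothetical.

```r
# Assumed definition (as in SAS VARCLUS output):
#   (1 - R-squared to own cluster) / (1 - R-squared to nearest cluster)
one_minus_rsq_ratio <- function(rsq_primary, rsq_nearest) {
  (1 - rsq_primary) / (1 - rsq_nearest)
}

# A variable strongly tied to its own cluster and weakly tied to the
# nearest other cluster yields a small ratio (about 0.125 here).
ratio <- one_minus_rsq_ratio(rsq_primary = 0.9, rsq_nearest = 0.2)
ratio
```

Under this definition, smaller values indicate a variable that is well explained by its own cluster and well separated from the others.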