variable_cluster: Cluster a set of numeric variables


View source: R/variable_cluster.R

Description

variable_cluster() performs non-hierarchical (disjoint) variable clustering on the numeric variables in a specified data frame. The approach is similar to that of the VARCLUS procedure in SAS. As in that procedure, the default behavior is to remove records with any missing values (na.rm = TRUE). Unlike that procedure, however, missing values can instead be replaced with their column means (na.rm = FALSE). Note that keeping these records can greatly increase the size of the data to be clustered, which increases the run time of the function. In one timed example of this, a data frame with about 230K observations and 200 variables took 10 minutes. Future efforts should be made to optimize this or explore alternative approaches.
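The two missing-value strategies described above can be sketched in base R. This is only an illustration under stated assumptions: prepare_numeric() is a hypothetical helper that is not part of the package, and the actual implementation in R/variable_cluster.R may differ.

```r
# Hypothetical helper (not part of the package): illustrates the two
# missing-value strategies using base R only.
prepare_numeric <- function(x, na.rm = TRUE) {
  # Keep only the numeric columns of the data frame
  num <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
  if (na.rm) {
    # Drop every record (row) that has at least one missing value
    num <- num[stats::complete.cases(num), , drop = FALSE]
  } else {
    # Replace each missing value with the mean of its column
    num[] <- lapply(num, function(col) {
      col[is.na(col)] <- mean(col, na.rm = TRUE)
      col
    })
  }
  num
}

# Small made-up data frame with two numeric columns and one character column
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 40),
  g = c("w", "x", "y", "z")
)

nrow(prepare_numeric(df, na.rm = TRUE))   # only complete records remain
prepare_numeric(df, na.rm = FALSE)        # all records kept, NAs imputed
```

With na.rm = FALSE every record is retained, which is why the data to be clustered can be much larger than under the default.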

Usage

variable_cluster(x, n, na.rm = TRUE)

Arguments

x

data frame; contains the variables to be clustered

n

integer value >= 2; number of desired clusters

na.rm

logical value; should records with missing values be removed?

Value

A data frame with class "mt_variable_cluster" containing the following columns:

See Also

kmeansvar

Examples
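The page ships no example, so the following is a minimal usage sketch rather than the author's own. It assumes variable_cluster() is already available in the session (for instance after loading dnegrey/miscTools), uses the built-in mtcars data set because all of its columns are numeric, and picks n = 3 arbitrarily.

```r
# Built-in data set: 32 records, 11 numeric variables, no missing values
df <- mtcars

# Guarded call: runs only if variable_cluster() is available in the session
if (exists("variable_cluster", mode = "function")) {
  # Request 3 clusters; na.rm keeps its default of TRUE
  result <- variable_cluster(df, n = 3)
  print(class(result))
}
```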


dnegrey/miscTools documentation built on May 3, 2019, 2:57 p.m.