View source: R/misc_functions.R
Weighted k-means for mixed continuous and categorical variables. A user-specified weight conWeight controls the relative contribution of the variable types to the cluster solution.
conData | The continuous variables. Must be coercible to a data frame.
catData | The categorical variables, either as factors or dummy-coded variables. Must be coercible to a data frame.
conWeight | The continuous weight; must be between 0 and 1. The categorical weight is 1-conWeight.
nclust | The number of clusters.
... | Optional arguments passed to stats::kmeans.
A simple adaptation of stats::kmeans to mixed-type data. Continuous variables are multiplied by the input parameter conWeight, and categorical variables are multiplied by 1-conWeight. If factor variables are input to catData, they are transformed to 0-1 dummy-coded variables with the function dummyCodeFactorDf.
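The weighting scheme described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the names weighted_kmeans_sketch and dummy_code_sketch are hypothetical, the sketch assumes catData holds factors only (no pre-coded dummies, no missing values), and the toy data are generated here rather than with the package's own generator.

```r
# Hypothetical helper: one 0-1 indicator column per level of a factor.
dummy_code_sketch <- function(f, prefix) {
  f <- factor(f)
  m <- outer(as.character(f), levels(f), "==") + 0  # logical -> numeric
  colnames(m) <- paste0(prefix, "_", levels(f))
  m
}

# Hypothetical sketch of weighted k-means on mixed-type data.
weighted_kmeans_sketch <- function(conData, catData, conWeight, nclust, ...) {
  stopifnot(conWeight >= 0, conWeight <= 1)
  conMat <- as.matrix(as.data.frame(conData))
  catDf <- as.data.frame(catData, stringsAsFactors = TRUE)
  catMat <- do.call(cbind, lapply(names(catDf), function(v) {
    dummy_code_sketch(catDf[[v]], v)
  }))
  # Scale each variable type by its weight, then run ordinary k-means.
  combined <- cbind(conWeight * conMat, (1 - conWeight) * catMat)
  stats::kmeans(combined, centers = nclust, ...)
}

# Toy data: two well-separated groups in the continuous variables and an
# uninformative four-level categorical variable.
set.seed(1)
n <- 100
grp <- rep(1:2, each = n / 2)
conDf <- data.frame(x1 = rnorm(n, mean = 3 * grp), x2 = rnorm(n, mean = -3 * grp))
catDf <- data.frame(c1 = factor(sample(letters[1:4], n, replace = TRUE)))

fit <- weighted_kmeans_sketch(scale(conDf), catDf, conWeight = 0.9,
                              nclust = 2, nstart = 4)
table(fit$cluster, grp)
```

Extra arguments such as nstart are forwarded unchanged to stats::kmeans through the dots, mirroring the behaviour documented for the ... argument.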
A stats::kmeans results object, with additional slots conCenters and catCenters giving the actual centers adjusted for the weighting process.
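One plausible reading of "adjusted for the weighting process" is that the centers fitted on the weighted data are divided by the corresponding weight, which returns them to the scale of the input variables. The sketch below illustrates that reading only; it is an assumption, not a description of how the package computes conCenters and catCenters, and the object names are hypothetical. It also assumes conWeight lies strictly between 0 and 1, since a weight of exactly 0 or 1 would lead to division by zero.

```r
set.seed(1)
conWeight <- 0.9

# Toy inputs: two continuous columns and a two-level dummy-coded variable.
conMat <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
catMat <- cbind(a = rep(c(1, 0), 10), b = rep(c(0, 1), 10))

# Fit k-means on the weighted data.
fit <- stats::kmeans(cbind(conWeight * conMat, (1 - conWeight) * catMat),
                     centers = 2, nstart = 4)

# Undo the weighting so the centers are on the scale of the inputs.
conCentersSketch <- fit$centers[, colnames(conMat), drop = FALSE] / conWeight
catCentersSketch <- fit$centers[, colnames(catMat), drop = FALSE] / (1 - conWeight)

conCentersSketch
catCentersSketch
```

Under this reading, each row of conCentersSketch is the mean of the unweighted continuous variables within a cluster, and each row of catCentersSketch gives the within-cluster proportion of each category level.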
# Generate toy data set with poor quality categorical variables and good
# quality continuous variables.
set.seed(1)
dat <- genMixedData(200, nConVar=2, nCatVar=2, nCatLevels=4, nConWithErr=2,
nCatWithErr=2, popProportions=c(.5,.5), conErrLev=0.3, catErrLev=0.8)
catDf <- data.frame(apply(dat$catVars, 2, factor), stringsAsFactors = TRUE)
conDf <- data.frame(scale(dat$conVars), stringsAsFactors = TRUE)
# A clustering that emphasizes the continuous variables
r1 <- with(dat,wkmeans(conDf, catDf, 0.9, 2))
table(r1$cluster, dat$trueID)
# A clustering that emphasizes the categorical variables; note argument
# passed to the underlying stats::kmeans function
r2 <- with(dat,wkmeans(conDf, catDf, 0.1, 2, nstart=4))
table(r2$cluster, dat$trueID)
     1  2
  1 92  7
  2 10 91

     1  2
  1 80 64
  2 22 34