View source: R/discretize_jointly.R
Discretize multivariate continuous data using a grid that captures the joint distribution by preserving clusters in the original data.
discretize.jointly(data, k = c(2:10), cluster_label = NULL, min_level = 2)
data | a matrix containing two or more continuous variables. Columns are variables, rows are observations.
k | either the number or range of clusters to be found.
cluster_label | a vector of user-specified cluster labels for each observation.
min_level | the minimum number of levels along each dimension.
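The arguments above can be prepared and checked before calling the function. The following is a minimal sketch on synthetic data, not the package's own validation: it builds a two-column matrix, derives one cluster label per row with `stats::kmeans`, and verifies the shapes. The package name `GridOnClusters` is an assumption inferred from the citation on this page, so the call is guarded and runs only if that package is installed.

```r
set.seed(1)

# Synthetic data: two well-separated groups in two dimensions.
data <- rbind(
  cbind(rnorm(50, mean = 0), rnorm(50, mean = 0)),
  cbind(rnorm(50, mean = 6), rnorm(50, mean = 6))
)
colnames(data) <- c("x", "y")

# Pre-cluster with stats::kmeans; any method that yields one label
# per row could be used instead.
cluster_label <- kmeans(data, centers = 2, nstart = 10)$cluster

# Sanity checks matching the argument descriptions above.
stopifnot(is.matrix(data), is.numeric(data), ncol(data) >= 2)
stopifnot(length(cluster_label) == nrow(data))

# Assumed package name; the call is skipped if it is not installed.
if (requireNamespace("GridOnClusters", quietly = TRUE)) {
  res <- GridOnClusters::discretize.jointly(
    data, cluster_label = cluster_label, min_level = 2
  )
  # Cross-tabulate the discretized levels of the two variables.
  print(table(res$D[, 1], res$D[, 2]))
}
```

Passing `cluster_label` this way mirrors the PAM example in the examples section, with k-means standing in for the clustering step.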
The function implements algorithms described in Wang et al. (2020; citation key Jwang2020BCB).
A list that contains four items:
a matrix that contains the discretized version of the original data (the examples below access it as $D).
a list of vectors containing decision boundaries for each variable/dimension.
a vector containing cluster labels for each observation.
a similarity score between clusters from joint discretization.
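To illustrate how a list of per-dimension decision boundaries relates to a discretized matrix, here is a small base-R sketch. It is not the package's implementation: `apply_grid` is a hypothetical helper, the boundary values are made up, and numbering levels from 1 is an assumption of this sketch.

```r
# Hypothetical helper: map each column of `data` to integer levels,
# given one vector of cut points per column.
# Assumption: levels are numbered from 1 upward.
apply_grid <- function(data, grid) {
  stopifnot(is.matrix(data), length(grid) == ncol(data))
  out <- sapply(seq_len(ncol(data)), function(j) {
    # findInterval counts how many boundaries lie at or below each value.
    findInterval(data[, j], sort(grid[[j]])) + 1L
  })
  out <- matrix(out, nrow = nrow(data))
  colnames(out) <- colnames(data)
  out
}

data <- cbind(x = c(-1.5, -0.2, 0.3, 2.0),
              y = c(0.1, 0.9, 0.4, 0.7))

# Made-up boundaries: two cut points for x, one for y.
grid <- list(c(-0.5, 1.0), c(0.5))

lv <- apply_grid(data, grid)
print(lv)
```

A variable with m boundaries ends up with m + 1 levels, so here x has three levels and y has two.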
Jiandong Wang, Sajal Kumar and Mingzhou Song
See Ckmeans.1d.dp for discretizing univariate continuous data.
# load the package (assumed to be GridOnClusters, as named in the citation)
library(GridOnClusters)
# using a specified k
x = rnorm(100)
y = sin(x)
z = cos(x)
data = cbind(x, y, z)
discretized_data = discretize.jointly(data, k=5)$D
# using a range of k
x = rnorm(1000)
y = log1p(abs(x))
z = tan(x)
data = cbind(x, y, z)
discretized_data = discretize.jointly(data, k=c(3:10))$D
# using an alternate clustering method to k-means
library(cluster)
x = rnorm(1000)
y = log1p(abs(x))
z = sin(x)
data = cbind(x, y, z)
# pre-cluster the data using partition around medoids (PAM)
cluster_label = pam(x=data, diss = FALSE, metric = "euclidean", k = 5)$clustering
discretized_data = discretize.jointly(data, cluster_label = cluster_label)$D