View source: R/discretize_jointly.R
Discretize multivariate continuous data using a grid that captures the joint distribution by preserving clusters in the original data.
data |
a matrix containing two or more continuous variables. Columns are variables, rows are observations. |
k |
either an integer, a vector of integers, or |
min_level |
integer or vector, signifying the minimum number of levels
along each dimension. If a vector of size |
cluster_method |
the clustering method to be used. Ignored if cluster labels
are given.
"kmeans+silhouette" will use k-means to cluster |
grid_method |
the discretization method to be used. "Sort+split" sorts the clusters by cluster mean in each dimension and then splits consecutive pairs only if the sum of the per-cluster error rates is less than or equal to 50% in that dimension. The maximum number of lines is the number of clusters minus one. "MultiChannel.WUC" splits each dimension by the weighted within-cluster sum of squared distances, using "Ckmeans.1d.dp::MultiChannel.WUC" on the projection of the data onto that dimension. The channel of each point is defined by its multivariate cluster label. |
cluster_label |
a vector of user-specified cluster labels for each observation
in |
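The description of "kmeans+silhouette" is cut off in the text above. A common way to combine the two, assumed here and not confirmed as the package's implementation, is to run k-means for each candidate number of clusters and keep the one with the highest average silhouette width. The toy data and the names `base`, `candidate_k`, and `best_k` are illustrative only.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Toy data: three tight, well-separated groups in two dimensions.
base <- rep(c(0, 10, 20), each = 4)
offsets <- rep(c(0, 0.1, 0.2, 0.3), times = 3)
data <- cbind(x = base + offsets, y = base - offsets)

# Assumed selection rule: highest mean silhouette width over candidate k.
candidate_k <- 2:5
avg_width <- sapply(candidate_k, function(k) {
  km <- kmeans(data, centers = k, nstart = 20)
  sil <- silhouette(km$cluster, dist(data))
  mean(sil[, "sil_width"])
})
best_k <- candidate_k[which.max(avg_width)]
best_k
```

The resulting `km$cluster` vector for the chosen k plays the same role as a user-supplied `cluster_label`.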
The function implements algorithms described in Wang et al. (2020).
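The "Sort+split" idea from the `grid_method` argument can be sketched for a single dimension in base R. This is a simplified illustration, not the package's code: `sort_split_1d` is a hypothetical helper, the candidate line is placed at the midpoint between consecutive cluster means (an assumption), and the threshold of 50 in the argument text is read as 50 percent (also an assumption).

```r
# Hypothetical helper, not part of GridOnClusters.
sort_split_1d <- function(x, label, max_error = 0.5) {
  # Order clusters by their mean along this dimension.
  means <- tapply(x, label, mean)
  ord <- names(sort(means))
  cuts <- numeric(0)
  for (i in seq_len(length(ord) - 1)) {
    left <- x[label == ord[i]]
    right <- x[label == ord[i + 1]]
    # Candidate line: midpoint between the two cluster means (assumption).
    cut <- (mean(left) + mean(right)) / 2
    # Error rate of a cluster: fraction of its points on the wrong side.
    err <- mean(left > cut) + mean(right <= cut)
    # Keep the line only if the summed error rate is small enough.
    if (err <= max_error) cuts <- c(cuts, cut)
  }
  cuts
}

x <- c(0, 1, 2, 10, 11, 12, 11.5, 12.5, 10.5)
label <- rep(1:3, each = 3)
cuts <- sort_split_1d(x, label)
cuts  # 6: clusters 2 and 3 overlap too much to be split
```

With three clusters at most two lines are possible, matching the "number of clusters minus one" bound; here only one survives because clusters 2 and 3 overlap along this dimension.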
A list that contains four items:
|
a matrix that contains the discretized version of the original |
|
a list of vectors containing decision boundaries for each variable/dimension. |
|
a vector containing cluster labels for each observation in |
|
a similarity score between clusters from joint discretization
|
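To make the relationship between the decision boundaries and the discretized matrix concrete, the sketch below applies a list of per-variable boundaries to continuous data with base R's `findInterval`. The boundaries and data are made up for illustration, and numbering levels from 1 is an assumption rather than documented behaviour of the package.

```r
# Hypothetical boundaries for two variables; not output of discretize.jointly.
grid <- list(c(2.5, 7.5), c(0))
data <- cbind(x = c(1, 3, 8, 6), y = c(-1, 2, -3, 4))

# Level of a value = 1 + number of boundaries at or below it.
D <- sapply(seq_len(ncol(data)), function(j) {
  findInterval(data[, j], grid[[j]]) + 1L
})
colnames(D) <- colnames(data)
D
```

Each column of `D` then has one more level than the number of boundaries for that variable.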
Jiandong Wang, Sajal Kumar and Mingzhou Song
See Ckmeans.1d.dp for discretizing univariate continuous data.
library(GridOnClusters)
# using a specified k
x = rnorm(100)
y = sin(x)
z = cos(x)
data = cbind(x, y, z)
discretized_data = discretize.jointly(data, k=5)$D
# using a range of k
x = rnorm(100)
y = log1p(abs(x))
z = tan(x)
data = cbind(x, y, z)
discretized_data = discretize.jointly(data, k=c(3:10))$D
# using k = Inf
x = c()
y = c()
mns = seq(0,1200,100)
for(i in 1:12){
x = c(x,runif(n=20, min=mns[i], max=mns[i]+20))
y = c(y,runif(n=20, min=mns[i], max=mns[i]+20))
}
data = cbind(x, y)
discretized_data = discretize.jointly(data, k=Inf)$D
# using an alternate clustering method to k-means
library(cluster)
x = rnorm(100)
y = log1p(abs(x))
z = sin(x)
data = cbind(x, y, z)
# pre-cluster the data using partition around medoids (PAM)
cluster_label = pam(x=data, diss = FALSE, metric = "euclidean", k = 5)$clustering
discretized_data = discretize.jointly(data, cluster_label = cluster_label)$D