DR | R Documentation |
Downhill riding procedure for selecting optimal tuning parameters in clustering algorithms, using an (in)stability probe.
DR(X, method, minPts = 3, theta = 0.9, B = 500, lb = -30, ub = 10)
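X |
a numeric matrix whose columns are the objects to be clustered (for example, time series) and whose rows are the observations of each object. |
method |
the clustering method to be used, currently either “TRUST” (Ciampi et al. 2010) or “DBSCAN” (Ester et al. 1996). If the method is “DBSCAN”, set minPts, and DR selects the optimal ε. If the method is “TRUST”, set theta, and DR selects the optimal δ. |
minPts |
the minimum number of samples in an ε-neighborhood of a point for that point to be considered a core point. Used only with the DBSCAN method. The default value is 3. |
theta |
connectivity parameter, used only with the TRUST method. The default value is 0.9. |
B |
number of random splits in calculating the Average Cluster Deviation (ACD). The default value is 500. |
lb, ub |
endpoints for a range of search for the optimal parameter. |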
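To make the role of minPts concrete, the following minimal sketch flags DBSCAN core points for a given ε and minPts. It assumes Euclidean distance, treats the ε-neighborhood as all points at distance ≤ ε, and counts the point itself (the convention documented for the dbscan package). `core_points` is a hypothetical helper written for illustration; it is not part of funtimes or dbscan. It takes points as rows, like `dbscan()`, whereas `DR()` takes the objects as columns of X.

```r
# Hypothetical helper (not part of funtimes or dbscan): flag DBSCAN core points.
# Assumptions: Euclidean distance, neighborhood = points with distance <= eps,
# and the point itself is counted toward minPts.
# Points are rows of `data` (as in dbscan()); DR() instead takes objects as columns.
core_points <- function(data, eps, minPts = 3) {
  d <- as.matrix(dist(data))             # pairwise Euclidean distances
  neighbor_counts <- rowSums(d <= eps)   # includes the point itself (distance 0)
  unname(neighbor_counts >= minPts)      # TRUE for core points
}

toy <- rbind(c(0, 0), c(0, 0.5), c(0.5, 0), c(5, 5))
core_points(toy, eps = 1, minPts = 3)
# [1]  TRUE  TRUE  TRUE FALSE
```

The three points near the origin each have three points (themselves included) within distance 1, so they are core points; the isolated point at (5, 5) has only itself and is not.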
Parameters lb and ub are the endpoints of the search for the optimal parameter. The candidate values are computed as
$$P := 1.1^{x}, \qquad x \in \{lb,\ lb+0.5,\ lb+1.0,\ \ldots,\ ub\}.$$
With the defaults lb = -30 and ub = 10, this gives 81 candidates ranging from $1.1^{-30} \approx 0.057$ to $1.1^{10} \approx 2.59$.
The default range of search is wide enough in most cases; if DR issues a warning message, lb and ub can be extended further.
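The candidate grid defined by $P := 1.1^{x}$ can be reproduced in a few lines of R, which is handy for checking which values a given choice of lb and ub covers. This is a minimal sketch for illustration, not code taken from funtimes; `candidate_grid` is a hypothetical helper name.

```r
# Hypothetical helper: candidate values P = 1.1^x for x = lb, lb + 0.5, ..., ub.
candidate_grid <- function(lb = -30, ub = 10) {
  x <- seq(lb, ub, by = 0.5)  # exponents on a grid with step 0.5
  1.1^x                       # candidate eps (DBSCAN) or delta (TRUST) values
}

P <- candidate_grid()
length(P)  # 81 candidates with the defaults
range(P)   # roughly 0.0573 to 2.594
```

Calling `candidate_grid(lb = -40, ub = 20)` shows how widening the endpoints extends the grid toward smaller and larger values; each unit added to ub multiplies the largest candidate by 1.1.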
For more discussion of the properties of the considered clustering algorithms and of the DR procedure, see Huang et al. (2016) and Huang et al. (2018).
A list containing the following components:
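P_opt |
the value of the optimal parameter: the optimal ε if the method is “DBSCAN”, or the optimal δ if the method is “TRUST”. |
ACD_matrix |
a matrix of ACD values for the candidate values of the tuning parameter (ε for DBSCAN, δ for TRUST). |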
Xin Huang, Yulia R. Gel
See also: BICC, dbscan
## Not run:
## example 1
## use iris data to test DR procedure
data(iris)
require(funtimes) # provides DR()
require(dbscan) # provides dbscan()
require(clue) # calculate NMI to compare the clustering result with the ground truth
require(scatterplot3d)
Data <- scale(iris[,-5])
ground_truth_label <- iris[,5]
# perform DR procedure to select optimal eps for DBSCAN
# and save it in variable eps_opt
eps_opt <- DR(t(Data), method="DBSCAN", minPts = 5)$P_opt
# apply DBSCAN with the optimal eps on iris data
# and save the clustering result in variable res
res <- dbscan(Data, eps = eps_opt, minPts = 5)$cluster
# calculate NMI to compare the clustering result with the ground truth label
clue::cl_agreement(as.cl_partition(ground_truth_label),
as.cl_partition(as.numeric(res)), method = "NMI")
# visualize the clustering result and compare it with the ground truth result
# 3D visualization of clustering result using variables Sepal.Width, Sepal.Length,
# and Petal.Length
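# DBSCAN labels noise points as 0 (the background color), so shift labels by 1 to keep them visible
scatterplot3d(Data[,-4], color = res + 1)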
# 3D visualization of ground truth result using variables Sepal.Width, Sepal.Length,
# and Petal.Length
scatterplot3d(Data[,-4],color = as.numeric(ground_truth_label))
## example 2
## use synthetic time series data to test DR procedure
require(funtimes)
require(clue)
require(zoo)
# simulate 12 time series for 3 clusters, each cluster contains 4 time series
set.seed(114)
samp_Ind <- sample(12, replace = FALSE)
time_points <- 30
X <- matrix(0,nrow=time_points,ncol = 12)
cluster1 <- sapply(1:4,function(x) arima.sim(list(order = c(1, 0, 0), ar = c(0.2)),
n = time_points, mean = 0, sd = 1))
cluster2 <- sapply(1:4,function(x) arima.sim(list(order = c(2 ,0, 0), ar = c(0.1, -0.2)),
n = time_points, mean = 2, sd = 1))
cluster3 <- sapply(1:4,function(x) arima.sim(list(order = c(1, 0, 1), ar = c(0.3), ma = c(0.1)),
n = time_points, mean = 6, sd = 1))
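# sapply() already returns a time_points x 4 matrix (one column per series),
# so the columns are assigned directly without transposing
X[, samp_Ind[1:4]] <- round(cluster1, 4)
X[, samp_Ind[5:8]] <- round(cluster2, 4)
X[, samp_Ind[9:12]] <- round(cluster3, 4)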
# create ground truth label of the synthetic data
ground_truth_label = matrix(1, nrow = 12, ncol = 1)
for(k in 1:3){
ground_truth_label[samp_Ind[(4*k - 4 + 1):(4*k)]] = k
}
# perform DR procedure to select optimal delta for TRUST
# and save it in variable delta_opt
delta_opt <- DR(X, method = "TRUST")$P_opt
# apply TRUST with the optimal delta on the synthetic data
# and save the clustering result in variable res
res <- CSlideCluster(X, Delta = delta_opt, Theta = 0.9)
# calculate NMI to compare the clustering result with the ground truth label
clue::cl_agreement(as.cl_partition(as.numeric(ground_truth_label)),
as.cl_partition(as.numeric(res)), method = "NMI")
# visualize the clustering result and compare it with the ground truth result
# visualization of the clustering result obtained by TRUST
plot.zoo(X, type = "l", plot.type = "single", col = res, xlab = "Time index", ylab = "")
# visualization of the ground truth result
plot.zoo(X, type = "l", plot.type = "single", col = ground_truth_label,
xlab = "Time index", ylab = "")
## End(Not run)