# DR: Downhill Riding (DR) Procedure (funtimes: Functions for Time Series Analysis)

## Description

Downhill riding procedure for selecting optimal tuning parameters in clustering algorithms, using an (in)stability probe.

## Usage

    DR(X, method, minPts = 3, theta = 0.9, B = 500, lb = -30, ub = 10)

## Arguments

- `X`: an n × k matrix whose columns are the k objects to be clustered; each object contains n observations (the objects could be a set of time series).
- `method`: the clustering method to be used, currently either "TRUST" (Ciampi et al., 2010) or "DBSCAN" (Ester et al., 1996). If `method` is "DBSCAN", set `minPts`; the optimal ε is then selected using DR. If `method` is "TRUST", set `theta`; the optimal δ is then selected using DR.
- `minPts`: the minimum number of samples in an ε-neighborhood of a point for it to be considered a core point. Used only with the DBSCAN method. Default value is 3.
- `theta`: connectivity parameter θ ∈ (0, 1). Used only with the TRUST method. Default value is 0.9.
- `B`: number of random splits used in calculating the Average Cluster Deviation (ACD). Default value is 500.
- `lb`, `ub`: end points of the search range for the optimal parameter.
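Because `X` expects one object per column, data stored with one object per row has to be transposed before it is passed to `DR`. The following is a minimal sketch of that step using the built-in `iris` data; the variable names `Data` and `X` are illustrative only.

```r
# iris stores one flower (object) per row and one measurement per column
data(iris)

# Drop the species label and standardize the four numeric measurements
Data <- scale(iris[, -5])
dim(Data)  # rows are objects, columns are observations

# Transpose so that each column is one object, as the X argument expects
X <- t(Data)
dim(X)     # rows are observations (n), columns are objects (k)
```

Here n is the number of measurements per flower and k is the number of flowers.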

## Details

The parameters `lb` and `ub` are the end points of the search range for the optimal parameter. The candidate values are P = 1.1^x, where x ∈ {lb, lb + 0.5, lb + 1.0, ..., ub}. Although the default search range is sufficiently wide for most cases, `lb` and `ub` can be extended further if a warning message is given.
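The candidate grid described above can be reproduced in a few lines of base R. This is only a sketch of the formula as stated here, not the package's internal code; the names `x` and `P` are chosen for illustration.

```r
# Default search bounds, taken from the DR() usage line
lb <- -30
ub <- 10

# Exponents advance from lb to ub in steps of 0.5
x <- seq(lb, ub, by = 0.5)

# Candidate values of the tuning parameter
P <- 1.1^x

length(P)  # number of candidates in the grid
range(P)   # smallest and largest candidate
```

Lowering `lb` or raising `ub` only adds candidates at the corresponding end of the grid; the existing candidates stay the same.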

For more discussion of the properties of the considered clustering algorithms and the DR procedure, see Huang et al. (2016) and Huang et al. (2018).

## Value

A list containing the following components:

- `P_opt`: the value of the optimal parameter. If `method` is "DBSCAN", `P_opt` is the optimal ε. If `method` is "TRUST", `P_opt` is the optimal δ.
- `ACD_matrix`: a matrix containing the ACD for different values of the tuning parameter. If `method` is "DBSCAN", the tuning parameter is ε. If `method` is "TRUST", the tuning parameter is δ.

## Author(s)

Xin Huang, Yulia R. Gel

## References

\insertAllCited

## See Also

`BICC`, `dbscan`

## Examples

    ## Not run:
    ## example 1
    ## use iris data to test DR procedure

    data(iris)
    require(funtimes)
    require(dbscan)  # dbscan() is assumed to come from the dbscan package
    require(clue)    # calculate NMI to compare the clustering result with the ground truth
    require(scatterplot3d)

    Data <- scale(iris[, -5])
    ground_truth_label <- iris[, 5]

    # perform DR procedure to select optimal eps for DBSCAN
    # and save it in variable eps_opt
    eps_opt <- DR(t(Data), method = "DBSCAN", minPts = 5)$P_opt

    # apply DBSCAN with the optimal eps on iris data
    # and save the clustering result in variable res
    res <- dbscan(Data, eps = eps_opt, minPts = 5)$cluster

    # calculate NMI to compare the clustering result with the ground truth label
    clue::cl_agreement(as.cl_partition(ground_truth_label),
                       as.cl_partition(as.numeric(res)), method = "NMI")

    # visualize the clustering result and compare it with the ground truth result
    # 3D visualization of clustering result using variables Sepal.Length, Sepal.Width,
    # and Petal.Length
    scatterplot3d(Data[, -4], color = res)

    # 3D visualization of ground truth result using variables Sepal.Length, Sepal.Width,
    # and Petal.Length
    scatterplot3d(Data[, -4], color = as.numeric(ground_truth_label))


    ## example 2
    ## use synthetic time series data to test DR procedure

    require(funtimes)
    require(clue)
    require(zoo)

    # simulate 12 time series for 3 clusters, each cluster contains 4 time series
    set.seed(114)
    samp_Ind <- sample(12, replace = FALSE)
    time_points <- 30
    X <- matrix(0, nrow = time_points, ncol = 12)
    cluster1 <- sapply(1:4, function(x) arima.sim(list(order = c(1, 0, 0), ar = c(0.2)),
                                                  n = time_points, mean = 0, sd = 1))
    cluster2 <- sapply(1:4, function(x) arima.sim(list(order = c(2, 0, 0), ar = c(0.1, -0.2)),
                                                  n = time_points, mean = 2, sd = 1))
    cluster3 <- sapply(1:4, function(x) arima.sim(list(order = c(1, 0, 1), ar = c(0.3), ma = c(0.1)),
                                                  n = time_points, mean = 6, sd = 1))

    # each cluster is already a time_points x 4 matrix (one series per column),
    # so it is assigned to the columns of X without transposing
    X[, samp_Ind[1:4]] <- round(cluster1, 4)
    X[, samp_Ind[5:8]] <- round(cluster2, 4)
    X[, samp_Ind[9:12]] <- round(cluster3, 4)

    # create ground truth label of the synthetic data
    ground_truth_label <- matrix(1, nrow = 12, ncol = 1)
    for (k in 1:3) {
        ground_truth_label[samp_Ind[(4*k - 4 + 1):(4*k)]] <- k
    }

    # perform DR procedure to select optimal delta for TRUST
    # and save it in variable delta_opt
    delta_opt <- DR(X, method = "TRUST")$P_opt

    # apply TRUST with the optimal delta on the synthetic data
    # and save the clustering result in variable res
    res <- CSlideCluster(X, Delta = delta_opt, Theta = 0.9)

    # calculate NMI to compare the clustering result with the ground truth label
    clue::cl_agreement(as.cl_partition(as.numeric(ground_truth_label)),
                       as.cl_partition(as.numeric(res)), method = "NMI")

    # visualize the clustering result and compare it with the ground truth result
    # visualization of the clustering result obtained by TRUST
    plot.zoo(X, type = "l", plot.type = "single", col = res,
             xlab = "Time index", ylab = "")

    # visualization of the ground truth result
    plot.zoo(X, type = "l", plot.type = "single", col = ground_truth_label,
             xlab = "Time index", ylab = "")

    ## End(Not run)