R/SHINY-ssdtwclust.R
In dtwclust: Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance

Documented in ssdtwclust

#' A shiny app for semi-supervised DTW-based clustering
#'
#' Display a shiny user interface that implements the approach in Dau et al. (2016).
#'
#' @export
#' @importFrom shiny runApp
#' @importFrom shiny shinyApp
#' @importFrom shinyjs useShinyjs
#'
#' @param series Time series in the formats accepted by [compare_clusterings()].
#' @param ... More arguments for [shiny::runApp()].
#' @param complexity A function to calculate a constraint's complexity. See details in the Cluster
#'   section.
#'
#' @details
#'
#' The approach developed in Dau et al. (2016) argues that finding a good value of `window.size` for
#' the DTW distance is very important, and suggests how to find one by using user-provided feedback.
#' After clustering is done, a pair of series is presented at a time, and the user must annotate the
#' pair as:
#'
#' - Must link: the series should be in the same cluster.
#' - Cannot link: the series should *not* be in the same cluster.
#' - Skip: the choice is unclear.
#'
#' After each step, a good value of the window size is suggested by evaluating which clusterings
#' fulfill the constraint(s) so far, and how (see Dau et al. (2016) for more information), and
#' performing a majority vote using the window sizes inferred from each constraint. The (main)
#' procedure is thus interactive and can be abandoned at any point.
#'
#' @section Explore:
#'
#'   This part of the app simply shows some basic characteristics of the provided series and plots
#'   some of them. The field for integer IDs expects a valid R expression that specifies which
#'   of the `series` should be plotted. Multivariate series are plotted with each variable in a
#'   different facet.
#'
#' @section Cluster:
#'
#'   This part of the app implements the main procedure by leveraging [compare_clusterings()]. The
#'   interface is similar to [interactive_clustering()], so it's worth checking its documentation
#'   too. Since [compare_clusterings()] supports parallelization with [foreach::foreach()], you can
#'   register a parallel backend before opening the shiny app, but you should pre-load the workers
#'   with the necessary packages and/or functions. See [parallel::clusterEvalQ()] and
#'   [parallel::clusterExport()], as well as the examples below.
#'
#'   The range of window sizes is specified with a slider, and represents the size as a percentage
#'   of the shortest series' length. The `step` parameter indicates how far apart the sizes should
#'   be (the `by` argument of [base::seq()]). A 0-size window should only be used if all series
#'   have the same length. If the series have different lengths, small window sizes can be
#'   problematic when the length differences are very big; see the notes in [dtw_basic()].
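#'
#'   As a rough sketch of how slider values could translate into window sizes (the series lengths,
#'   slider values, and the rounding step are assumptions for illustration, not necessarily what
#'   the app does internally):
#'
#'   ```r
#'   # hypothetical inputs: lengths of three series and a slider set to 5%-20% with step 5
#'   series_lengths <- c(200L, 250L, 300L)
#'   window_pct <- seq(from = 5, to = 20, by = 5)
#'
#'   # percentages are relative to the shortest series; rounding is an assumption
#'   window_sizes <- as.integer(round(min(series_lengths) * window_pct / 100))
#'   window_sizes # 10 20 30 40
#'   ```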
#'
#'   A `window.size` should *not* be specified in the extra parameters, because it will be replaced
#'   with the values computed from the slider. Use of the [dba()] centroid is detected, and it will
#'   use the same window sizes.
#'
#'   For partitional clusterings with many repetitions, and hierarchical clusterings with many
#'   linkage methods, the resulting partitions are aggregated by calling [clue::cl_medoid()] with
#'   the specified aggregation `method`.
#'
#'   By default, the complexity of a constraint is calculated differently from what is suggested in
#'   Dau et al. (2016):
#'
#'   - Allocate a logical flag vector with length equal to the number of tested window sizes.
#'   - For each window size, set the corresponding flag to `TRUE` if the constraint given by the
#'   user is fulfilled.
#'   - Calculate complexity as: (number of sign changes in the vector) / (number of window sizes -
#'   1L) / (maximum number of *contiguous* `TRUE` flags).
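#'
#'   The steps above can be sketched roughly as follows. `complexity_sketch` is a hypothetical
#'   helper written for illustration, not the function used internally, and returning `Inf` for a
#'   constraint that is never fulfilled is an assumption based on the description of infinite
#'   complexity given further down:
#'
#'   ```r
#'   # hypothetical helper, not the function used internally by the app
#'   complexity_sketch <- function(flags) {
#'       stopifnot(is.logical(flags), length(flags) > 1L)
#'       # a "sign change" is any position where consecutive flags differ
#'       changes <- sum(diff(as.integer(flags)) != 0L)
#'       # longest run of contiguous TRUE flags (0 if the constraint is never fulfilled)
#'       runs <- rle(flags)
#'       longest_true <- max(0L, runs$lengths[runs$values])
#'       # assumption: a never-fulfilled constraint gets infinite complexity
#'       if (longest_true == 0L) return(Inf)
#'       changes / (length(flags) - 1L) / longest_true
#'   }
#'
#'   complexity_sketch(c(FALSE, TRUE, TRUE, FALSE, FALSE)) # 2 / 4 / 2 = 0.25
#'   ```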
#'
#'   You can provide your own function in the `complexity` parameter. It will receive the flag
#'   vector as its only input, and it must return a single number.
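#'
#'   For instance, a custom function might look like the sketch below. `flip_rate` is a
#'   hypothetical name, and the measure it computes (the fraction of adjacent window sizes where
#'   fulfillment flips) is only an illustration, not a recommendation:
#'
#'   ```r
#'   # hypothetical custom complexity: share of adjacent flag pairs that differ
#'   flip_rate <- function(flags) {
#'       if (length(flags) < 2L) return(0)
#'       mean(flags[-1L] != flags[-length(flags)])
#'   }
#'
#'   flip_rate(c(TRUE, FALSE, TRUE)) # both adjacent pairs differ, so the result is 1
#'
#'   # it would then be passed when launching the app:
#'   # ssdtwclust(series, complexity = flip_rate)
#'   ```
#'
#'   Note that this particular measure returns 0 for a constraint that is never fulfilled, so such
#'   a constraint would be ignored rather than flagged with an infinite complexity.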
#'
#'   The complexity threshold can be specified in the app. Any constraint whose complexity is higher
#'   than the threshold will not be considered for the majority vote. Constraints with a complexity
#'   of 0 are also ignored. An infinite complexity means that the constraint is never fulfilled by
#'   any clustering.
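#'
#'   A minimal sketch of the filtering and vote described above, assuming each constraint has
#'   already been assigned an inferred window size. The data frame, its column names, the
#'   threshold, and the tie-breaking rule are all hypothetical, and how a window size is inferred
#'   from a single constraint is not covered here:
#'
#'   ```r
#'   # hypothetical per-constraint results
#'   inferred <- data.frame(
#'       window_size = c(10L, 10L, 20L, 30L, 10L),
#'       complexity = c(0.1, 0.25, 0.8, Inf, 0)
#'   )
#'   threshold <- 0.5
#'
#'   # drop constraints above the threshold (including Inf) and those with complexity 0
#'   eligible <- inferred[inferred$complexity > 0 & inferred$complexity <= threshold, ]
#'
#'   # majority vote; which.max() keeps the first (smallest) size on ties, an arbitrary choice
#'   votes <- table(eligible$window_size)
#'   suggested <- as.integer(names(votes)[which.max(votes)])
#'   suggested # 10
#'   ```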
#'
#' @section Evaluate:
#'
#'   This section provides numerical results for reference. The latest results can be saved in the
#'   global environment, which includes clustering results, constraints so far, and the suggested
#'   window size. Since this includes everything returned by [compare_clusterings()], you could also
#'   use [repeat_clustering()] afterwards.
#'
#'   The constraint plots depict whether each constraint is fulfilled for the given window sizes,
#'   where 1 means it was fulfilled and 0 means it wasn't. An error about a zero-dimension viewport
#'   indicates the plot height is too small to fit the plots, so please increase the height.
#'
#' @note
#'
#' The optimization mentioned in section 3.4 of Dau et al. (2016) is also implemented here.
#'
#' Tracing is printed to the console.
#'
#' @author Alexis Sarda-Espinosa
#'
#' @references
#'
#' Dau, H. A., Begum, N., & Keogh, E. (2016). Semi-supervision dramatically improves time series
#' clustering under dynamic time warping. In Proceedings of the 25th ACM International on Conference
#' on Information and Knowledge Management (pp. 999-1008). ACM.
#' \url{https://sites.google.com/site/dtwclustering/}
#'
#' @seealso
#'
#' [interactive_clustering()], [compare_clusterings()]
#'
#' @examples
#'
#' \dontrun{
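#' library(doParallel)
#' # one worker per core; each worker loads dtwclust and runs single-threaded
#' workers <- makeCluster(detectCores())
#' clusterEvalQ(workers, {
#'     library(dtwclust)
#'     RcppParallel::setThreadOptions(1L)
#' })
#' registerDoParallel(workers)
#' ssdtwclust(reinterpolate(CharTraj[1L:20L], 150L))
#' # release the workers once the app has been closed
#' stopCluster(workers)
#' registerDoSEQ()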
#' }
#'
ssdtwclust <- function(series, ..., complexity = NULL) {
    series <- tslist(series)
    check_consistency(series, "vltslist")
    is_multivariate(series) # dimension consistency check
    if (!is.null(complexity) && !is.function(complexity))
        stop("A custom complexity should be a function.")

    file_path <- system.file("ssdtwclust", "app.R", package = "dtwclust")
    if (!nzchar(file_path)) stop("Shiny app not found")
    ui <- server <- NULL # avoid NOTE about undefined globals
    source(file_path, local = TRUE, chdir = TRUE)

    server_env <- environment(server) # will see all dtwclust functions
    server_env$.series_ <- series
    server_env$.explore_df_ <- explore__tidy_series(series)
    server_env$complexity <- complexity

    app <- shiny::shinyApp(ui, server)
    shiny::runApp(app, ...)
}

dtwclust documentation built on Sept. 11, 2024, 9:07 p.m.


Defines functions ssdtwclust
