BICC: BIC-Based Spatio-Temporal Clustering
In funtimes: Functions for Time Series Analysis

BICC	R Documentation

Description

Apply the unsupervised spatio-temporal clustering algorithm TRUST (Ciampi et al. 2010), with automatic selection of its tuning parameters Delta and Epsilon based on the Bayesian information criterion, BIC (Schaeffer et al. 2016).

Usage

BICC(X, Alpha = NULL, Beta = NULL, Theta = 0.8, p, w, s)

Arguments

`X`	a matrix of time series observed within a slide (time series in columns).
`Alpha`	lower limit of the time-series domain, passed to `CSlideCluster`.
`Beta`	upper limit of the time-series domain, passed to `CSlideCluster`.
`Theta`	connectivity parameter passed to `CSlideCluster`.
`p`	number of layers (time-series observations) in each slide.
`w`	number of slides in each window.
`s`	step to shift a window, calculated in the number of slides. The recommended values are 1 (overlapping windows) or equal to `w` (non-overlapping windows).

Details

This is the upper-level function for time series clustering. It uses the functions CWindowCluster and CSlideCluster to cluster time series based on closeness and homogeneity measures. Clustering is performed multiple times over a range of equidistant values of the parameters Delta and Epsilon; the optimal Delta and Epsilon are then returned along with the corresponding clustering results (see Schaeffer et al. 2016 for more details).

The total length of the time series (number of layers, i.e., nrow(X)) should be divisible by p.
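
To illustrate how `p`, `w`, and `s` relate to the dimensions of `X`, the following base-R sketch checks the divisibility requirement and counts slides and windows. The helper `count_slides_windows` is hypothetical and not part of funtimes. The window count assumes that windows start every `s` slides and that only complete windows are used, which may differ from the package's internal handling.

```r
# Hypothetical helper (not part of funtimes): count the slides and windows
# implied by the number of layers, p, w, and s.
count_slides_windows <- function(n_layers, p, w, s) {
    # The total number of layers must split evenly into slides of p layers
    stopifnot(n_layers %% p == 0)
    n_slides <- n_layers %/% p
    stopifnot(w <= n_slides, s >= 1)
    # Assumption: windows start every s slides; only complete windows count
    n_windows <- (n_slides - w) %/% s + 1
    list(slides = n_slides, windows = n_windows)
}

# 36 monthly observations, year-long slides, a single 3-year window
res_one <- count_slides_windows(36, p = 12, w = 3, s = 3)

# 10 years of monthly data, 3-year windows shifted by 1 slide (overlapping)
res_overlap <- count_slides_windows(120, p = 12, w = 3, s = 1)
```

Under these assumptions, setting `s` equal to `w` gives non-overlapping windows, while `s = 1` gives windows that share `w - 1` slides with their neighbors.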

Value

A list with the following elements:

`delta.opt`	optimal value for the clustering parameter `Delta`.
`epsilon.opt`	optimal value for the clustering parameter `Epsilon`.
`clusters`	vector of length `ncol(X)` with cluster labels.
`IC`	values of the information criterion (BIC) for each considered combination of `Delta` (rows) and `Epsilon` (columns).
`delta.all`	vector of considered values for `Delta`.
`epsilon.all`	vector of considered values for `Epsilon`.
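
The elements of this list can be cross-checked against each other. The sketch below uses a mock list with made-up numbers in place of real `BICC` output, and it assumes that the optimal pair is the one with the smallest BIC value; verify this against the package source before relying on it.

```r
# Mock object mimicking the structure of the BICC output
# (all numbers are made up for illustration)
out <- list(
    delta.all = c(0.1, 0.2, 0.3),
    epsilon.all = c(0.25, 0.50),
    IC = matrix(c(12.4, 10.1, 11.7,
                  13.0,  9.8, 12.2),
                nrow = 3, ncol = 2)  # rows: Delta, columns: Epsilon
)
dimnames(out$IC) <- list(Delta = out$delta.all, Epsilon = out$epsilon.all)

# Locate the grid cell with the smallest criterion value
# (assumption: a smaller BIC is preferred)
best <- which(out$IC == min(out$IC), arr.ind = TRUE)[1, ]
delta_best <- out$delta.all[best[1]]
epsilon_best <- out$epsilon.all[best[2]]
```

With real output, `delta_best` and `epsilon_best` obtained this way would be expected to match `delta.opt` and `epsilon.opt` if the minimum-BIC assumption holds.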

Author(s)

Ethan Schaeffer, Vyacheslav Lyubchich

References

\insertAllCited

Examples

# Fix seed for reproducible simulations:
set.seed(1)

##### Example 1
# Similar to Schaeffer et al. (2016), simulate 3 years of monthly data
# for 10 locations and apply clustering:
# 1.1 Simulation
n_months <- 36 # total months (avoids masking T, the shorthand for TRUE)
N <- 10 # locations
phi <- c(0.5) # parameter of autoregression
burn <- 300 # burn-in period for simulations
# Simulate N AR series and drop the burn-in observations
X <- sapply(1:N, function(x)
    arima.sim(n = n_months + burn,
              list(order = c(length(phi), 0, 0), ar = phi)))[(burn + 1):(n_months + burn), ]
colnames(X) <- paste0("TS", seq_len(ncol(X)))

# 1.2 Clustering
# Assume that information arrives in year-long slides or data chunks
p <- 12 #number of time layers (months) in a slide
# Let the upper level of clustering (window) be the whole period of 3 years, so
w <- 3 #number of slides in a window
# Step to shift a window; it matters little here because there is only one window of data
s <- w
tmp <- BICC(X, p = p, w = w, s = s)

# 1.3 Evaluate clustering
# In these simulations, it is known that all time series belong to one class,
# since they were all simulated the same way:
classes <- rep(1, 10)
# Use the information on the classes to calculate clustering purity:
purity(classes, tmp$clusters[1,])

##### Example 2
# 2.1 Modify time series and update classes accordingly:
# Add a mean shift to half of the time series:
X2 <- X
X2[, 1:(N/2)] <- X2[, 1:(N/2)] + 3
classes2 <- rep(c(1, 2), each = N/2)

# 2.2 Re-apply clustering procedure and evaluate clustering purity:
tmp2 <- BICC(X2, p = p, w = w, s = s)
tmp2$clusters
purity(classes2, tmp2$clusters[1,])

funtimes documentation built on March 31, 2023, 7:35 p.m.