BICC: BIC-Based Spatio-Temporal Clustering

View source: R/BICC.R

BICC R Documentation

BIC-Based Spatio-Temporal Clustering

Description

Apply the unsupervised spatio-temporal clustering algorithm TRUST (Ciampi et al. 2010), with automatic selection of its tuning parameters Delta and Epsilon based on the Bayesian information criterion, BIC (Schaeffer et al. 2016).

Usage

BICC(X, Alpha = NULL, Beta = NULL, Theta = 0.8, p, w, s)

Arguments

X

a matrix of time series observed within a slide (time series in columns).

Alpha

lower limit of the time-series domain, passed to CSlideCluster.

Beta

upper limit of the time-series domain, passed to CSlideCluster.

Theta

connectivity parameter passed to CSlideCluster.

p

number of layers (time-series observations) in each slide.

w

number of slides in each window.

s

step to shift a window, calculated in the number of slides. The recommended values are 1 (overlapping windows) or equal to w (non-overlapping windows).

Details

This is the upper-level function for time series clustering. It uses the functions CWindowCluster and CSlideCluster to cluster time series based on closeness and homogeneity measures. Clustering is performed multiple times over a range of equidistant values of the parameters Delta and Epsilon; the optimal Delta and Epsilon are then returned along with the corresponding clustering results (see Schaeffer et al. 2016, for more details).
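The selection step described above can be illustrated with a minimal base-R sketch. It is not the funtimes implementation: `toy_ic` is a hypothetical stand-in for "cluster with the given Delta and Epsilon, then compute BIC", the grid limits are arbitrary, and the sketch assumes that a smaller criterion value is better, as is conventional for BIC.

```r
# Hypothetical stand-in for "run the clustering, then compute BIC".
# NOT the funtimes code; it only gives the grid something to minimize.
toy_ic <- function(delta, epsilon) (delta - 0.3)^2 + (epsilon - 0.6)^2

# Equidistant candidate values (arbitrary limits chosen for illustration)
delta.all   <- seq(0.1, 0.9, by = 0.1)
epsilon.all <- seq(0.1, 0.9, by = 0.1)

# Criterion for every combination: Delta in rows, Epsilon in columns
IC <- outer(delta.all, epsilon.all, toy_ic)

# Position of the smallest criterion value (first one, in case of ties)
best <- which(IC == min(IC), arr.ind = TRUE)[1, ]
delta.opt   <- delta.all[best["row"]]
epsilon.opt <- epsilon.all[best["col"]]
```

The layout of `IC`, `delta.all`, and `epsilon.all` mirrors the elements listed under Value, so the same `which(..., arr.ind = TRUE)` lookup can be used to locate the selected pair in the matrix returned by BICC.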

The total length of the time series (number of layers, i.e., nrow(X)) should be divisible by p.
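To make the roles of p, w, and s concrete, the following base-R sketch checks the divisibility requirement and lays out slides and windows for a small matrix. The window placement rule used here (windows start at slides 1, 1 + s, ... for as long as a full window of w slides fits) is an assumption made for illustration; it is not taken from the funtimes source.

```r
X <- matrix(rnorm(36 * 10), nrow = 36)  # 36 layers (rows), 10 time series
p <- 12  # layers per slide
w <- 2   # slides per window
s <- 1   # shift between windows, in slides

# Documented requirement: nrow(X) must be divisible by p
stopifnot(nrow(X) %% p == 0)

n_slides <- nrow(X) / p
slide_id <- rep(seq_len(n_slides), each = p)  # slide index of each row of X

# Assumed placement: start a window every s slides while w slides still fit
window_starts <- seq(1, n_slides - w + 1, by = s)
windows <- lapply(window_starts, function(a) a:(a + w - 1))
```

With these values there are three slides and two overlapping windows (slides 1 to 2 and slides 2 to 3). Setting s equal to w instead leaves a single non-overlapping window.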

Value

A list with the following elements:

delta.opt

optimal value for the clustering parameter Delta.

epsilon.opt

optimal value for the clustering parameter Epsilon.

clusters

vector of length ncol(X) with cluster labels.

IC

values of the information criterion (BIC) for each considered combination of Delta (rows) and Epsilon (columns).

delta.all

vector of considered values for Delta.

epsilon.all

vector of considered values for Epsilon.

Author(s)

Ethan Schaeffer, Vyacheslav Lyubchich

References

\insertAllCited

See Also

CSlideCluster, CWindowCluster, purity

Examples

# Fix seed for reproducible simulations:
set.seed(1)

##### Example 1
# Similar to Schaeffer et al. (2016), simulate 3 years of monthly data 
#for 10 locations and apply clustering:
# 1.1 Simulation
T <- 36 #total months
N <- 10 #locations
phi <- c(0.5) #parameter of autoregression
burn <- 300 #burn-in period for simulations
X <- sapply(1:N, function(x) 
    arima.sim(n = T + burn, 
              list(order = c(length(phi), 0, 0), ar = phi)))[(burn + 1):(T + burn),]
colnames(X) <- paste0("TS", seq_len(ncol(X)))

# 1.2 Clustering
# Assume that information arrives in year-long slides or data chunks
p <- 12 #number of time layers (months) in a slide
# Let the upper level of clustering (window) be the whole period of 3 years, so
w <- 3 #number of slides in a window
# Step to shift a window; it matters little here because there is only one window of data
s <- w
tmp <- BICC(X, p = p, w = w, s = s)

# 1.3 Evaluate clustering
# In these simulations, it is known that all time series belong to one class,
#since they were all simulated the same way:
classes <- rep(1, N)
# Use the information on the classes to calculate clustering purity:
purity(classes, tmp$clusters[1,])

##### Example 2
# 2.1 Modify time series and update classes accordingly:
# Add a mean shift to half of the time series:
X2 <- X
X2[, 1:(N/2)] <- X2[, 1:(N/2)] + 3
classes2 <- rep(c(1, 2), each = N/2)

# 2.2 Re-apply clustering procedure and evaluate clustering purity:
tmp2 <- BICC(X2, p = p, w = w, s = s)
tmp2$clusters
purity(classes2, tmp2$clusters[1,])
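For readers unfamiliar with the purity measure used in the evaluation steps, here is a minimal base-R sketch of the common textbook definition: each cluster is credited with its most frequent true class, and the credited counts are divided by the number of objects. The helper `purity_sketch` is hypothetical; the value returned by the package's purity function may be structured differently.

```r
# Hypothetical helper, not part of funtimes
purity_sketch <- function(classes, clusters) {
  tab <- table(clusters, classes)  # rows: clusters, columns: true classes
  # Largest class count within each cluster, summed and normalized
  sum(apply(tab, 1, max)) / length(classes)
}

classes2 <- rep(c(1, 2), each = 5)

purity_sketch(classes2, classes2)     # perfect agreement: 1
purity_sketch(classes2, rep(1, 10))   # everything in one cluster: 0.5
```

Purity alone rewards over-splitting (putting every series in its own cluster also gives 1), so it is best read together with the number of clusters found.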


funtimes documentation built on March 31, 2023, 7:35 p.m.