Clustering time series using the funtimes package (in funtimes: Functions for Time Series Analysis)

```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#",
                      fig.width = 7,
                      fig.height = 6)
```

Introduction

In this tutorial, two unsupervised clustering algorithms from the `funtimes` package are used to identify clusters of Australia's sea level time series.

First, load the essential libraries for the analysis:

```r
library(funtimes)
library(ggplot2)
library(gridExtra)
library(reshape2)
```

Data

Daily sea level data are available from 1993 to 2012 for 17 locations. The data come from @Maharaj_etal_2019 and can be downloaded from http://www.tsclustering.homepage.pt/index.php?p=3. Download the `Application7_3.zip` archive, in which the file `Aus Sea Levels 17.xlsx` contains the sea level records. The daily records are then averaged within each year to obtain annual time series.

```r
d_org <- readxl::read_xlsx("Aus_Sea_Levels_17.xlsx", skip = 1, n_max = 7300)
# yearly average
d <- data.frame(aggregate(d_org[, 4:20], list(d_org$Year),
                          FUN = 'mean', na.rm = TRUE)[, -1],
                row.names = unique(d_org$Year))
```

```r
# saveRDS(d, "Aus_Sea_Levels_17.rds")
```
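To make the aggregation step concrete, here is a minimal sketch on made-up data. The column names `Year`, `A`, and `B` and all values are hypothetical and are not taken from the spreadsheet. It uses the same `aggregate()` pattern to turn daily rows into one row per year:

```r
# Toy daily data: two hypothetical locations observed over two years
toy <- data.frame(
  Year = rep(c(2001, 2002), each = 3),
  A = c(1, 2, 3, 4, 5, 6),
  B = c(10, NA, 30, 40, 50, 60)
)

# Average each location within each year, ignoring missing values
toy_annual <- aggregate(toy[, c("A", "B")], list(Year = toy$Year),
                        FUN = mean, na.rm = TRUE)

# Move the years into the row names, leaving one column per location
toy_annual <- data.frame(toy_annual[, -1],
                         row.names = as.character(toy_annual$Year))
toy_annual
```

Because `na.rm = TRUE` is passed on to `mean()`, a year with some missing daily values still gets an average from the remaining observations.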

Plotting time series

Below is the plot of annual time series of sea level for 17 locations:

```r
dlong <- reshape2::melt(t(d))
names(dlong)[1:2] <- c("Location", "Year")
ggplot(dlong) +
  geom_line(aes(x = Year, y = value, color = Location), size = 1) +
  ylab('Sea level (m)') +
  theme_bw()
```

This plot shows how the sea levels vary across the locations. Not all of the time series have followed a common trend since 1993. Grouping the locations that share a common trend could help the Australian government assess the impact of sea level rise on the clustered locations and implement climate adaptation strategies for them.

Clustering time series based on trend synchronism

The first function from the package to test is `sync_cluster`, which groups the time series that share a common linear trend. The `Window` argument sets the length of the local window used by the underlying test. If the time series do not contain enough years for the window to be selected automatically, this argument must be set manually.

```r
set.seed(123)
Clus_sync <- sync_cluster(d ~ t, Window = 3, B = 100)
Clus_sync
```

In total, `r sum(Clus_sync$cluster != 0)` locations are clustered with a common linear trend, while the remaining `r sum(Clus_sync$cluster == 0)` are not tied to any other location and form the so-called noise cluster.

Below is the plot of the clustered time series of sea level, where `Cluster 0` indicates the noise cluster without any common linear trend, while `Cluster 1` shows the time series of locations with a common linear trend:

```r
# Build one plot per cluster label and store it as py0, py1, ...
for (i in 0:max(Clus_sync$cluster)) {
  assign(paste('py', i, sep = ''),
         ggplot(melt(t(d[, Clus_sync$cluster == i]))) +
           geom_line(aes(x = Var2, y = value, color = Var1), size = 1) +
           ylab('Sea level (m)') + xlab('Year') +
           theme_bw() + ggtitle(paste('Cluster', i)) +
           theme(axis.text = element_text(size = 13),
                 axis.title.x = element_text(size = 15),
                 axis.title.y = element_text(size = 15),
                 legend.text = element_text(size = 10),
                 legend.title = element_blank(),
                 legend.key.size = unit(0.3, "cm")))
}
grid.arrange(py0, py1)
```
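Before plotting, it can help to inspect the cluster membership directly. The sketch below uses a hypothetical label vector `toy_cluster` and made-up location names in place of the real results. Label 0 plays the role of the noise cluster:

```r
# Hypothetical cluster labels for five made-up locations (0 = noise cluster)
toy_locations <- c("LocA", "LocB", "LocC", "LocD", "LocE")
toy_cluster <- c(1, 0, 1, 1, 0)

# Number of locations per cluster label
cluster_sizes <- table(toy_cluster)
cluster_sizes

# Names of the locations in each cluster, as a list keyed by label
members <- split(toy_locations, toy_cluster)
members
```

With the real results, something like `split(colnames(d), Clus_sync$cluster)` should give the same kind of list, assuming the labels follow the column order of `d` (as the column subsetting in the plotting loop also assumes).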

Clustering time series using a spatiotemporal approach

The `BICC` function applies an unsupervised spatiotemporal clustering algorithm, TRUST, from @Ciampi_etal_2010. The algorithm has a few tuning parameters, and the `BICC` function automatically selects two of them, `Delta` and `Epsilon` (to set all the parameters manually, use the lower-level functions `CSlideCluster` and `CWindowCluster`). First, the time series are clustered within small slides; the length of the slides is defined by the parameter `p` (i.e., the number of time-series observations in each slide). Then, slides are aggregated into windows (each window contains `w` consecutive slides), and the slide-level cluster assignments are used to cluster the time series at the window level. When defining the windows, the user can also set the step `s`, which is the number of slides by which the window is shifted (if `s = w`, the windows do not overlap).

```r
Clus_BICC <- BICC(as.matrix(d), p = 5, w = 4, s = 4)
Clus_BICC
```
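To see how `p`, `w`, and `s` carve up the time axis, the following sketch enumerates slides and windows by index. It only illustrates the description above and is not the package's internal code. It assumes 20 annual observations (1993 to 2012) and non-overlapping slides of `p` consecutive observations, and `make_windows()` is a hypothetical helper written for this example.

```r
# Hypothetical helper: list the slide indices that make up each window
make_windows <- function(n_obs, p, w, s) {
  n_slides <- n_obs %/% p                     # complete slides of length p
  if (n_slides < w) return(list())            # not enough slides for a window
  starts <- seq(1, n_slides - w + 1, by = s)  # first slide of each window
  lapply(starts, function(k) k:(k + w - 1))
}

n_obs <- 20  # assumed number of annual observations
p <- 5
w <- 4
s <- 4

# Observation indices that fall into each slide
n_slides <- n_obs %/% p
slides <- split(seq_len(n_slides * p), rep(seq_len(n_slides), each = p))
slides

# Slide indices in each window for the settings used above
windows <- make_windows(n_obs, p, w, s)
windows

# Smaller windows for comparison: s = w (no overlap) versus s = 1 (overlap)
make_windows(n_obs, p, w = 2, s = 2)
make_windows(n_obs, p, w = 2, s = 1)
```

Under these assumptions, `p = 5` yields four slides of five years each, and `w = 4` with `s = 4` combines them into a single window.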

The algorithm detected only one cluster.

Citation

This vignette belongs to the R package `funtimes`. If you wish to cite this page, please cite the package:

```r
citation("funtimes")
```

References


funtimes documentation built on March 31, 2023, 7:35 p.m.