knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
adept
package implements ADaptive Empirical Pattern Transformation (ADEPT) ([1]) method for pattern segmentation from a time-series x
. ADEPT was designed for optimal use in performing fast, accurate walking strides segmentation from high-density data collected from a wearable accelerometer worn during continuous walking activity.
This vignette intends to introduce a reader to the ADEPT method and demonstrate the usage of the segmentPattern
function which implements ADEPT method. Here, we focus on illustrating segmentPattern
functionality via simulated data examples; see the Walking strides segmentation with adept
" vignette ([2]) for an example of walking stride segmentation from subsecond accelerometry data with adept
package.
ADEPT identifies pattern occurrences from a time-series x
via maximizing similarity (correlation, covariance etc.) between time-series x
windows and a pattern template(s). It accounts for a possible presence of variation in both (1) pattern occurrence duration time and (2) shape over time:
Consider an example in which a pattern occurrence is changing, possibly multiple times, its duration time within a time-series x
. To address such scenario, ADEPT considers various scales of a pattern template(s) to allow for obtaining the higher similarity between a time-series x
and a template.
Consider an example in which an event we aim to segment takes more than one distinct pattern shape within time-series x
. To address such scenario, multiple pattern templates are allowed simultaneously in the ADEPT method.
Here, a pattern template is thought of as a numeric vector that represents the pattern of interest. In practice, a pattern template may be derived based on some pre-segmented small part of the data.
In this vignette, for the purpose of the presentation, a pattern template(s) will be a simulated data vector(s).
adept
packageIf not installed yet, install adept
package from GitHub (devtools
package needs to be installed to do it).
# install.packages("devtools") ## for installing packages from GitHub repository devtools::install_github("martakarass/adept")
Load adept
.
library(adept) library(magrittr) library(ggplot2)
adept
packageThe examples below are organized into suites.
segmentPattern
functionality, including effect of the function's parameter usage, for example: Example 1(b): simple segmentation, modify pattern.dur.seq
argument. Simulate data:
x
,101
).## Generate signal and pattern template x0 <- cos(seq(0, 2 * pi * 10, length.out = 1001)) x <- x0 template <- x0[1:101] ## Plot time-series `x` data.frame(x = 1:length(x), y = x) %>% ggplot() + geom_line(aes(x = x, y = y)) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Value", title = "Time-series x")
Plot pattern template. We shall use template
object in the segmentation algorithm.
data.frame(x = seq(0, 1, length.out = 101), y = template) %>% ggplot() + geom_line(aes(x = x, y = y), color = "red") + theme_bw(base_size = 9) + labs(x = "Template phase", y = "Value", title = "Pattern template")
Use segmentPattern
function to identify pattern occurrences within a time-series x
.
pattern.dur.seq <- c(90, 100, 110) out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out
Each row of the output data frame (here: out
object above) summarizes information about one identified pattern occurrence:
tau_i
- pattern occurrence start, expressed as an index of a time-series x
,T_i
- pattern occurrence duration, expressed in a time-series x
vector length,sim_i
- value of similarity statistic (here: correlation) between a pattern template and corresponding window of a time-series x
(see Details in segmentPattern
documentation),template_i
- index of a pattern template corresponding to an identified pattern occurrence, or NA
if compute.template.idx = FALSE
is used (see Details in segmentPattern
documentation); here: we provided only one distinct pattern template, hence all template_i
's are equal 1. pattern.dur.seq
argument to modify a grid of pattern durationSimilarly, we aim to 3identify pattern occurrences within a time-series x
, but this time we include the true pattern duration time 101
in a grid of considered pattern durations (pattern.dur.seq
).
We shall see that sim_i
- a value of similarity statistic (here: correlation) - is now equal to 1
for each identified pattern occurrence. In other words, a perfect match between a time-series x
and a template was obtained. It is possible because:
x
has no noise,template
),101
in a grid of considered pattern durations (pattern.dur.seq
). pattern.dur.seq <- 90:110 ## assume dense pattern duration grid ranging 90-110 seconds out <- segmentPattern(x = x, x.fs = 1, ## assume data frequency 1 observation per second template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out
x.fs
argument to modify assumed frequency at which a time-series x
is collectedThe below example demonstrates usage of x.fs
parameter. We shall see that if we:
x.fs
- assumed frequency at which a time-series x
is collected, expressed in a number of observations per second,pattern.dur.seq
- assumed grid of pattern duration times used in segmentation, expressed in seconds, then the segmentation results stay the same as in Example 1(b).
pattern.dur.seq <- 90:110 * 0.01 ## assume pattern duration grid ranging 0.9-1.1 seconds out <- segmentPattern(x = x, x.fs = 100, ## assume data frequency 100 observations per second template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out
Simulate data:
x
,60
and 120
.## Generate signal and pattern template ## Grid of true pattern occurrence durations set.seed(1) s.grid <- sample(60:120, size = 10) ## True pattern assumed in data generation process true.pattern <- cos(seq(0, 2 * pi, length.out = 200)) ## Generate a time-series x that consists of "glued" pattern occurrences ## of different length x <- numeric() for (s.tmp in s.grid){ true.pattern.s <- approx(seq(0, 1, length.out = 200), true.pattern, xout = seq(0, 1, length.out = s.tmp))$y if (length(x) > 0){ x <- c(x, true.pattern.s[-1]) } else { x <- c(x, true.pattern.s) } } ## Define template template <- true.pattern ## Plot time-series `x` data.frame(x = 1:length(x), y = x) %>% ggplot() + geom_line(aes(x = x, y = y)) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Value", title = "Time-series x")
Plot pattern template (it is the same as in Examples 1).
data.frame(x = seq(0, 1, length.out = length(template)), y = template) %>% ggplot() + geom_line(aes(x = x, y = y), color = "red") + theme_bw(base_size = 9) + labs(x = "Template phase", y = "Value", title = "Pattern template")
## Function to plot segmentation results in a fancy way with ggplot2 library(ggplot2) out.plot1 <- function(val, out){ yrange <- c(-1, 1) * max(abs(val)) y.h <- 0 plt <- ggplot() for (i in 1:nrow(out)){ tau1_i <- out[i, "tau_i"] tau2_i <- tau1_i + out[i, "T_i"] - 1 plt <- plt + geom_vline(xintercept = tau1_i, color = "red") + geom_vline(xintercept = tau2_i, color = "red") + annotate( "rect", fill = "pink", alpha = 0.3, xmin = tau1_i, xmax = tau2_i, ymin = yrange[1], ymax = yrange[2] ) } plt <- plt + geom_line(data = data.frame(x = 1:length(val), y = val), aes(x = x, y = y), color = "black", size = 0.3) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Black line: x", title = "Black line: signal x\nRed vertical lines: start and end points of identified pattern occurrence\nRed shaded area: area corresponding to identified pattern occurrence") plot(plt) }
By using a dense grid of duration of pattern occurrences (pattern.dur.seq
), we shall obtain a perfect match between time-series x
and a template rescaled to various scale parameters. Hence, in the segmentation results plot below, one can see that the start and end points of an individual identified pattern occurrence, marked with red vertical lines, do overlap.
pattern.dur.seq <- 60:120 out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x, out)
In this example we employ a less dense grid of assumed duration of pattern occurrences (pattern.dur.seq
). As a result, we no longer see a perfect match between a time-series x
and a template; we shall see that sim_i
values are not all equal to 1
and we do see a lack of overlap between the start and end points of an individual identified pattern occurrence, as marked with red vertical lines.
pattern.dur.seq
grid of assumed duration of pattern occurrence in segmentation is a shorter computational time. pattern.dur.seq <- c(60, 90, 120) out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out
out.plot1(x, out)
similarity.measure
argument to use covariance as similarity statisticSimilar as above, but here we use covariance as a similarity measure statistic. We shall observe that sim_i
values in the result data frame change and the segmentation results change slightly (explanation: a change of similarity statistic values may affect ADEPT iterative maximization procedure results).
pattern.dur.seq <- c(60, 90, 120) out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cov", ## use covariance as a similarity measure statistic compute.template.idx = TRUE) out
out.plot1(x, out)
Simulate data:
x
,60
and 120
),x
. # Grid of different true pattern occurrence durations set.seed(1) s.grid <- sample(60:120, size = 5) true.pattern.1 <- cos(seq(0, 2 * pi, length.out = 200)) true.pattern.2 <- true.pattern.1 seq.tmp <- 70:130 true.pattern.2[seq.tmp] <- 2 * true.pattern.2[min(seq.tmp)] + abs(true.pattern.2[seq.tmp]) ## Generate signal x that consists of "glued" pattern occurrences of different length x <- numeric() for (ss in s.grid){ ## Add a part from from true.pattern.1 true.pattern.1.s <- approx(seq(0, 1, length.out = 200), true.pattern.1, xout = seq(0, 1, length.out = ss))$y if (length(x) > 0){ x <- c(x, true.pattern.1.s[-1]) } else { x <- c(x, true.pattern.1.s) } ## Add a part from from true.pattern.2 true.pattern.2.s <- approx(seq(0, 1, length.out = 200), true.pattern.2, xout = seq(0, 1, length.out = ss))$y x <- c(x, true.pattern.2.s[-1]) } ## Plot time-series `x` data.frame(x = 1:length(x), y = x) %>% ggplot() + geom_line(aes(x = x, y = y)) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Value", title = "Time-series x")
Plot two vectors that were used in the data generation process as the true patterns.
plt1 <- data.frame(x = seq(0, 1, length.out = length(true.pattern.1)), y = true.pattern.1) %>% ggplot() + geom_line(aes(x = x, y = y), color = "red") + theme_bw(base_size = 9) + labs(x = "Template phase", y = "Value", title = "Pattern template 1") + scale_y_continuous(limits = c(-1,1)) plt2 <- data.frame(x = seq(0, 1, length.out = length(true.pattern.2)), y = true.pattern.2) %>% ggplot() + geom_line(aes(x = x, y = y), color = "red") + theme_bw(base_size = 9) + labs(x = "Template phase", y = "Value", title = "Pattern template 2") + scale_y_continuous(limits = c(-1,1)) plt1;plt2
We use a dense grid of duration of pattern occurrences (pattern.dur.seq
). We firstly use a template consisting of only one "true" pattern. We shall see that:
sim_i
equal to 1
; these pattern occurrences were generated with the use of a pattern template true.pattern.1
we supplied to the segmentation algorithm, sim_i
smaller than 1
; these pattern occurrences were generated with the use of a pattern template true.pattern.2
we did not supply to the segmentation algorithm. pattern.dur.seq <- 60:120 template <- true.pattern.1 out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x, out)
similarity.measure.thresh
argument to set a threshold of minimal similarity valueSimilar as above, but here we additionally use a 0.95
threshold of minimal similarity value (here: covariance) between a time-series x
and pattern template below which the algorithm does not identify a pattern occurrence. Default is 0
. Consequently we shall see that only the pattern occurrences for which sim_i
value is greater than 0.95
are identified.
pattern.dur.seq <- 60:120 template <- true.pattern.1 out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", similarity.measure.thresh = 0.95, compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x, out)
Similarly as above, but now we use two distinct pattern templates in the template
argument. As expected, now we see that every pattern occurrence identified has corresponding correlation sim_i
equal to 1
. It is because these pattern occurrences were generated with the use of either a pattern template true.pattern.1
or true.pattern.2
, and we supplied both of them to the segmentation algorithm.
The column template_i
in a returned data frame shall show numbers 1
and 2
interchangeably; the number is an indicator of pattern template that yielded particular identified pattern occurrence (see segmentPattern
documentation manual for details).
pattern.dur.seq <- 60:120 template <- list(true.pattern.1, true.pattern.2) out <- segmentPattern(x = x, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x, out)
Simulate data:
x
(noise added is supposed to "hide" the pattern occurrences),60
and 120
),x
. Here, we re-use signal x
from data generated above in Examples 3.
## Add noise to a signal `x` from data generated in *Examples 3*. set.seed(1) x2 <- x + rnorm(length(x), sd = 0.5) ## Plot time-series `x` data.frame(x = 1:length(x2), y = x2) %>% ggplot() + geom_line(aes(x = x, y = y), size = 0.3) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Value", title = "Time-series x")
We use two distinct pattern templates, as used in data generation.
pattern.dur.seq <- 60:120 template <- list(true.pattern.1, true.pattern.2) out <- segmentPattern(x = x2, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x2, out)
x.adept.ma.W
argument to smooth time-series x
for similarity matrix computation purposeOne may consider using a smoothed version of a time-series x
for the purposes of computing similarity between a time-series x
and pattern template(s). To do this, the argument x.adept.ma.W
is used to define a length of a window used in moving average smoothing of a time-series x
; W
is expressed in seconds. The default is NULL
(no smoothing applied).
Smoothing of a time-series x
To get a sense what W
should be used as a length of a window used in moving average smoothing, one may use windowSmooth
function to experiment with different values.
W = 10
seems like a plausible choice, that is, the smoothed signal seems to be stripped out of much of a noise but still preserve the underlying pattern. x2.smoothed <- windowSmooth(x = x2, x.fs = 1, W = 10) data.frame(x = 1:length(x2.smoothed), y = x2.smoothed) %>% ggplot() + geom_line(aes(x = x, y = y)) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Value", title = "Time-series x smoothed")
We shall see that compared to the Example 4(a), the effect of using a smoothed version of a time-series x
in similarity matrix computation is pronounced in sim_i
values in the resulted data frame as well as in a slight change in tau_i
and T_i
values.
out <- segmentPattern(x = x2, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", x.adept.ma.W = 10, compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x2, out)
finetune
, finetune.maxima.nbh.W
arguments to employ "maxima" fine-tune procedureWe continue to use the segmentation setting similar to the one we used above in Example 4(b), but this time we employ a fine-tuning procedure of stride locations.
"maxima" fine-tune procedure
"maxima" fine-tune procedure tunes preliminarily identified locations of pattern occurrence beginning and end so as they correspond to local maxima of time-series x
(or smoothed version of x
, as we shall see later) found within neighbourhoods of preliminary locations.
The parameter finetune.maxima.nbh.W
defines a length of the two neighborhoods centered at preliminarily identified pattern occurrence beginning and end points within which we search for local maxima. (see segmentPattern
documentation for more details).
We shall see from the plot of segmentation results below that almost all identified pattern occurrence start / end points are hitting the time point our eyes identify as "local signal maxima".
600
; that comes from the restriction we imposed on the pattern duration times by setting pattern.dur.seq <- 60:120
, that is, the pattern occurrence cannot be longer than 120
. out <- segmentPattern(x = x2, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", x.adept.ma.W = 10, finetune = "maxima", finetune.maxima.nbh.W = 30, compute.template.idx = TRUE) out ## Plot segmetation results out.plot1(x2, out)
finetune
, finetune.maxima.ma.W
, finetune.maxima.nbh.W
arguments to employ "maxima" fine-tune procedure and smooth signal for peak detectionOne other step that can be made in "maxima" fine-tune procedure is to use a smoothed version of a signal for peak detection. Here, we smooth a time-series x
for both similarity matrix computation procedure and for "maxima" fine-tune procedure.
We would typically choose a more aggressive smoothing for a "maxima" fine-tune procedure (as defined via smoothing window length finetune.maxima.ma.W
) than for similarity matrix computation (as defined via smoothing window length x.adept.ma.W
); the reason is that often an aggressive smoothing is needed to remove ("smooth together") multiple local maxima of a time-series x
.
We again use windowSmooth
function to experiment with different values of a window length in moving average smoothing.
W = 50
seems like a plausible choice as it removes ("smooth together") multiple local maxima of a time-series x
, leaving out a single one. x2.smoothed <- windowSmooth(x = x2, x.fs = 1, W = 50) data.frame(x = 1:length(x2.smoothed), y = x2.smoothed) %>% ggplot() + geom_line(aes(x = x, y = y)) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Value", title = "Time-series x smoothed aggresively")
pattern.dur.seq <- 60:120 template <- list(true.pattern.1, true.pattern.2) out <- segmentPattern(x = x2, x.fs = 1, template = template, pattern.dur.seq = pattern.dur.seq, similarity.measure = "cor", x.adept.ma.W = 10, ## smoothing parameter for similarity matrix computation finetune = "maxima", ## use fine-tuning finetune.maxima.ma.W = 50, ## smoothing parameter for peak detection in fine-tuning finetune.maxima.nbh.W = 30, ## neighborhoods length in fine-tuning compute.template.idx = TRUE) out
The plot below shows the segmentation results
## Function to plot nice results visualization out.plot2 <- function(val, val.sm, out){ yrange <- c(-1, 1) * max(abs(val)) y.h <- 0 plt <- ggplot() + geom_line(data = data.frame(x = 1:length(val), y = val), aes(x = x, y = y), color = "grey") for (i in 1:nrow(out)){ tau1_i <- out[i, "tau_i"] tau2_i <- tau1_i + out[i, "T_i"] - 1 plt <- plt + geom_vline(xintercept = tau1_i, color = "red") + geom_vline(xintercept = tau2_i, color = "red") + annotate( "rect", fill = "pink", alpha = 0.3, xmin = tau1_i, xmax = tau2_i, ymin = yrange[1], ymax = yrange[2] ) } plt <- plt + geom_line(data = data.frame(x = 1:length(val.sm), y = val.sm), aes(x = x, y = y), color = "black", size = 0.6, alpha = 0.8) + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Black line: smoothed x", title ="Light gray line: signal x\nBlack line: smoothed signal x\nRed vertical lines: start and end points of identified pattern occurrence\nRed shaded area: area corresponding to identified pattern occurrence") plot(plt) }
## Plot segmetation results out.plot2(x2, windowSmooth(x = x2, x.fs = 1, W = 50), out)
References:
1: Karas, M., Straczkiewicz, M., Fadel, W., Harezlak, J., Crainiceanu, C., Urbanek, J.K. Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation, Submitted to Biostatistics, 2018.
2: Karas, M. Walking strides segmentation with adept
package vignette.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.