knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE, global.par = TRUE ) ## Label to whether or not run code from local run_local <- FALSE # run_local <- FALSE # wd.tmp <- "/Users/martakaras/Dropbox/_PROJECTS/R/adept/vignettes/"
This vignette provides an example of walking stride segmentation from subsecond accelerometry data with adept
package. We demonstrate that ADEPT method ([1]) can be used to perform automatic and precise walking stride segmentation from data collected during a combination of running, walking and resting exercise. We demonstrate how to segment data:
i. with the use of stride templates that were pre-computed based on data from an external study, ii. by deriving new stride templates in a semi-manual manner.
See Introduction to adept package vignette for an introduction to the ADEPT method and usage examples of the segmentPattern
function which implements the ADEPT method.
The adept
package comes with acc_running
data - a sample of raw accelerometry data collected during 25 minutes of an outdoor run. Data were collected at frequency 100 Hz with two ActiGraph GT9X Link sensors located at left hip and left ankle.
mapmyrun mobile tracking application (link) was used during 25 minutes of run corresponding to acc_running
accelerometry data set. Based on the app, the distance covered is approximately 3.35 km. A ground elevation plot from the app output indicates some trial signatures, including an uphill part on the north side of the Patterson Park in Baltimore, MD. The acc_running
data sample is a subset of accelerometry data collected over a few days time; the sample attached should match the time while an app was turned on up to ~1-minute accuracy.
knitr::include_graphics('https://imgur.com/VcK6o7o.jpg')
Screenshot taken from a personal profile of mapmyrun tracking application, accessed via https://www.mapmyrun.com.
The data were collected with two ActiGraph GT9X Link sensors at frequency 100 Hz. ActiGraph GT9X Link is an activity monitor which includes 3-axis accelerometer measuring body acceleration along three orthogonal axes. At frequency 100 Hz, 100 observations are collected each second, where each observation consists of 3 data points (one data point from each of three orthogonal axes), yielding a total of 300 data points per second.
One of the sensors was attached to the shoe with a clip on the outside side of a left foot, just below the left ankle. The other sensor was attached to the elastic belt located at the hip, on the left side of a hip. Both devices stayed in a fairly stable way during the run.
knitr::include_graphics('https://imgur.com/5Ji2hQF.jpg');knitr::include_graphics('https://imgur.com/lTdCixX.jpg')
Wearable accelerometer devices location during the experiment. The devices were still covered with a protective plastic foil.
acc_running
The sample accelerometry data is attached to the package as acc_running
data frame. To access the data, load adept
package.
# install.packages("devtools") # devtools::install_github("martakarass/adept") library(adept) library(lubridate) options(digits.secs = 2) head(acc_running) tz(acc_running$date_time)
acc_running
consists of 300,000 observations of 5 variables:
x
- acceleration measurement time series collected from a $x$-axis of the sensor accelerometer,y
- acceleration measurement time series collected from a $y$-axis of the sensor accelerometer,x
- acceleration measurement time series collected from a $z$-axis of the sensor accelerometer,vm
- vector magnitude of $(x,y,z)$ vector of measurements, computed as $vm = \sqrt{x^2+y^2+z^2}$,date_time
- date and time of acceleration measurement collection stored as POSIXct
,sensor_location
- sensor location label, one of: "left_hip"
, "left_ankle"
.date_time
column valuesdate_time
column: we would expect its second value to be 17:57:30.01
, not: 17:57:30.00
. Values in date_time
column were generated via seq(from = as.POSIXct("2018-10-25 17:57:30.00", tz = "UTC"), by = 0.01, length.out = 150000)
. I identify the problem to be the floating point arithmetic problem, as discussed in this SO question. Such errors are "made up for" over the column values, and the whole date_time
has expected range of time. Time collection spans 25 minutes. It agrees with a number of observations per sensor we have (150000
), given the data was collected at frequency 100 observations per second.
range(acc_running$date_time) table(acc_running$sensor_location) 150000 / (100 * 60)
One way to visualize raw accelerometry data is to plot it as a three-dimensional time-series $(x,y,z)$. Here, we plot data from three different time frames, each of 4 seconds length, simultaneously for data collected at left ankle and right ankle.
library(lubridate) library(dplyr) library(ggplot2) library(reshape2) library(gridExtra) ## Define time frame start values for data subset t1 <- ymd_hms("2018-10-25 18:07:00", tz = "UTC") t2 <- ymd_hms("2018-10-25 18:20:30", tz = "UTC") t3 <- ymd_hms("2018-10-25 18:22:00", tz = "UTC") ## Subset data acc_running_sub <- acc_running %>% filter((date_time >= t1 & date_time < t1 + as.period(4, "seconds")) | (date_time >= t2 & date_time < t2 + as.period(4, "seconds")) | (date_time >= t3 & date_time < t3 + as.period(4, "seconds")) ) ## Plot (x,y,z) values acc_running_sub %>% select(-vm) %>% melt(id.vars = c("date_time", "sensor_location")) %>% mutate(dt_floor = paste0("time frame start: ", floor_date(date_time, unit = "minutes"))) %>% ggplot(aes(x = date_time, y = value, color = variable)) + geom_line(size = 0.5) + facet_grid(sensor_location ~ dt_floor, scales = "free_x") + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Acceleration [g]", color = "Accelerometer axis of measurement: ", title = "Raw accelerometry data (x,y,z)") + theme(legend.position = "top")
Using vector magnitude $(vm)$ is a way to summarize a three-dimensional time-series $(x,y,z)$ as a one-dimensional time-series. Vector magnitude is computed as $vm = \sqrt{x^2 + y^2 + z^2}$ at each time point of data collection.
## Plot vector magnitude values acc_running_sub %>% mutate(dt_floor = paste0("time frame start: ", floor_date(date_time, unit = "minutes"))) %>% ggplot(aes(x = date_time, y = vm)) + geom_line(size = 0.5) + facet_grid(sensor_location ~ dt_floor, scales = "free_x") + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Vector magnitue [g]", title = "Vector magnitude (vm) summary of raw accelerometry data")
From the $(x,y,z)$ and $(vm)$ plots we can tell the presence of an asymmetric repetitive pattern associated with each walking stride (stride := two subsequent steps):
Also, there are differences in data magnitude and stride duration between the three vertical plot panels:
The left vertical plot panel (time frame start: 2018-10-25 18:07:00
) corresponds to a slow/moderate pace running uphill; $(vm)$ time-series exhibits relatively high magnitude.
The middle vertical plot panel (time frame start: 2018-10-25 18:20:00
) corresponds to a relatively fast running; we can observe that $(vm)$ time-series exhibits high magnitude. Also, there are more stride patterns captured within this 4-seconds time frame than on the two other 4-seconds time frames, which suggests a higher ratio of strides per second performed by a runner.
The right vertical plot panel (time frame start: 2018-10-25 18:22:00
) most likely corresponds to walking; we can observe relatively low $(vm)$ magnitude and a bit different shape of a stride pattern pronounced.
Clearly, it might be challenging to make a plot of all data points collected at frequency 100 Hz from 25 minutes-long exercise. A way to summarize accelerometry data of such high density is to use a $(vmc)$ - vector magnitude count (also known as the mean amplitude deviation). For $\overline{vm}(t,H)$ - average of $(vm)$ time-series over time window of length $H$ starting at time $t$, we define $$\mathrm { vmc } ( t, H ) = \frac { 1 } { H } \sum _ { h = 0 } ^ { H - 1 } \left| vm ( t + h ) - \overline{vm}(t,H) \right|.$$
## Function to compute vmc from a vm window vector vmc <- function(vm.win){ mean(abs(vm.win - mean(vm.win))) } ## Compute vmc vector in 3-seconds windows vm <- acc_running$vm win.vl <- 100 * 3 rn.seq <- seq(1, to = length(vm), by = win.vl) vmc.vec <- sapply(rn.seq, function(rn.i){ vm.win.idx <- rn.i : (rn.i + win.vl - 1) vm.win <- vm[vm.win.idx] vmc(vm.win) }) vmc.df <- data.frame(vmc = vmc.vec, date_time = acc_running$date_time[rn.seq], rn_seq = rn.seq, sensor_location = acc_running$sensor_location[rn.seq]) vmc.df.rest <- vmc.df %>% filter(sensor_location == "left_ankle") %>% mutate(resting = ifelse(vmc < 0.4, 1, 0), vmc_tau_i = rn_seq - 150000) %>% select(date_time, resting, vmc_tau_i) vmc.df <- vmc.df %>% left_join(vmc.df.rest, by = "date_time") ## Plot vmc vmc.df %>% filter(date_time < max(acc_running$date_time) - 5, date_time > min(acc_running$date_time) + 5) %>% ggplot(aes(x = date_time, y = vmc)) + facet_grid(sensor_location ~ .) + geom_tile(aes(fill = factor(resting)), height = Inf, alpha = 0.1) + geom_line(size = 0.3) + theme_bw(base_size = 9) + labs(x = "Exercise time [min]", y = "Vector magnitue count", title = "Vector magnitude count (vmc) computed over 3 second-length windows of (vm)", fill = "Resting: ") + scale_fill_manual(values = c("white", "blue")) + theme(legend.position = "top", legend.background = element_rect(fill = "grey90")) ## Time percentage when (vmc) computed from left ankle is below 0.4 (vmc.low.frac <- mean(vmc.df.rest$resting))
We see that approx. 19\% of the time $(vmc)$ computed from left ankle data is below 0.4 value. Here, that part of data likely corresponds to standing with no walking/running legs movement involved; it may be that the runner was waiting on the crossroad for a green light to cross, or was being tired and hence was making breaks. We use this arbitrarily selected threshold of 0.4 to partition data into time frames of resting ($(vmc)<0.4$; marked with light blue color) and non-resting ($(vmc)\geq 0.4$).
On the contrary, high values of $(vmc)$ correspond to high volumes of physical activity recorded; in this experiment, at approx. 18:21 (6:21 PM) a short (~1 minute) period of a fast run started, corresponding to particularly high values of $(vmc)$ recorded with both left ankle and left hip sensors.
To perform strides segmentation from $(vm)$ time-series, we need a stride pattern template that is specific to a sensor location that data were collected at. We are going to demonstrate two approaches:
Approach 1: Use stride pattern templates attached to the adept
package as stride_template
object. These walking strides were derived from a set of manually pre-segmented strides from data collected in a different experiment, with different participants (see ?stride_template
for details).
Approach 2: Derive stride pattern templates manually from acc_running
data set.
The segmentPattern
function algorithm has two phases in which a smoothed version of $(vm)$ time-series may be used:
In computation of similarity matrix between pattern template and (possibly smoothed) $(vm)$ signal. Here, x.adept.ma.W
argument of segmentPattern
function defines a length of a window used in moving average smoothing; it is expressed in seconds.
For this phase, we recommend low/moderate smoothing (x.adept.ma.W = 0.05
, x.adept.ma.W = 0.15
) so as tiny $(vm)$ bumps are smoothed, but the overall $(vm)$ shape features are not lost.
In "maxima" tuning procedure where local maxima of (possibly smoothed) $(vm)$ signal are identified as final beginnings and ends of a stride pattern occurrence. Here, finetune.maxima.ma.W
argument of segmentPattern
function defines a length of a window used in moving average smoothing; it is expressed in seconds.
For this phase, in case of a very clear signal and distinct signal local maxima, no smoothing may be used (finetune.maxima.ma.W = NULL
); in case of low-signal and noisy data (i.e., often the case of data collected with a sensor located at wrist), high levels of smoothing (finetune.maxima.ma.W = 0.3
or similar) may be used.
Function windowSmooth {adept}
computes moving window average of a time-series x
and may be used to experiment with different values of smoothing window length W
(expressed in seconds). Results of the exemplary choice of smoothing parameter W
are presented below.
## Use moving window average with a length of a window equal 0.05 s acc_running$vm_smoothed1 <- windowSmooth(x = acc_running$vm, x.fs = 100, W = 0.05) acc_running %>% filter((date_time >= t1 & date_time < t1 + as.period(4, "seconds")) | (date_time >= t2 & date_time < t2 + as.period(4, "seconds")) | (date_time >= t3 & date_time < t3 + as.period(4, "seconds"))) %>% mutate(dt_floor = paste0("time frame start: ", floor_date(date_time, unit = "minutes"))) %>% ggplot(aes(x = date_time, y = vm_smoothed1)) + geom_line(size = 0.5) + facet_grid(sensor_location ~ dt_floor, scales = "free_x") + theme_bw(base_size = 9) + labs(x = "Time [s]", y = "Vector magnitue [g]", color = "Axis: ", title = "Vector magnitude (vm) smoothed, length of smoothing window = 0.05 s")
We first construct a template
object; it is a list of left ankle-specific stride templates (due to movement symmetry, it could serve for right ankle same well!). Here, we use two precomputed templates (see ?stride_template
for details).
template <- list(stride_template$left_ankle[[2]][1, ], stride_template$left_ankle[[2]][2, ]) par(mfrow = c(1,2), cex = 0.7) plot(template[[1]], type = "l", xlab = "", ylab = "", main = "Left ankle: template 1") plot(template[[2]], type = "l", xlab = "", ylab = "", main = "Left ankle: template 2")
We perform strides segmentation from $(vm)$ accelerometry signal. Explanation of some of parameters used:
x
- A $(vm)$ vector we segment pattern occurrences from. x.fs
- Frequency at which data is collected, expressed in a number of observations per second.template
- A list of pattern templates. pattern.dur.seq
- A grid of template scales considered in segmentation. Grid values are expressed in seconds of pattern duration time; their range is determined by what we consider plausible duration time for a pattern to take. In case of healthy adult person, 0.5-1.8 seconds range for a stride (:= two subsequent steps) seems reasonable; for data collected in elderly population, 0.5-2.25 seconds could be considered etc. (See: Details
in ?segmentPattern
). finetune
- If set to "maxima"
, a procedure is employed to tune preliminarily identified locations of pattern occurrence beginning and end so as they correspond to local maxima of $(vm)$. run.parallel
- Whether or not to use parallel execution in the algorithm. The future
package is used to execute code asynchronously.See ?segmentPattern
for detailed explanation of all method parameters.
# ```r x.la <- acc_running$vm[acc_running$sensor_location == "left_ankle"] out1.la <- segmentPattern(x = x.la, x.fs = 100, template = template, pattern.dur.seq = seq(0.5, 1.8, length.out = 50), similarity.measure = "cor", x.adept.ma.W = 0.15, finetune = "maxima", finetune.maxima.ma.W = 0.05, finetune.maxima.nbh.W = 0.2, compute.template.idx = TRUE, run.parallel = TRUE)
## Run only manually, when some changes have been introduced and file needs to be updated # setwd(wd.tmp) path.tmp <- "static/results/out1_la.csv" write.csv(out1.la, path.tmp, row.names = FALSE)
## Run only when executed locally path.tmp <- "static/results/out1_la.csv" x.la <- acc_running$vm[acc_running$sensor_location == "left_ankle"] out1.la <- read.csv(path.tmp)
head(out1.la)
Each row of the returned data.frame
describes one identified pattern occurrence:
tau_i
-Index of a vector (here: $(vm)$ time-series) where an identified pattern occurrence starts.T_i
- Duration of identified pattern occurrence, expressed in a vector length.sim_i
- Value of similarity statistic between pattern template and a data signal (here: $(vm)$ time-series) where the pattern occurrence was identified (see ?segmentPattern
for details).template_i
- If compute.template.idx
equals TRUE
: index of pattern template that had the best match with a data signal where the pattern occurrence was identified; otherwise: NA
.We now run segmentation on data collected at left hip. We then compare the results.
Construct template
object; it is a list of left hip-specific stride templates (again: due to movement symmetry, it could serve for right ankle same well!).
template <- list(stride_template$left_hip[[1]][1, ], stride_template$left_hip[[2]][2, ]) par(mfrow = c(1,2), cex = 0.7) plot(template[[1]], type = "l", xlab = "", ylab = "", main = "Left hip: template 1") plot(template[[2]], type = "l", xlab = "", ylab = "", main = "Left hip: template 2")
template <- list(stride_template$left_hip[[1]][1, ], stride_template$left_hip[[2]][2, ]) x.lh <- acc_running$vm[acc_running$sensor_location == "left_hip"] out1.lh <- segmentPattern(x = x.lh, x.fs = 100, template = template, pattern.dur.seq = seq(0.5, 1.8, length.out = 50), similarity.measure = "cor", x.adept.ma.W = 0.05, finetune = "maxima", finetune.maxima.ma.W = 0.15, finetune.maxima.nbh.W = 0.2, compute.template.idx = TRUE, run.parallel = TRUE)
# setwd(wd.tmp) path.tmp <- "static/results/out1_lh.csv" write.csv(out1.lh, path.tmp, row.names = FALSE)
x.lh <- acc_running$vm[acc_running$sensor_location == "left_hip"] path.tmp <- "static/results/out1_lh.csv" out1.lh <- read.csv(path.tmp)
head(out1.lh)
We now plot estimated stride duration in seconds (T_i
/100) against estimated stride start in minutes (tau_i
/(10060)). We firstly filter the results so as to keep only those which are likely* corresponding to walking/running (not: resting). We also mark estimated resting/not-resting labels for subsequent 3-minute time windows which we obtained from $(vmc)$ data.
## Merge results from two sensor locations, add resting/non-resting info plt.df.la <- out1.la %>% merge(vmc.df.rest) %>% filter(tau_i >= vmc_tau_i, tau_i < vmc_tau_i + 3 * 100, resting == 0) %>% mutate(location = "left_ankle") plt.df.lh <- out1.lh %>% merge(vmc.df.rest) %>% filter(tau_i >= vmc_tau_i, tau_i < vmc_tau_i + 3 * 100, resting == 0) %>% mutate(location = "left_hip") plt.df <- rbind(plt.df.la, plt.df.lh) ggplot(plt.df, aes(x = tau_i / (100 * 60) , y = T_i / 100)) + geom_tile(data = vmc.df.rest, aes(x = vmc_tau_i / (100 * 60), y = 1, fill = factor(resting)), height = Inf, alpha = 0.1, inherit.aes = FALSE) + geom_point(alpha = 0.2) + facet_grid(location ~ .) + theme_bw(base_size = 10) + labs(x = "Exercise time [min]", y = "Estimated stride duration time [s]", fill = "Resting: ") + theme(legend.position = "top") + scale_fill_manual(values = c("white", "blue")) + theme(legend.position = "top", legend.background = element_rect(fill = "grey90")) + scale_y_continuous(limits = c(0.5, 1.8))
finetune.maxima.ma.W = 0.05
, left hip: finetune.maxima.ma.W = 0.15
). We now use the estimated stride occurrence start index (tau_i
) and estimated stride duration (T_i
) to retrieve $(vm)$ time-series segments corresponding to stride occurrences. We further align them in phase and scale to better observe stride patterns.
## For data frame #1 (raw vm segments) stride.acc.vec.la <- numeric() stride.tau_i.vec.la <- numeric() stride.idx.vec.la <- numeric() ## For data frame #2 (scaled vm segments) stride_S.acc.vec.la <- numeric() stride_S.tau_i.vec.la <- numeric() stride_S.phase.vec.la <- numeric() for (i in 1:nrow(plt.df.la)){ out.i <- plt.df.la[i, ] x.la.i <- x.la[out.i$tau_i : (out.i$tau_i + out.i$T_i - 1)] x.la.i.len <- length(x.la.i) if (var(x.la.i) < 1e-3) next ## For data frame #1 stride.acc.vec.la <- c(stride.acc.vec.la, x.la.i) stride.tau_i.vec.la <- c(stride.tau_i.vec.la, rep(out.i$tau_i, x.la.i.len)) stride.idx.vec.la <- c(stride.idx.vec.la, 1:x.la.i.len) ## For data frame #2 x.la.i_S <- approx(x = seq(0, 1, length.out = length(x.la.i)), y = x.la.i, xout = seq(0, 1, length.out = 200))$y x.la.i_S <- as.numeric(scale(x.la.i_S)) stride_S.acc.vec.la <- c(stride_S.acc.vec.la, x.la.i_S) stride_S.tau_i.vec.la <- c(stride_S.tau_i.vec.la, rep(out.i$tau_i, 200)) stride_S.phase.vec.la <- c(stride_S.phase.vec.la, seq(0, 1, length.out = 200)) } ## data frame #1 stride.df.la <- data.frame(acc = stride.acc.vec.la, tau_i = stride.tau_i.vec.la, idx = stride.idx.vec.la) ## data frame #2 stride_S.df.la <- data.frame(acc = stride_S.acc.vec.la, tau_i = stride_S.tau_i.vec.la, phase = stride_S.phase.vec.la)
# setwd(wd.tmp) ## For data frame #1 path.tmp <- "static/results/stride_df_la.csv" write.csv(stride.df.la, path.tmp, row.names = FALSE) ## For data frame #2 path.tmp <- "static/results/stride_S_df_la.csv" write.csv(stride_S.df.la, path.tmp, row.names = FALSE)
## For data frame #1 path.tmp <- "static/results/stride_df_la.csv" stride.df.la <- read.csv(path.tmp) ## For data frame #2 path.tmp <- "static/results/stride_S_df_la.csv" stride_S.df.la <- read.csv(path.tmp)
## Plot segmented walking strides plt1 <- stride.df.la %>% ggplot(aes(x = idx/100, y = acc, group = tau_i)) + geom_line(alpha = 0.1) + theme_bw(base_size = 8) + labs(x = "Stride pattern duration [s]", y = "Vector magnitude [g]", title = "Segmented walking strides") plt2 <- stride_S.df.la %>% ggplot(aes(x = phase, y = acc, group = tau_i)) + geom_line(alpha = 0.1) + theme_bw(base_size = 8) + labs(x = "Stride pattern phase", y = "Vector magnitude (scaled) [g]", title = "Segmented walking strides, aligned and scaled") grid.arrange(plt1, plt2, nrow = 1)
We can further use correlation clustering to group segmented walking strides.
## Compute strides distance martrix stride_S.dfdc.la <- dcast(stride_S.df.la, phase ~ tau_i, value.var = "acc")[, -1] data.mat.tau_i <- as.numeric(colnames(stride_S.dfdc.la)) data.mat <- as.matrix(stride_S.dfdc.la) D.mat <- dist(cor(data.mat)) ## Get cluster medoids cluster.k <- 2 medoids.idx <- round(seq(1, ncol(stride_S.dfdc.la), length.out = cluster.k + 2)) medoids.idx <- medoids.idx[-c(1, medoids.idx + 2)] ## Cluster strides set.seed(1) pam.out <- cluster::pam(D.mat, cluster.k, diss = TRUE, medoids = medoids.idx) table(pam.out$clustering) ## Put clustering results into data frame data.df <- as.data.frame(t(data.mat)) colnames(data.df) <- seq(0, to = 1, length.out = 200) data.df$tau_i <- data.mat.tau_i data.df$cluster <- pam.out$clustering data.dfm <- melt(data.df, id.vars = c("tau_i", "cluster")) data.dfm$variable <- as.numeric(as.character(data.dfm$variable)) data.dfm$cluster <- paste0("cluster ", data.dfm$cluster)
# setwd(wd.tmp) path.tmp <- "static/results/data_dfm.csv" write.csv(data.dfm, path.tmp, row.names = FALSE)
path.tmp <- "static/results/data_dfm.csv" data.dfm <- read.csv(path.tmp)
data.dfm.agg <- data.dfm %>% group_by(variable, cluster) %>% summarise(value = mean(value)) ggplot(data.dfm, aes(x = variable, y = value, group = tau_i)) + geom_line(alpha = 0.2) + geom_line(data = data.dfm.agg, aes(x = variable, y = value, group = 1), color = "red", size = 1, inherit.aes = FALSE) + facet_grid(cluster ~ .) + theme_bw(base_size = 9) + labs(x = "Stride pattern phase", y = "Vector magnitude (scaled) [g]", title = "Segmented walking strides, aligned, scaled, clustered\nRed line: point-wise mean")
We can further plot the estimated stride duration time over the course of the running exercise, marking the stride pattern cluster assignment with colour.
data.dfm %>% select(tau_i, cluster) %>% distinct() %>% left_join(plt.df.la, by = "tau_i") %>% ggplot(aes(x = tau_i / (100 * 60) , y = T_i / 100, color = cluster)) + geom_tile(data = vmc.df.rest, aes(x = vmc_tau_i / (100 * 60), y = 1, fill = factor(resting)), height = Inf, alpha = 0.1, inherit.aes = FALSE) + geom_point(alpha = 0.4) + theme_bw(base_size = 9) + labs(x = "Exercise time [min]", y = "Estimated stride duration time [s]", color = "Stride assignment: ", fill = "Resting: ") + scale_fill_manual(values = c("white", "blue")) + theme(legend.position = "top", legend.background = element_rect(fill = "grey90")) + scale_y_continuous(limits = c(0.5, 1.8))
We now demonstrate a way of deriving stride patterns from data in a semi-manual way. The proposed procedure is done separately for data collected at two different locations, and goes as follows:
Select short fragments of $(vm)$ time-series that are likely to represent data coming from different pace of running/walking. Here, we select two such $(vm)$ fragments, each of 6 seconds length.
Within each of three $(vm)$ fragments, use a function to automatically identify all local maxima.
Within each of three $(vm)$ fragments, identify a subset of local maxima that corresponds to $(vm)$ peaks of interest via visual inspection ("manual" part).
For each of three $(vm)$ fragments, cut the $(vm)$ fragment into parts at the points of identified local maxima subset. Interpolate parts to common a length and compute their point-wise average. The resulted point-wise average is a newly created pattern template.
That way, for each sensor location, we arrive at three distinct stride pattern templates that we further use in segmentation.
## Function to compute local maxima ## source: https://stackoverflow.com/questions/6836409/finding-local-maxima-and-minima localMaxima <- function(x) { y <- diff(c(-.Machine$integer.max, x)) > 0L rle(y)$lengths y <- cumsum(rle(y)$lengths) y <- y[seq.int(1L, length(y), 2L)] if (x[[1]] == x[[2]]) { y <- y[-1] } y } ## Function which cut x vector at indices given by x.cut.idx, ## approximate cut parts into common length and average ## parts point-wise into one vector cut.and.avg <- function(x, x.cut.idx){ x.cut.idx.l <- length(x.cut.idx) mat.out <- matrix(NA, nrow = x.cut.idx.l-1, ncol = 200) for (i in 1:(length(x.cut.idx)-1)){ xp <- x[x.cut.idx[i]:x.cut.idx[i+1]] mat.out[i, ] <- approx(seq(0, 1, length.out = length(xp)), xp, seq(0, 1, length.out = 200))$y } out <- apply(mat.out, 2, mean) } ## Make subset of data which has data parts of different speed of running acc_running_sub2 <- acc_running %>% filter((date_time >= t2 & date_time < t2 + as.period(6, "seconds")) | (date_time >= t3 & date_time < t3 + as.period(6, "seconds"))) %>% mutate(dt_floor = floor_date(date_time, unit = "minutes")) ## Vector of signatures for data parts of different speed of running dt_floor.unique <- unique(acc_running_sub2$dt_floor)
## Left ankle-specific subset of data sub.la <- acc_running_sub2[acc_running_sub2$sensor_location == "left_ankle", ] par(mfrow = c(1,3), cex = 0.6) ## Left ankle: template 1 x1 <- sub.la[sub.la$dt_floor == dt_floor.unique[1], "vm_smoothed1"] x1.locMax <- localMaxima(x1) plot(1:length(x1), x1, type = "l", main = "(vm) local maxima", xlab = "Index", ylab = "") abline(v = x1.locMax, col = "red") plot(1:length(x1), x1, type = "l", main = "(vm) local maxima subset", xlab = "Index", ylab = "") abline(v = x1.locMax[c(3, 6, 9, 13, 16, 19, 23, 26, 29)], col = "red") template.la.x1 <- cut.and.avg(x1, x1.locMax[c(3, 6, 9, 13, 16, 19, 23, 26, 29)]) plot(1:length(template.la.x1), template.la.x1, type = "l", col = "red", main = "left ankle: template 1", xlab = "Index", ylab = "") ## Left ankle: template 2 x2 <- sub.la[sub.la$dt_floor == dt_floor.unique[2], "vm_smoothed1"] x2.locMax <- localMaxima(x2) plot(1:length(x2), x2, type = "l", main = "(vm) local maxima", xlab = "Index", ylab = "") abline(v = x2.locMax, col = "red") plot(1:length(x2), x2, type = "l", main = "(vm) local maxima subset", xlab = "Index", ylab = "") abline(v = x2.locMax[c(8, 18, 26, 37, 44)], col = "red") template.la.x2 <- cut.and.avg(x2, x2.locMax[c(8, 18, 26, 37, 44)]) plot(1:length(template.la.x2), template.la.x2, type = "l", col = "red", main = "left ankle: template 2", xlab = "Index", ylab = "") ## Template object template.la <- list(template.la.x1, template.la.x2)
## Left hip-specific subset of data sub.lh <- acc_running_sub2[acc_running_sub2$sensor_location == "left_hip", ] par(mfrow = c(1,3), cex = 0.6) ## Left hip: template 1 x1 <- sub.lh[sub.lh$dt_floor == dt_floor.unique[1], "vm_smoothed1"] x1.locMax <- localMaxima(x1) plot(1:length(x1), x1, type = "l", main = "(vm) local maxima", xlab = "Index", ylab = "") abline(v = x1.locMax, col = "red") plot(1:length(x1), x1, type = "l", main = "(vm) local maxima subset", xlab = "Index", ylab = "") abline(v = x1.locMax[c(3, 10, 17, 24, 30, 37, 43, 49, 56)], col = "red") template.lh.x1 <- cut.and.avg(x1, x1.locMax[c(3, 10, 17, 24, 30, 37, 43, 49, 56)]) plot(1:length(template.lh.x1), template.lh.x1, type = "l", col = "red", main = "left ankle: template 1", xlab = "Index", ylab = "") ## Left hip: template 2 x2 <- sub.lh[sub.lh$dt_floor == dt_floor.unique[2], "vm_smoothed1"] x2.locMax <- localMaxima(x2) plot(1:length(x2), x2, type = "l", main = "(vm) local maxima", xlab = "Index", ylab = "") abline(v = x2.locMax, col = "red") plot(1:length(x2), x2, type = "l", main = "(vm) local maxima subset", xlab = "Index", ylab = "") abline(v = x2.locMax[c(5, 13, 19, 24, 31)], col = "red") template.lh.x2 <- cut.and.avg(x2, x2.locMax[c(5, 13, 19, 24, 31)]) plot(1:length(template.lh.x2), template.lh.x2, type = "l", col = "red", main = "left ankle: template 2", xlab = "Index", ylab = "") ## Template object template.lh <- list(template.lh.x1, template.lh.x2)
out2.la <- segmentPattern(x = x.la, x.fs = 100, template = template.la, pattern.dur.seq = seq(0.5, 1.8, length.out = 50), similarity.measure = "cor", x.adept.ma.W = 0.15, finetune = "maxima", finetune.maxima.ma.W = 0.05, finetune.maxima.nbh.W = 0.2, compute.template.idx = TRUE, run.parallel = TRUE) out2.lh <- segmentPattern(x = x.lh, x.fs = 100, template = template.lh, pattern.dur.seq = seq(0.5, 1.8, length.out = 50), similarity.measure = "cor", x.adept.ma.W = 0.05, finetune = "maxima", finetune.maxima.ma.W = 0.15, finetune.maxima.nbh.W = 0.2, compute.template.idx = TRUE, run.parallel = TRUE)
# setwd(wd.tmp) path.tmp <- "static/results/out2_la.csv" write.csv(out2.la, path.tmp, row.names = FALSE) path.tmp <- "static/results/out2_lh.csv" write.csv(out2.lh, path.tmp, row.names = FALSE)
path.tmp <- "static/results/out2_la.csv" out2.la <- read.csv(path.tmp) path.tmp <- "static/results/out2_lh.csv" out2.lh <- read.csv(path.tmp)
## Merge results from two sensor locations, add resting/non-resting info plt.df.la.2 <- out2.la %>% merge(vmc.df.rest) %>% filter(tau_i >= vmc_tau_i, tau_i < vmc_tau_i + 3 * 100, resting == 0) %>% mutate(location = "left_ankle") plt.df.lh.2 <- out2.lh %>% merge(vmc.df.rest) %>% filter(tau_i >= vmc_tau_i, tau_i < vmc_tau_i + 3 * 100, resting == 0) %>% mutate(location = "left_hip") plt.df <- rbind(plt.df.la.2, plt.df.lh.2) ggplot(plt.df, aes(x = tau_i / (100 * 60) , y = T_i / 100)) + geom_tile(data = vmc.df.rest, aes(x = vmc_tau_i / (100 * 60), y = 1, fill = factor(resting)), height = Inf, alpha = 0.1, inherit.aes = FALSE) + geom_point(alpha = 0.2) + facet_grid(location ~ .) + theme_bw(base_size = 10) + labs(x = "Exercise time [min]", y = "Estimated stride duration time [s]", fill = "Resting: ") + theme(legend.position = "top") + scale_fill_manual(values = c("white", "blue")) + theme(legend.position = "top", legend.background = element_rect(fill = "grey90")) + scale_y_continuous(limits = c(0.5, 1.8))
References:
1: Karas, M., Straczkiewicz, M., Fadel, W., Harezlak, J., Crainiceanu, C., Urbanek, J.K. Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation, Submitted to Biostatistics, 2018.
2: Karas, M., Bai, J., Straczkiewicz, M., Harezlak, J., Glynn, N W., Harris, T., Zipunnikov, V., Crainiceanu, C., Urbanek, J.K. Accelerometry data in health research: challenges and opportunities. Review and examples, Statistics in Biosciences, 2018. (link)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.