bin_data | R Documentation |
In time series with variable measurements, a recurring task is calculating the total time spent (i.e. the duration) in fixed bins, for example per hour or per day. This becomes difficult when two subsequent measurements fall in different bins or when the interval between them spans multiple bins.
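To make the problem concrete, the following base R sketch splits a single interval at hour boundaries and reports the seconds that fall into each hourly bin. The helper `split_by_hour()` is hypothetical and only illustrates the idea; it is not part of the package and is not necessarily how `bin_data()` works internally.

```r
# Hypothetical helper (not part of the package): split one interval at hour
# boundaries and return the number of seconds spent in each hourly bin.
split_by_hour <- function(start, end) {
  # Hour boundary at or before the start of the interval
  first_bin <- as.POSIXct(trunc(start, units = "hours"))
  # Start times of all hourly bins up to the end of the interval
  bins <- seq(first_bin, end, by = "hour")
  # Drop a bin that would start exactly at the end of the interval
  bins <- bins[bins < end]
  # Clip the interval to each bin, working in seconds since the epoch
  seg_start <- pmax(as.numeric(bins), as.numeric(start))
  seg_end <- pmin(as.numeric(bins) + 3600, as.numeric(end))
  data.frame(bin = bins, duration = seg_end - seg_start)
}

# An interval of 70 minutes that touches three hourly bins
start <- as.POSIXct("2022-06-21 15:55:00", tz = "UTC")
end <- as.POSIXct("2022-06-21 17:05:00", tz = "UTC")
split_by_hour(start, end)
# duration column: 300, 3600, 300 (seconds in the 15:00, 16:00 and 17:00 bins)
```

The time zone is fixed to UTC here so that the hour boundaries do not depend on the local settings of the machine running the example.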
bin_data(
  data,
  start_time,
  end_time,
  by = c("sec", "min", "hour", "day"),
  fixed = TRUE
)
data | A data frame or tibble containing the time series. |
start_time | The column name of the start time of the interval, a POSIXt. |
end_time | The column name of the end time of the interval, a POSIXt. |
by | A binning specification. |
fixed | Whether to create fixed bins. If
A tibble containing the group columns (if any), date, hour (if by = "hour"), and the duration in seconds.
link_gaps() for linking gaps to data.
library(dplyr)
data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = "WALKING"
)
# Get bins per hour, even if the interval is longer than one hour
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )
# Alternatively, you can pass an integer (in seconds) to by to create
# custom-sized bins, but only if fixed = FALSE. Note that these bins are not
# rounded to the bin size (30 minutes in this example); instead, they depend
# on the earliest time in the group.
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = 1800L,
    fixed = FALSE
  )
# More complicated data for showcasing grouping:
data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = c("STILL", "WALKING", "STILL", "WALKING")
)
# bin_data also takes into account the prior grouping structure
out <- data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  group_by(participant_id) |>
  mutate(lead = lead(datetime)) |>
  group_by(participant_id, type) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )
print(out)
# To get the duration for each bin (remember to adapt the variable names
# inside sum() to your own data):
purrr::map_dbl(
  out$bin_data,
  ~ sum(as.double(.x$lead) - as.double(.x$datetime),
    na.rm = TRUE
  )
)
# Or:
out |>
  tidyr::unnest(bin_data, keep_empty = TRUE) |>
  mutate(duration = .data$lead - .data$datetime) |>
  group_by(bin, .add = TRUE) |>
  summarise(duration = sum(.data$duration, na.rm = TRUE), .groups = "drop")
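The note above about custom-sized bins not being rounded can be illustrated with base R alone. The sketch below contrasts bin edges anchored at the earliest observation with edges aligned to round 30-minute marks. It only demonstrates the difference between the two schemes; whether `bin_data()` computes its edges exactly this way is an assumption, and the example times are made up.

```r
# Two made-up observation times; UTC keeps the example reproducible
times <- as.POSIXct(
  c("2022-06-21 15:10:00", "2022-06-21 16:20:00"),
  tz = "UTC"
)
width <- 1800 # bin width in seconds (30 minutes)

# Edges anchored at the earliest time: 15:10, 15:40, 16:10
anchored <- seq(min(times), max(times), by = width)

# Edges aligned to multiples of the bin width: 15:00, 15:30, 16:00
aligned_start <- as.POSIXct(
  floor(as.numeric(min(times)) / width) * width,
  origin = "1970-01-01",
  tz = "UTC"
)
aligned <- seq(aligned_start, max(times), by = width)

format(anchored, "%H:%M")
format(aligned, "%H:%M")
```

With anchored edges, the same observation can land in a differently labelled bin depending on when the group's first measurement occurred, which matters when comparing bins across groups.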