Tools for cleaning high-frequency real-time location tracking data.
trackclean was developed to process data from playground movement research, but applies to any study collecting high-frequency positional data from people moving within a defined space — classrooms, sports facilities, rehabilitation settings, and similar environments.
# Install from CRAN
install.packages("trackclean")
# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("tomasbil/trackclean")
The package includes a small example dataset that can be used to trial the full pipeline without any real data. It simulates 10 children tracked during a school recess on a 40m × 60m playground using a UWB positioning system.
library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")
The example dataset includes:
- 10 participants with raw tag IDs 1–10, mapped to child IDs 5001–5010
- ~13.5 minutes of data (11:45:00–11:58:30), with observations both inside and outside the analysis window
- Sub-second timestamps causing multiple readings per second — handled by standardize_to_seconds()
- Randomly dropped seconds creating gaps — handled by interpolate_gaps()
- One tag replacement: participant 5003 starts on raw tag ID 3, which is swapped to raw tag ID 11 at 11:51:00 — handled by fix_tag_replacement()
Analysis parameters for this dataset:
| Parameter | Value |
|-----------|-------|
| analyze_start | "2025-03-18 11:47:00" |
| analyze_end | "2025-03-18 11:57:00" |
| bell_start | "2025-03-18 11:53:00" |
| bell_end | "2025-03-18 11:58:00" |
| Tag replacement | raw_id 3 → raw_id 11 at "2025-03-18 11:51:00" |
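To make the role of these windows concrete, the base R sketch below derives Analyze and Bell style flags from a few timestamps. It is an illustration, not the code behind mark_time_periods(): the helper name `flag_window` is hypothetical, and treating both window boundaries as inclusive is an assumption.

```r
# Hypothetical helper (not part of trackclean): 1 if a timestamp falls inside
# [start, end], 0 otherwise. Inclusive boundaries are an assumption.
flag_window <- function(at, start, end, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end   <- as.POSIXct(end, tz = tz)
  as.integer(at >= start & at <= end)
}

at <- as.POSIXct(
  c("2025-03-18 11:46:30", "2025-03-18 11:50:00",
    "2025-03-18 11:55:00", "2025-03-18 11:57:30"),
  tz = "UTC"
)

periods <- data.frame(
  At      = at,
  Analyze = flag_window(at, "2025-03-18 11:47:00", "2025-03-18 11:57:00"),
  Bell    = flag_window(at, "2025-03-18 11:53:00", "2025-03-18 11:58:00")
)
print(periods)
```

Because the bell window ends after the analysis window, an observation at 11:57:30 is flagged for Bell but not for Analyze.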
Raw tracking data (raw_tracking_data.csv):
| ID | At | X | Y |
|----|----|---|---|
| 1 | 2025-03-18 11:45:00.00 | 5.000 | 10.000 |
| 1 | 2025-03-18 11:45:01.00 | 5.383 | 10.239 |
| 1 | 2025-03-18 11:45:01.47 | 5.341 | 10.261 |
| ... | | | |
- ID: raw tag ID as assigned by the tracking system
- At: timestamp (POSIXct-readable, sub-second precision supported)
- X, Y: position in meters

ID mapping (id_mapping.csv):
| raw_id | child_id |
|--------|----------|
| 1 | 5001 |
| 3 | 5003 |
| 11 | 5003 |
| ... | |
- raw_id: tag ID as it appears in the raw data
- child_id: standardized participant ID to use in analysis

If a participant's tag was replaced during data collection, run this before the main pipeline:
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)
This will:
- Keep observations from tag 3 before 11:51
- Rename tag 11 observations from 11:51 onwards to tag 3
- Remove tag 3 observations from 11:51 onwards (duplicate/invalid)
- Remove tag 11 observations before 11:51 (not yet attached)
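The four rules can be sketched in base R as follows. This is a minimal illustration on a toy data frame, not the source of fix_tag_replacement(): the function name `swap_tag` is hypothetical, and treating "from 11:51 onwards" as `>=` the replacement time is an assumption.

```r
# Hypothetical re-implementation of the four rules, for illustration only.
swap_tag <- function(data, original_id, replacement_id, replacement_time,
                     tz = "UTC") {
  cutoff <- as.POSIXct(replacement_time, tz = tz)
  after  <- data$At >= cutoff  # assumption: the cutoff second counts as "after"

  # Drop original-tag rows from the cutoff onwards and
  # replacement-tag rows before the cutoff.
  drop <- (data$ID == original_id & after) |
    (data$ID == replacement_id & !after)
  data <- data[!drop, , drop = FALSE]

  # The remaining replacement-tag rows are all at or after the cutoff:
  # relabel them with the original tag ID.
  data$ID[data$ID == replacement_id] <- original_id
  rownames(data) <- NULL
  data
}

toy <- data.frame(
  ID = c(3, 3, 11, 11, 1),
  At = as.POSIXct(
    c("2025-03-18 11:50:59", "2025-03-18 11:51:00",
      "2025-03-18 11:50:59", "2025-03-18 11:51:00",
      "2025-03-18 11:51:00"),
    tz = "UTC"
  ),
  X = c(1, 2, 3, 4, 5),
  Y = c(1, 2, 3, 4, 5)
)

fixed <- swap_tag(toy, original_id = 3, replacement_id = 11,
                  replacement_time = "2025-03-18 11:51:00")
print(fixed)
```

In the toy data, the tag 3 row at 11:51:00 and the tag 11 row at 11:50:59 are removed, the tag 11 row at 11:51:00 is relabeled as tag 3, and the unrelated tag 1 row is untouched.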
Create a CSV file with two columns mapping raw device IDs to your participant IDs:
raw_id,child_id
1,5001
2,5002
3,5003
Or use the bundled example file:
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")
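For readers who want to see what the mapping amounts to, the lookup can be sketched in base R with `match()`. This is not the code behind map_ids(); the toy tables and the unmapped tag 99 are made up for illustration.

```r
# Toy mapping table; tags 3 and 11 both belong to participant 5003.
mapping <- data.frame(
  raw_id   = c(1, 3, 11),
  child_id = c(5001, 5003, 5003)
)

# Toy readings, including a tag (99) that has no entry in the mapping.
readings <- data.frame(ID = c(1, 3, 11, 99), X = c(5.0, 2.1, 2.3, 0.0))

# match() returns NA for tags missing from the mapping, so tag 99 gets
# id_code NA here. How trackclean treats unmapped tags is not covered.
readings$id_code <- mapping$child_id[match(readings$ID, mapping$raw_id)]
print(readings)
```

Checking for NA values in the mapped column after a lookup like this is a quick way to spot tags that are missing from the mapping file.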
library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
# Fix tag replacement first (if applicable)
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)
cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00",
output_file = "cleaned_data.csv"
)
For more control, run each step separately:
# Step 1: Map IDs
data <- map_ids(raw_data, id_mapping)
# Step 2: Mark time periods
data <- mark_time_periods(
data,
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00"
)
# Step 3: Standardize to seconds
data <- standardize_to_seconds(data)
# Step 4: Interpolate gaps
data <- interpolate_gaps(
data,
max_gap_small = 10,
max_position_change = 0.3
)
The package uses a two-phase approach to handle missing data:
Phase 1: Interpolates small gaps (≤10 seconds by default)
- Uses linear interpolation between known points
- Appropriate for brief signal losses

Phase 2: Interpolates larger gaps conditionally
- Only when position change between endpoints is minimal (≤30cm by default)
- Indicates the participant remained stationary during the gap
- Prevents false movement estimates for longer signal dropouts
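The two phases can be sketched for a single participant in base R. This is a simplified illustration, not the implementation of interpolate_gaps(): the function name `fill_gaps` is hypothetical, gap length is assumed to be the number of missing seconds, position change is assumed to be the Euclidean distance between the gap's endpoints, and the `max_gap_large = 30` default is borrowed from the customization example rather than from the package documentation.

```r
# Illustrative sketch only. `t` is time in whole seconds for one participant,
# sorted and without duplicates; `x` and `y` are positions in meters.
fill_gaps <- function(t, x, y, max_gap_small = 10, max_gap_large = 30,
                      max_position_change = 0.3) {
  out <- data.frame(t = t, x = x, y = y, imputed = 0L, imputed_large = 0L)
  for (i in seq_len(max(length(t) - 1L, 0L))) {
    n_missing <- t[i + 1] - t[i] - 1  # seconds missing between two readings
    if (n_missing < 1) next
    moved <- sqrt((x[i + 1] - x[i])^2 + (y[i + 1] - y[i])^2)
    small <- n_missing <= max_gap_small                 # phase 1
    large <- !small && n_missing <= max_gap_large &&
      moved <= max_position_change                      # phase 2
    if (!small && !large) next                          # leave gap unfilled
    new_t <- seq(t[i] + 1, t[i + 1] - 1)
    out <- rbind(out, data.frame(
      t = new_t,
      x = approx(t[i + 0:1], x[i + 0:1], xout = new_t)$y,  # linear in time
      y = approx(t[i + 0:1], y[i + 0:1], xout = new_t)$y,
      imputed = as.integer(small),
      imputed_large = as.integer(large)
    ))
  }
  out <- out[order(out$t), ]
  rownames(out) <- NULL
  out
}

# Gaps: 2 s (phase 1), 15 s with 0.1 m change (phase 2),
# 19 s with 5.9 m change (left unfilled).
filled <- fill_gaps(
  t = c(0, 1, 4, 20, 40),
  x = c(0, 1, 4, 4.1, 10),
  y = c(0, 0, 0, 0, 0)
)
print(filled)
```

The last gap is short enough for phase 2 but is left empty because the endpoints are almost 6 m apart, which is the case the position-change threshold is meant to catch.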
All functions provide:
- Progress messages and summaries
- Data integrity checks
- Row count validation
- Clear flagging of imputed vs. original data
| Function | Purpose |
|----------|---------|
| clean_playground_data() | Complete pipeline in one call |
| fix_tag_replacement() | Fix tag replacements (run before pipeline) |
| map_ids() | Map raw device IDs to participant IDs |
| mark_time_periods() | Create Analyze and Bell columns |
| standardize_to_seconds() | Aggregate to one-second intervals |
| interpolate_gaps() | Two-phase gap interpolation |
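As an illustration of what aggregating to one-second intervals involves, the base R sketch below collapses the three sample readings shown earlier into one row per tag and second by averaging positions. It is not the code behind standardize_to_seconds(); truncating each timestamp to its whole second (rather than rounding) is an assumption.

```r
# Sample readings: two of them fall within the same second (11:45:01).
raw <- data.frame(
  ID = c(1, 1, 1),
  At = as.POSIXct(
    c("2025-03-18 11:45:00.00", "2025-03-18 11:45:01.00",
      "2025-03-18 11:45:01.47"),
    tz = "UTC"
  ),
  X = c(5.000, 5.383, 5.341),
  Y = c(10.000, 10.239, 10.261)
)

# Assumption: each reading belongs to the second it started in.
raw$sec <- floor(as.numeric(raw$At))

# Mean position and number of readings per tag and second.
agg <- aggregate(cbind(X, Y) ~ ID + sec, data = raw, FUN = mean)
n   <- aggregate(X ~ ID + sec, data = raw, FUN = length)

agg$n_entries    <- n$X
agg$standardized <- as.integer(agg$n_entries > 1)
agg$At           <- as.POSIXct(agg$sec, origin = "1970-01-01", tz = "UTC")
print(agg)
```

The two readings at 11:45:01 are averaged into a single row with `n_entries` of 2, while the lone reading at 11:45:00 passes through unchanged.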
The cleaned dataset includes these columns:
- id_code: Standardized participant ID
- Analyze: 1 if within analysis period, 0 otherwise
- Bell: 1 if within bell period, 0 otherwise (if specified)
- n_entries: Original number of signals in that second
- standardized: 1 if multiple signals were averaged, 0 otherwise
- imputed: 1 if row added via phase 1 interpolation
- imputed_large: 1 if row added via phase 2 interpolation

The interpolation thresholds can be adjusted in the pipeline call:
cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = "id_mapping.csv",
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
max_gap_small = 5, # Phase 1: ≤5 seconds
max_gap_large = 30, # Phase 2: ≤30 seconds max
max_position_change = 0.5 # Phase 2: ≤50cm movement
)
Tomas Bilevicius
CC BY 4.0 — you are free to use, share, and adapt this package for any purpose, including commercially, as long as you give appropriate credit to the author.