knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
library(eyeris)
When working with large eyeris databases containing millions of eye-tracking data points, traditional export methods can run into memory limitations or create unwieldy files. The chunked database export functionality in eyeris provides an out-of-the-box solution for handling very large eyerisdb databases by:
- Supporting CSV and Parquet formats for optimal performance
This vignette walks through how to use these features after you've created an eyerisdb database using bidsify(db_enabled = TRUE).
Before using the chunked export functions, you need:
- An eyerisdb database created with bidsify(db_enabled = TRUE)
- The arrow package installed (for Parquet support): install.packages("arrow") (arrow is included when installing eyeris from CRAN)
The easiest way to export your entire database is with eyeris_db_to_chunked_files():
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/your/bids/directory",
  db_path = "my-project" # your database name
)

# view what was exported
print(result)
Using the eyeris_db_to_chunked_files() function defaults, this will:
- Process 1 million rows at a time (i.e., the default chunk size)
- Create files up to 500MB each (i.e., the default max file size)
- Export all data types found in your database
- Save files to bids_dir/derivatives/eyerisdb_export/my-project/
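
To make the interaction between chunk size and maximum file size concrete, here is a minimal base-R sketch of the general idea: rows are written one chunk at a time, and a new file is started once the current one reaches a size limit. The helper `write_in_chunks()` and its file naming are hypothetical and only illustrate the technique; this is not how eyeris itself is implemented.

```r
# Hypothetical helper (not part of eyeris): write `df` to CSV in chunks of
# `chunk_size` rows, starting a new file once the current file has reached
# `max_file_bytes`. A file can exceed the limit by at most one chunk.
write_in_chunks <- function(df, out_dir, prefix, chunk_size, max_file_bytes) {
  if (nrow(df) == 0) return(character(0))
  path_for <- function(i) {
    file.path(out_dir, sprintf("%s_part-%02d.csv", prefix, i))
  }
  part <- 1L
  path <- path_for(part)
  written <- character(0)
  for (start in seq(1L, nrow(df), by = chunk_size)) {
    end <- min(start + chunk_size - 1L, nrow(df))
    # Roll over to a new file when the current one is already at the limit
    if (file.exists(path) && file.size(path) >= max_file_bytes) {
      part <- part + 1L
      path <- path_for(part)
    }
    new_file <- !file.exists(path)
    # write.table() (unlike write.csv()) honours `append`; the header is
    # written only when a file is first created
    write.table(
      df[start:end, , drop = FALSE], path,
      sep = ",", qmethod = "double", row.names = FALSE,
      col.names = new_file, append = !new_file
    )
    written <- union(written, path)
  }
  written
}

# Small simulated dataset so the example runs in a fraction of a second
set.seed(1)
demo <- data.frame(time_secs = seq_len(1000) / 1000, pupil = rnorm(1000))
out_dir <- tempfile("chunk_demo_")
dir.create(out_dir)

files <- write_in_chunks(
  demo, out_dir, "timeseries",
  chunk_size = 100, max_file_bytes = 5000
)
total_rows <- sum(vapply(files, function(f) nrow(read.csv(f)), integer(1)))
```

Only one chunk is held in memory at a time, which is why a smaller chunk size lowers peak memory use, while the file size limit only controls how many output files are produced.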
The function creates organized output files:
derivatives/eyerisdb_export/my-project/
├── my-project_timeseries_chunked_01.csv              # Single file (< 500MB)
├── my-project_events_chunked_01-of-02.csv            # Multiple files due to size
├── my-project_events_chunked_02-of-02.csv
├── my-project_confounds_summary_goal_chunked_01.csv  # Grouped by schema
├── my-project_confounds_summary_stim_chunked_01.csv  # Different column structure
├── my-project_confounds_events_chunked_01.csv
├── my-project_epoch_summary_chunked_01.csv
└── my-project_epochs_pregoal_chunked_01-of-03.csv    # Epoch-specific data
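
Because the part number and total are encoded in each file name, it can be handy to take an inventory of an export directory before reading anything. The sketch below parses names that follow the pattern shown in the listing above. The regular expression is an assumption inferred from that listing (it also assumes the database name contains no underscore), and `demo` is a placeholder database name. One data type is deliberately left incomplete to show the check.

```r
# Example file names following the pattern in the listing above;
# "demo" is a placeholder database name.
files <- c(
  "demo_timeseries_chunked_01.csv",
  "demo_events_chunked_01-of-02.csv",
  "demo_events_chunked_02-of-02.csv",
  "demo_confounds_summary_goal_chunked_01.csv",
  "demo_epochs_pregoal_chunked_01-of-03.csv" # parts 02 and 03 are missing
)

# Assumed pattern: <db>_<data type>_chunked_<part>[-of-<total>].<ext>
pattern <- "^([^_]+)_(.+)_chunked_([0-9]+)(-of-([0-9]+))?\\.(csv|parquet)$"
parts <- do.call(rbind, regmatches(files, regexec(pattern, files)))

# Files without a "-of-NN" suffix are treated as a single-part export
total <- parts[, 6]
total[total == ""] <- "1"

inventory <- data.frame(
  file = parts[, 1],
  data_type = parts[, 3],
  part = as.integer(parts[, 4]),
  total = as.integer(total),
  stringsAsFactors = FALSE
)

# Compare the number of parts found with the number each file name promises
found <- tapply(inventory$part, inventory$data_type, length)
expected <- tapply(inventory$total, inventory$data_type, max)
incomplete <- names(found)[found < expected]
```

Here `incomplete` names the data types for which some parts are missing, which is worth checking after a large transfer or upload.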
You can customize the maximum file size to create smaller, more manageable files:
# Create smaller files for easy distribution
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/bids",
  db_path = "large-project",
  max_file_size_mb = 100, # 100MB files instead of 500MB
  chunk_size = 500000     # Process 500k rows at a time
)
This is particularly useful when:
- Uploading to cloud storage with size/transfer bandwidth limits
- Sharing data via email or file transfer services
- Working with limited storage space
For large databases, you may only need certain types of data:
# Export only pupil timeseries and events
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/bids",
  db_path = "large-project",
  data_types = c("timeseries", "events"),
  subjects = c("sub-001", "sub-002", "sub-003") # Specific subjects only
)
Available data types typically include:
- timeseries - Preprocessed eye-tracking pupil data
- events - Experimental events
- epochs - Epoched data around events
- confounds_summary - Confound variables by epoch
- blinks - Detected blinks
For better performance and compression, use Parquet format:
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/bids",
  db_path = "large-project",
  file_format = "parquet",
  max_file_size_mb = 200
)
Parquet advantages:
- Smaller file sizes (often 50-80% smaller than CSV)
- Faster reading with arrow::read_parquet()
- Better data types (preserves numeric precision)
- Column-oriented storage for analytics
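
The size and precision points above can be checked on your own data. The following sketch writes a small simulated table (not real eyeris output) to both formats and compares file size and round-trip fidelity. The Parquet part runs only if the arrow package is installed, and the actual size ratio depends heavily on the data.

```r
set.seed(1)
demo <- data.frame(
  subject_id = rep(c("sub-001", "sub-002"), each = 5000),
  time_ms = rep(seq_len(5000), times = 2),
  pupil = rnorm(10000, mean = 5000, sd = 300)
)

# CSV round trip: numbers are stored as text with limited significant digits
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)
csv_back <- read.csv(csv_path)
csv_size <- file.size(csv_path)
csv_exact <- identical(csv_back$pupil, demo$pupil)

# Parquet round trip: values are stored in binary form
pq_exact <- NA
if (requireNamespace("arrow", quietly = TRUE)) {
  pq_path <- tempfile(fileext = ".parquet")
  arrow::write_parquet(demo, pq_path)
  pq_back <- as.data.frame(arrow::read_parquet(pq_path))
  pq_exact <- identical(pq_back$pupil, demo$pupil)
  message(sprintf(
    "CSV: %.0f bytes, Parquet: %.0f bytes",
    csv_size, file.size(pq_path)
  ))
}
message("CSV values identical after round trip: ", csv_exact)
message("Parquet values identical after round trip: ", pq_exact)
```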
# Read a single CSV file
data <- read.csv("path/to/timeseries_chunked.csv")

# Read a single Parquet file (requires arrow package)
if (requireNamespace("arrow", quietly = TRUE)) {
  data <- arrow::read_parquet("path/to/timeseries_chunked.parquet")
}
When files are split due to size limits, you can recombine them:
# Find all parts of a split dataset
files <- list.files(
  "path/to/eyerisdb_export/my-project/",
  pattern = "timeseries_chunked_.*\\.csv$",
  full.names = TRUE
)

# Read and combine all parts
combined_data <- do.call(rbind, lapply(files, read.csv))

# Or, for Parquet exports, use the built-in helper function
combined_data <- read_eyeris_parquet(
  parquet_dir = "path/to/eyerisdb_export/my-project/",
  data_type = "timeseries"
)
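
Since rbind() fails when the parts do not share the same columns, it can help to verify that before combining. Below is a self-contained sketch that writes two small simulated parts to a temporary directory and recombines them. The file names and the `demo` database name are placeholders that mimic the naming pattern shown earlier.

```r
# Create a temporary export directory with two simulated parts
export_dir <- tempfile("eyerisdb_export_")
dir.create(export_dir)

part1 <- data.frame(
  subject_id = "sub-001", time_secs = c(0.0, 0.1), pupil = c(5012, 5020)
)
part2 <- data.frame(
  subject_id = "sub-001", time_secs = c(0.2, 0.3), pupil = c(5031, 5027)
)
write.csv(part1, file.path(export_dir, "demo_timeseries_chunked_01-of-02.csv"),
          row.names = FALSE)
write.csv(part2, file.path(export_dir, "demo_timeseries_chunked_02-of-02.csv"),
          row.names = FALSE)

# Zero-padded part numbers mean alphabetical order is also part order
files <- sort(list.files(
  export_dir,
  pattern = "timeseries_chunked_.*\\.csv$",
  full.names = TRUE
))
parts <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Check that every part has the same columns before binding
same_cols <- all(vapply(
  parts,
  function(p) identical(names(p), names(parts[[1]])),
  logical(1)
))
stopifnot(same_cols)

combined <- do.call(rbind, parts)
```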
For specialized analysis, you can process chunks with custom functions:
# Connect to the database directly
con <- eyeris_db_connect("/path/to/bids", "large-project")

# Define a custom analysis function for pupil data
analyze_chunk <- function(chunk) {
  # Calculate summary statistics for this chunk
  stats <- data.frame(
    n_rows = nrow(chunk),
    subjects = length(unique(chunk$subject_id)),
    mean_eye_x = mean(chunk$eye_x, na.rm = TRUE),
    mean_eye_y = mean(chunk$eye_y, na.rm = TRUE),
    mean_pupil_raw = mean(chunk$pupil_raw, na.rm = TRUE),
    mean_pupil_processed = mean(
      chunk$pupil_raw_deblink_detransient_interpolate_lpfilt_z,
      na.rm = TRUE
    ),
    missing_pupil_pct = sum(is.na(chunk$pupil_raw)) / nrow(chunk) * 100,
    hz_modes = paste(unique(chunk$hz), collapse = ",")
  )

  # Append the chunk summary to a growing file. write.csv() ignores
  # `append`, so use write.table() and write the header only once.
  out_file <- "chunk_summaries.csv"
  first_write <- !file.exists(out_file)
  write.table(
    stats, out_file,
    sep = ",", row.names = FALSE,
    col.names = first_write, append = !first_write
  )

  return(TRUE) # Indicate success
}

# Hypothetical example: process large timeseries dataset in chunks
result <- process_chunked_query(
  con = con,
  query = "
    SELECT subject_id, session_id, time_secs, eye_x, eye_y,
           pupil_raw, pupil_raw_deblink_detransient_interpolate_lpfilt_z, hz
    FROM timeseries_01_enc_clamp_run01
    WHERE pupil_raw > 0 AND eye_x IS NOT NULL
    ORDER BY time_secs
  ",
  chunk_size = 100000,
  process_chunk = analyze_chunk
)

eyeris_db_disconnect(con)
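
When per-chunk summaries like these are later combined into a single overall statistic, the chunk means need to be weighted by the number of non-missing values in each chunk, because the last chunk is usually smaller than the rest. The sketch below uses simulated values and plain base R (no database) to show the difference. The chunk sizes are arbitrary.

```r
set.seed(42)
pupil <- rnorm(1050, mean = 5000, sd = 250)
pupil[c(10, 500, 900)] <- NA # a few missing samples

# Split into chunks of 400, 400, and 250 values
chunk_id <- ceiling(seq_along(pupil) / 400)
chunks <- split(pupil, chunk_id)

# One summary row per chunk; `n` counts only non-missing values because
# the mean is computed with na.rm = TRUE
chunk_stats <- do.call(rbind, lapply(chunks, function(x) {
  data.frame(n = sum(!is.na(x)), mean_pupil = mean(x, na.rm = TRUE))
}))

# Averaging the chunk means treats every chunk as equally large
naive_mean <- mean(chunk_stats$mean_pupil)

# Weighting by n reproduces the mean over all values
weighted_mean <- sum(chunk_stats$n * chunk_stats$mean_pupil) /
  sum(chunk_stats$n)
```

The same reasoning applies to percentages such as the share of missing samples: store the counts per chunk and compute the ratio at the end.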
For databases with hundreds of millions of rows:
# Optimize for very large datasets
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/bids",
  db_path = "massive-project",
  chunk_size = 2000000,     # 2M rows per chunk for efficiency
  max_file_size_mb = 1000,  # 1GB files (larger but fewer files)
  file_format = "parquet",  # Better compression
  data_types = "timeseries" # Focus on primary data type for analysis
)
If you encounter out-of-memory errors:
# Reduce chunk size
result <- eyeris_db_to_chunked_files(
  bids_dir = "/path/to/bids",
  db_path = "project",
  chunk_size = 250000, # Smaller chunks
  verbose = TRUE       # Monitor progress
)
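
One rough way to pick a chunk size is to measure how much memory a sample of rows occupies and scale that to a memory budget. The sketch below does this with simulated data. The column set, the 512MB budget, and the safety factor of 4 are all hypothetical and should be replaced with values that fit your data and machine.

```r
# Simulated sample with a column layout similar to a timeseries table
set.seed(1)
n <- 10000
sample_rows <- data.frame(
  subject_id = rep("sub-001", n),
  time_secs = seq_len(n) / 1000,
  eye_x = runif(n),
  eye_y = runif(n),
  pupil_raw = runif(n)
)

# Approximate in-memory size of one row
bytes_per_row <- as.numeric(object.size(sample_rows)) / nrow(sample_rows)

# Hypothetical budget for one chunk, with headroom for the temporary
# copies that reading and writing tend to create
memory_budget_mb <- 512
safety_factor <- 4
suggested_chunk_size <- floor(
  memory_budget_mb * 1024^2 / (bytes_per_row * safety_factor)
)
```

This is only an estimate: object.size() counts repeated strings once, so tables with many distinct text values take more memory per row than a sample like this suggests.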
The function automatically handles this by processing tables in batches, but if you encounter issues:
When you see "Set operations can only apply to expressions with the same number of result columns":
If files are locked or in use:
- eyerisdb database file
For additional help:
- Function documentation: ?eyeris_db_to_chunked_files
- Database summary: eyeris_db_summary(bids_dir, db_path)
- List of tables: eyeris_db_list_tables(con)
- Detailed progress output: verbose = TRUE
The built-in chunked eyerisdb database export functionality provides a robust solution for working with large eyerisdb databases. Key benefits include:
This makes it possible to work with even the largest eye-tracking and pupillometry datasets while maintaining performance and reliability, without sacrificing the ability to share high-quality, reproducible datasets that support collaborative and open research.