library(tidyverse)
library(knitr)
library(rprojroot)
library(lubridate)
library(data.table)

experiment.pipeline relies on a single configuration file

This vignette documents setting up a configuration file within the experiment.pipeline package framework for an arbitrary eyetracking study. The goal is to set up a configuration file which, when combined with a single .edf eyetracking file from SR Research, initializes and preprocesses eyetracking data into a single "ep.eye" object.

The core worker function in the experiment.pipeline package for processing eye data on a single subject (ep.eye_process_subject.R) relies entirely on a .yaml config file that the user sets up prior to processing, depending on the structure of the task and the processing procedures desired.

The config file is structured hierarchically, with five fields at the highest level. Additional fields are nested within the high-level fields, which we will explore below. Once the config has been set up and is read into experiment.pipeline, the package's internal environment will have access to all components of the specified configuration in the form of a nested list, which I will show as we go along.

running ep.eye_process_subject:

To get us started, we can extract the path to a single subject's .edf file from a directory containing all raw files run through a single cognitive task and specify a path to the config file for that task:

library(experiment.pipeline)

edf_files <- list.files(file.path(rprojroot::find_package_root_file(), "inst/extdata/raw_data/SortingMushrooms/Eye"), full.names = TRUE); print(edf_files)
edf_path <- edf_files[1] # extract a single subject for example case
config_path <- file.path(rprojroot::find_package_root_file(), "inst/extdata/ep_configs/SortingMushrooms/SortingMushrooms.yaml")

At the end of this vignette, you will be able to process a single subject in an eyetracking study like so:

# don't run
ep.eye_preproc <- ep.eye_process_subject(edf_path, config_path)

By extension, once you have tested your config file on a single subject and think it's ready to roll for all subjects in your study, you can process every file in a directory with a simple for loop:

# don't run
all_subjects <- list()
for(subj_file in edf_files){
  id_string <- sub("_SortingMushrooms_Eye.edf", "", basename(subj_file)) # extract just the subject's id to store as the name of the element of all_subjects
  all_subjects[[id_string]] <- ep.eye_process_subject(subj_file, config_path) # note: pass subj_file, not edf_path
}

I have some code written for an additional function, ep.eye_process_dir.R, that would allow for an easy interface to process an entire directory of files (with parallel processing across subjects) for a single task, but I'll probably write another vignette to go over batch processing for a single task or battery of tasks.
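Until then, a quick parallel version of the loop above might look like this sketch (using the base parallel package; mc.cores = 4 is an arbitrary choice, and mclapply's forking is unavailable on Windows):

library(parallel)
names(edf_files) <- sub("_SortingMushrooms_Eye.edf", "", basename(edf_files))
all_subjects <- mclapply(edf_files, function(f) ep.eye_process_subject(f, config_path), mc.cores = 4)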

Below I include detailed instructions on how to specify a single config file using a single subject (subject 005_EK from edf_path) to guide decision points along the way.

Config files: expected fields

Starting at the highest level are the task, runs, variable_mapping, definitions, and blocks fields. I like to separate these with some sort of break line to denote changes in major sections of the config file. The major action for eyetracking preprocessing happens in definitions, and a bit in blocks.

config <- experiment.pipeline::validate_exp_yaml(config_path)

################################
task: SortingMushrooms
################################
runs:
################################
variable_mapping:
################################
definitions:
################################
blocks:
################################

These fields are represented in the named list that will be used to process the eye data. Note that at the highest level only task and runs will have values assigned to them. Leaving colons open at a level of the YAML file either means that there will be subfields with explicit values defined or that the field is to remain NULL/empty.
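To see this behavior directly, here is a minimal illustration with the yaml package (independent of experiment.pipeline):

skeleton <- yaml::yaml.load("task: SortingMushrooms\nruns:")
skeleton$task          # "SortingMushrooms"
is.null(skeleton$runs) # TRUE: an open colon with nothing beneath imports as NULL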

names(config)
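For a compact view of the full nesting, str() with a depth limit is handy:

str(config, max.level = 2) # show the top two levels of the nested list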

High-level information on the task structure: the "task", "runs", and "variable_mapping" fields

These three major fields contain high-level information about the task:

task

The task field is simply the name of the task that is being processed, in this case the "Sorting Mushrooms Task (Approach-only)" from Huys et al. (2011, PLOS Comp Bio). This is stored in the ep.eye object's metadata. As things are currently set up, this field has no bearing on the processing itself, but may be useful once batch processing capabilities are fully fleshed out (stay tuned).

################################
task: SortingMushrooms
################################

This is imported as:

config$task # or config[["task"]]

runs

The runs field has yet to be built in and validated, but the idea here is that multiple exact replicas of a task can be denoted by the user and the config file can be used iteratively on each run without issue.

################################
runs:
################################

This is imported as (empty in this case):

config$runs # or config[["runs"]]

variable_mapping

The variable_mapping field maps column names in a subject's $behav dataset (implemented elsewhere) to generalized task-design constructs used within the experiment.pipeline nomenclature. Subfields nested within variable_mapping are specified as such:

################################
variable_mapping:
  id: id
  run:
  phase:
  block: block
  trial: trial
  run_trial:
  block_trial: block_trial
  event: event
  condition: condition
################################

This is imported as:

config$variable_mapping # or config[["variable_mapping"]]

Each of these subfields maps to a specific task-general construct of interest; these constructs are situated hierarchically, as described next.

The experiment.pipeline hierarchy

Importantly, these subfields constitute a task-general hierarchy that will be present regardless of the specifics of any task. A single task sits atop this hierarchy, and a single config file will be needed to process each task, with phases, blocks, trials, and events all nested within a task. If you are processing a battery of tasks, we have provided documentation [HERE] on how to simultaneously process multiple tasks, but each task will need to have a set of unique preprocessing options stored in a config file. This vignette documents how to set up a config file for a single cognitive task, which can be easily translated over multiple tasks if you would like to use experiment.pipeline's batch processing capabilities.

Defining key variables for data processing: "definitions"

This field is where most of the action for processing an eyetracking experiment happens. The definitions field is grouped according to data modality (behav, eye, phys). We will focus on eye definitions here; directions on implementing behav and phys definitions can be found [HERE] and [HERE].

################################
definitions:
  behav: &behav #shared key mapping for behavior across blocks
    response: key_pressed
    valid: [space, None]
    rt: rt
    start_time: #key_resp_10.started
    end_time: #key_resp_10.stopped
  eye: &eye
    global:
      prefix: "\\d{3}_[[:upper:]]+"
      gen_log: TRUE
      log_dir: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/elog'
      save_preproc: FALSE
      preproc_out: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/preproc'
      return_raw: TRUE
    initialize:
      expected_edf_fields: ['raw', 'sacc', 'fix', 'blinks', 'msg', 'input', 'button', 'info', 'asc_file', 'edf_file']
      unify_gaze_events:
        gaze_events: ['sacc', 'fix', 'blink']
        confirm_correspondence: FALSE
      meta_check:
        meta_vars: ['sample.rate', 'model', 'mono', 'pupil.dtype', 'screen.x', 'screen.y', 'version']
        meta_vals: ['1000', 'EyeLink 1000', 'TRUE', 'AREA', '1920', '1080', '4.594']
        recording_time: [1200, 360] # [expected time (seconds), margin of error above and below]
      inherit_btw_ev: # do certain between-trial messages need to be extracted for any reason? If left out, will skip
        calibration_check:
          cal: ["!CAL CALIBRATION HV9"]
          val: ["!CAL VALIDATION HV9"]
        move_to_within:
          str: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "END_RECORDING", "TRIAL "]
          align_msg: ["", "!MODE RECORD CR 1000 2 1 R", "TRIAL_OUTCOME", "TRIAL_OUTCOME"]
          pre_post: ["post", "pre", "post", "post"]
    msg_parse:
      extract_event_func_path: '~/github_repos/experiment.pipeline/inst/extdata/ep_configs/SortingMushrooms/gen_SortingMushrooms_eye_events.R'   # if extraction method == "function" pass path to the function here.
      csv_dir_path: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/eye_event_csvs' # if extraction method %in% c("csv", "function")  path to extract or write event csvs to.
      msg_seq: # &msg_seq #decided to comment this out below for the sake of simplicity.
        msg_start: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "SYNCTIME", "DISPLAY ON"]
        msg_end: [ "TRIAL_OUTCOME ", "TRIAL "]
        eval_middle: TRUE #smoosh certain event-specific (taken from below) messages in between the task-general beginning and end messages.
        ordered: TRUE
    gaze_preproc:
      aoi:
        indicator: ["!V IAREA RECTANGLE"]
        extraction_method: regex
        extract_coords: ["\\d{3,4} \\d{3,4} \\d{3,4} \\d{3,4}"]
        extract_labs: ["[a-z]+$"]
        split_coords: " "
        tag_raw: FALSE #unless there is some strong reason to need super-high resolution on AOI position (moving AOIs, which are not currently supported), this should be FALSE. Default is FALSE if not included in config.
      downsample:
        factor: 20
        method: "mean"
    pupil_preproc:
      blink_corr:
        ms_before: 100
        ms_after: 100
      filter:
        method: "movingavg" #right now only moving average supported
        window_length: 50 #n measurements to lookback while smoothing, gets passed to pracma::movavg. In ms.
      interpolate:
        algor: "spline"
        maxgap: 1000 ### in ms, will use the original sampling frequency and downsampling factor to convert to nmeasurements.
      baseline_correction:
        method: "subtract"
        dur_ms: 100
        center_on: "DISPLAY ON"
      downsample:
        factor: 50
        method: "mean"
    # qa: #coming soon!
    #   gaze:
    #     na:
    #       check: ["raw", "downsample"]
    #       perc: 30
    #       cols: ["xp", "yp"]
    #   pupil:
    #     na:
    #       check: ["downsample"]
    #       perc: 30
    #       cols: ["ps_bc"]
  phys:
################################

Overview of definitions$eye subfields

In general, ep.eye_process_subject takes a file path to a raw .edf file (which comes off the SR Research EyeLink tracker but needs to be integrated into the ep.eye framework) and a file path to a config .yaml file, then runs a stepwise procedure of a few major steps (which are themselves broken up into many component parts). Each subfield of config$definitions$eye roughly maps onto one of six functions that performs a portion of processing an ep.eye object:

names(config$definitions$eye)
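For reference, the six steps (each unpacked in its own subsection below) are: Step 1: setup processing configuration (ep.eye_setup_proc_config.R); Step 2: initialize the ep.eye object (ep.eye_initialize.R); Step 3: parse task events (ep.eye_parse_events.R); Step 4: gaze preprocessing (ep.eye_preprocess_gaze.R); Step 5: pupil preprocessing; and Step 6: cleanup (ep.eye_cleanup.R).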

global

Global ep.eye definitions are used very early (e.g., whether or not to launch a log file in Step 1: setup processing configuration [ep.eye_setup_proc_config.R]) or very late (e.g., removing raw data and saving the preprocessed ep.eye object in Step 6: cleanup [ep.eye_cleanup.R]) in the ep.eye processing procedure, and can be set up in the config as such:

################################
definitions:
  eye: &eye
    global:
      prefix: "\\d{3}_[[:upper:]]+"
      gen_log: TRUE
      log_dir: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/elog'
      save_preproc: FALSE
      preproc_out: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/preproc'
      return_raw: TRUE
################################

and are read into the R session as:

config$definitions$eye$global

subfield descriptions and default options

Here's an example of how to test whether your regex works as expected:

## if prefix is specified in config:
# It is important to make sure the naming structure in your directory is uniform if batch processing.
prefix_regex <- "\\d{3}_[[:upper:]]+" 
stringr::str_extract(edf_path, prefix_regex)
## default option: use basename() while removing file extension
sub(pattern = "(.*)\\..*$", replacement = "\\1", basename(edf_path))

Knowing what we know now about default options, we could rewrite these global options as:

################################
definitions:
  eye: &eye
    global:
      prefix: "\\d{3}_[[:upper:]]+"
      log_dir: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/elog'
      preproc_out: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/preproc'
      return_raw: TRUE 
################################

and achieve the same result since save_preproc and gen_log both default to TRUE.

initialize

Initialize ep.eye definitions are all utilized in Step 2: Initialize ep.eye object (ep.eye_initialize.R). These options configure how the .edf file is read into an ep.eye object and the initial validation and data wrangling that goes into setting up a subject to be preprocessed. Here is an example for the Sorting Mushrooms task:

################################
definitions:
  eye: &eye
    initialize:
      expected_edf_fields: ['raw', 'sacc', 'fix', 'blinks', 'msg', 'input', 'button', 'info', 'asc_file', 'edf_file']
      unify_gaze_events: 
        gaze_events: ['sacc', 'fix', 'blink']
        confirm_correspondence: FALSE
      meta_check:
        meta_vars: ['sample.rate', 'model', 'mono', 'pupil.dtype', 'screen.x', 'screen.y', 'version']
        meta_vals: ['1000', 'EyeLink 1000', 'TRUE', 'AREA', '1920', '1080', '4.594']
        recording_time: [1200, 360] 
      inherit_btw_ev: # do certain between-trial messages need to be extracted for any reason? If left out, will skip
        calibration_check:
          cal: ["!CAL CALIBRATION HV9"]
          val: ["!CAL VALIDATION HV9"]
        move_to_within:
          str: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "END_RECORDING", "TRIAL "]
          align_msg: ["", "!MODE RECORD CR 1000 2 1 R", "TRIAL_OUTCOME", "TRIAL_OUTCOME"]
          pre_post: ["post", "pre", "post", "post"]
################################

and are read into the R session as:

config$definitions$eye$initialize

subfield descriptions and default options

### If edf2asc executable has not been added to path see: https://rdrr.io/github/davebraze/FDBeye/man/edf2asc.html
edf <- read_edf(edf_path, keep_asc = FALSE, parse_all = TRUE)[[1]] 

names(edf)
## Ideally, if you have an expectation that these meta-variables should be conserved across subjects, it would be good to add this to your config file.
edf[["info"]]
edf$msg %>% filter(eventn == .5 & (grepl("CALIBRATION", text) | grepl("VALIDATION", text)))
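To preview the comparison that meta_check performs, here is a rough sketch (assuming edf$info stores one value per metadata field named in meta_vars):

mc <- config$definitions$eye$initialize$meta_check
observed <- vapply(mc$meta_vars, function(v) as.character(edf$info[[v]]), character(1))
setNames(observed == unlist(mc$meta_vals), mc$meta_vars) # all TRUE if this recording matches expectations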

Knowing what we know now about default initialization options, we could rewrite these options as:

################################
definitions:
  eye: &eye
    initialize:
      meta_check:
        meta_vars: ['sample.rate', 'model', 'mono', 'pupil.dtype', 'screen.x', 'screen.y', 'version']
        meta_vals: ['1000', 'EyeLink 1000', 'TRUE', 'AREA', '1920', '1080', '4.594']
        recording_time: [1200, 360] 
      inherit_btw_ev: # do certain between-trial messages need to be extracted for any reason? If left out, will skip
        calibration_check:
          cal: ["!CAL CALIBRATION HV9"]
          val: ["!CAL VALIDATION HV9"]
        move_to_within:
          str: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "END_RECORDING", "TRIAL "]
          align_msg: ["", "!MODE RECORD CR 1000 2 1 R", "TRIAL_OUTCOME", "TRIAL_OUTCOME"]
          pre_post: ["post", "pre", "post", "post"]
################################

and achieve the same result since expected_edf_fields and unify_gaze_events are specified as defaults above. Since meta_check and inherit_btw_ev default to NULL, to utilize this functionality we need to explicitly specify these in our config.

msg_parse

msg_parse definitions are all utilized in Step 3: Parse task events (ep.eye_parse_events.R). This stage of setting up your ep.eye config file is important, and is probably the stage requiring the most user interaction with the raw data. In this field, you will specify an expected message structure across events and use one of a few methods to extract the relevant eyetracker messages, which denote things such as trials starting and stopping, stimuli being presented, and subject choices. These will be added to the raw data and will eventually be downsampled and interpolated when preprocessing the ep.eye data. Thus, it is highly recommended that you use the ep.eye_msg_report() function to extract and examine the messages that get passed to the eyetracker, and use this information to guide you at this step.
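If you want to eyeball the messages directly, a quick tabulation of message stems in the edf object loaded earlier also works (a rough sketch; stripping trailing numbers is just a heuristic for grouping similar messages):

edf$msg %>%
  dplyr::mutate(stem = sub(" -?[0-9].*$", "", text)) %>% # collapse numeric suffixes
  dplyr::count(stem, sort = TRUE)

Here is an example from the Sorting Mushrooms task: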

################################
definitions:
  eye: &eye
    msg_parse:
      extract_event_func_path: '~/github_repos/experiment.pipeline/inst/extdata/ep_configs/SortingMushrooms/gen_SortingMushrooms_eye_events.R'   
      csv_dir_path: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/eye_event_csvs' 
      msg_seq: 
        msg_start: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "SYNCTIME", "DISPLAY ON"]
        msg_end: [ "TRIAL_OUTCOME ", "TRIAL "]
        eval_middle: TRUE #smoosh certain event-specific (taken from below) messages in between the task-general beginning and end messages.
        ordered: TRUE
################################

which is read into the R session as:

config$definitions$eye$msg_parse

subfield descriptions and default options

The most important option at this stage is the path to your "event extraction function". I've written up some documentation on how to implement this in a separate walkthrough. Note that all options default to NULL; if this field is missing entirely from the config file, ep.eye_process_subject.R will skip parsing event-related information and continue to preprocessing.
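For a sense of the shape such a function might take, here is a hypothetical skeleton (the real gen_SortingMushrooms_eye_events.R will differ; the function name, the TRIALID convention, and the output columns are all illustrative assumptions):

## Hypothetical: takes the edf message data and returns one row per event
## with identifying codes that can be merged onto the raw eye data.
gen_task_eye_events <- function(msg_df) {
  msg_df %>%
    dplyr::filter(grepl("^TRIALID", text)) %>%
    dplyr::transmute(
      eventn,
      trial = as.integer(sub("^TRIALID\\s+", "", text)),
      event = "stim" # assign event labels according to your task structure
    )
}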

gaze_preproc

gaze_preproc definitions are all utilized in Step 4: Gaze preprocessing (ep.eye_preprocess_gaze.R). Currently, this is set up to extract areas of interest (AOIs) from the sequence of eyetracker messages and tag the data with information about whether specific AOIs were being looked at. Additionally, you can specify downsampling parameters for your gaze data here. Here is an example from the Sorting Mushrooms task:

################################
definitions:
  eye: &eye  
    gaze_preproc:
      aoi:
        indicator: ["!V IAREA RECTANGLE"]
        extraction_method: regex
        extract_coords: ["\\d{3,4} \\d{3,4} \\d{3,4} \\d{3,4}"]
        extract_labs: ["[a-z]+$"]
        split_coords: " "
        tag_raw: FALSE 
      downsample:
        factor: 20
        method: "mean"
################################

which is read into the R session as:

config$definitions$eye$gaze_preproc

subfield descriptions and default options

N.B. All of our gaze configuration options for the Sorting Mushrooms Task are set to defaults, so we can entirely omit this portion of the config file and the preprocessing will execute the same.
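Even so, to make the aoi and downsample options concrete, here is a sketch of what the regex settings pull out of a single IAREA message, and what downsampling by a factor of 20 with method "mean" amounts to (the message text and coordinates below are invented):

## hypothetical IAREA message; coordinates and label are made up
msg <- "!V IAREA RECTANGLE 1 710 390 1210 690 shroom"
coords <- stringr::str_extract(msg, "\\d{3,4} \\d{3,4} \\d{3,4} \\d{3,4}")
as.numeric(strsplit(coords, split = " ")[[1]]) # x1 y1 x2 y2 of the AOI
stringr::str_extract(msg, "[a-z]+$")           # AOI label: "shroom"

## downsampling by a factor of 20 with means: 1000 Hz -> 50 Hz
xp <- rnorm(1000)                        # 1 s of simulated x-gaze samples
xp_ds <- colMeans(matrix(xp, nrow = 20)) # each value averages 20 raw samples
length(xp_ds)                            # 50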

pupil_preproc

pupil_preproc definitions are all utilized in Step 5: Pupil preprocessing (ep.eye_preprocess_pupil.R). Currently, pupil preprocessing includes blink correction, smoothing/filtering, interpolation, baseline correction, and downsampling.

################################
definitions:
  eye: &eye  
    pupil_preproc:
      blink_corr:
        ms_before: 150
        ms_after: 150
      filter:
        method: "movingavg" #right now only moving average supported
        window_length: 20 #n measurements to lookback while smoothing, gets passed to pracma::movavg. In ms.
      interpolate:
        algor: "linear"
        maxgap: 1000 ### in ms, will use the original sampling frequency and downsampling factor to convert to nmeasurements.
      baseline_correction:
        method: "subtract"
        dur_ms: 100
        center_on: "DISPLAY ON"
      downsample:
        factor: 50
        method: "mean"
################################

which is read into the R session as:

config$definitions$eye$pupil_preproc

subfield descriptions and default options

N.B. All of our pupil configuration options for the Sorting Mushrooms Task are set to defaults, so we can entirely omit this portion of the config file and the preprocessing will execute the same.
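As a concrete illustration of two of these steps, here is a sketch of the moving-average filter (window_length is passed to pracma::movavg, per the config comments) and "subtract" baseline correction, on simulated pupil data (all values invented):

library(pracma)
ps <- 3000 + cumsum(rnorm(2000))            # simulated pupil trace at 1000 Hz
ps_smooth <- movavg(ps, n = 50, type = "s") # 50-sample simple moving average
## "subtract" baseline correction: subtract the mean of the dur_ms window
## anchored at the centering message (here, simply the first 100 samples)
ps_bc <- ps_smooth - mean(ps_smooth[1:100])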

A shortened version of eye definitions

Now that all relevant ep.eye definitions have been described and their defaults specified, we can shorten the overall definitions field of the configuration file to reduce confusion:

################################
definitions:
  eye: &eye
    global:
      prefix: "\\d{3}_[[:upper:]]+"
      log_dir: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/elog'
      preproc_out: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/preproc'
      return_raw: TRUE 
    initialize:
      meta_check:
        meta_vars: ['sample.rate', 'model', 'mono', 'pupil.dtype', 'screen.x', 'screen.y', 'version']
        meta_vals: ['1000', 'EyeLink 1000', 'TRUE', 'AREA', '1920', '1080', '4.594']
        recording_time: [1200, 360] 
      inherit_btw_ev: # do certain between-trial messages need to be extracted for any reason? If left out, will skip
        calibration_check:
          cal: ["!CAL CALIBRATION HV9"]
          val: ["!CAL VALIDATION HV9"]
        move_to_within:
          str: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "END_RECORDING", "TRIAL "]
          align_msg: ["", "!MODE RECORD CR 1000 2 1 R", "TRIAL_OUTCOME", "TRIAL_OUTCOME"]
          pre_post: ["post", "pre", "post", "post"]
    msg_parse:
      extract_event_func_path: '~/github_repos/experiment.pipeline/inst/extdata/ep_configs/SortingMushrooms/gen_SortingMushrooms_eye_events.R'   
      csv_dir_path: '~/github_repos/experiment.pipeline/inst/extdata/ep_preproc/SortingMushrooms/eye_event_csvs' 
      msg_seq: 
        msg_start: ["!MODE RECORD CR 1000 2 1 R", "TRIALID", "SYNCTIME", "DISPLAY ON"]
        msg_end: [ "TRIAL_OUTCOME ", "TRIAL "]
        eval_middle: TRUE #smoosh certain event-specific (taken from below) messages in between the task-general beginning and end messages.
        ordered: TRUE
################################

Blocks

Finally, if there are block- or event-specific messages in your eyetracking data that you would like to validate, they can be included in the blocks subfield:

################################
blocks:
  approach-ins:
    ntrials: [48, 72]
    behav: *behav
    events:
      shroom:
        eye:
          mid_msg: ["!V IAREA RECTANGLE 1", "!V IAREA RECTANGLE 2", "!V IAREA RECTANGLE 3", "mouse on", "DISPLAY OFF"] #these will be event-specific messages that will fall between msg_start and msg_end
      feedback:
        eye:
          mid_msg: ["!V IAREA RECTANGLE 1", "DISPLAY OFF",] 
  approach-pav:
    ntrials: 60
    behav: *behav
    events:
      fractal:
        eye:
          mid_msg: ["!V IAREA RECTANGLE 1"]
      feedback:
        eye:
          mid_msg: ["!V IAREA RECTANGLE 1", "DISPLAY OFF"]
  approach-feedback:
    ntrials: 10
    behav: *behav
    events:
      fractals:
        eye:
          mid_msg: ["!V IAREA RECTANGLE 1", "!V IAREA RECTANGLE 2", "mouse on", "DISPLAY OFF"]
  approach-pit:
    ntrials: 90
    behav: *behav
    events:
      compound:
        eye:
          mid_msg: ["!V IAREA RECTANGLE 1", "mouse on", "DISPLAY OFF"]

In this section, you may choose to specify mid_msg fields nested within blocks and events, which allow for event-specific messages that ought to appear between the event-general msg_start and msg_end sequences. If mid_msg entries are specified in the blocks field, the script will automatically check that, for specific event types, all msg_start, msg_end, and mid_msg messages are included within a recording event. It is useful to add these messages to ensure that stimulus onsets, choices, or other event-specific markers occur within your data.
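A sketch of the kind of check this enables for a single event (assuming the edf object from earlier; fixed = TRUE treats each expected message as a literal string):

expected <- config$blocks$`approach-ins`$events$shroom$eye$mid_msg
ev_msgs <- edf$msg %>% dplyr::filter(eventn == 1) %>% dplyr::pull(text)
vapply(expected, function(m) any(grepl(m, ev_msgs, fixed = TRUE)), logical(1))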

Putting it all together: Processing a single subject

All being said, once your configuration file is set up correctly, you can run all processing options on a single subject by running:

ep.eye_preproc <- ep.eye_process_subject(edf_path, config_path)

This will export a preprocessed ep.eye object that looks like this:

ep.eye_preproc

You can then elect to pass this off to QA checks and diagnostics, or go ahead and start analyzing!

Happy ep.eye-ing!


