In tjmahr/lookr: Tools to Analyze Looking While Listening or Visual World Eyetracking Experiments

library(lookr)

Data Structures and Abstractions

In this doc, I discuss what makes up our eye-tracking data and the main levels of abstraction that we use to contain the data.

Raw eye-tracking data

Here are the files that make up a single subject's eye-tracking data. This participant received two blocks of the coarticulation experiment.

data_files <- list.files("data/Coartic_WFFArea_2a/001P00XS1/")
data_files

Each experimental block produces a .txt file and a .gazedata file. These two files contain all the pertinent experimental data for an eye-tracking block. During an experiment, Eprime also produces an .edat file, but that file is unusable outside of Eprime so we ignore it.

The .txt and .gazedata files for a block should have the same basename (i.e., the same filename except for the file extension). When we truncate the file extensions, we see that we only have two unique filenames, one for each block of data.

unique(tools::file_path_sans_ext(data_files))

Stimdata

The .txt file for a block is the Stimdata file. It contains information about each experimental trial (like stimulus presentation or event timing). Eprime generates this file, and it's not pretty. Here is a single trial, as recorded in the file.

        *** LogFrame Start ***
        TrialList: 1
        Procedure: TrialProcedure
        ImageL: BoardBook1
        ImageR: StuffedDog1
        Carrier: Where
        Target: ImageL
        Pitch: hi
        AudioStim: Whe_hi_the_V_Book_neut
        Attention: AN_LookAtThat
        AudioDur: 2190
        AttentionDur: 1500
        WordGroup: dog-book
        StimType: neutral
        TargetWord: book
        Running: TrialList
        TrialList.Cycle: 1
        TrialList.Sample: 1
        Image2sec.OnsetTime: 37347
        Image2sec.StartTime: 37324
        Fixation.OnsetDelay: 39
        Fixation.OnsetTime: 38902
        Fixation.StartTime: 38893
        Target.OnsetDelay: 0
        Target.OnsetTime: 39353
        Target.StartTime: 39289
        Wait1SecFirst.OnsetDelay: 0
        Wait1SecFirst.OnsetTime: 41543
        Wait1SecFirst.StartTime: 41543
        Attention.OnsetDelay: 3
        Attention.OnsetTime: 42546
        Attention.StartTime: 42343
        *** LogFrame End ***

The Stimdata() function extracts and massages the pertinent experimental information from the stimdata file into a dataframe. The massaging process is complex and kind of a pain, but it handles the many iterations of our eye-tracking experiments: It determines which task is being performed and with which stimulus presentation protocol, and it adjusts the timing attributes accordingly. If we design and implement a new experiment or a new version of an existing experiment, I usually edit the backend of Stimdata() function to account for the new experiment.

Anyway, in the output of Stimdata(), each row of the dataframe represents the attributes of an experimental trial.

stim_path <- "data/Coartic_WFFArea_2a/001P00XS1/Coartic_Block1_001P00XS1.txt"
stimdata <- Stimdata(stim_path)
str(stimdata)

Gazedata

The .gazedata file contains tab-delimited Gazedata from the eye-tracker for the entire block.

gaze_path <- "data/Coartic_WFFArea_2a/001P00XS1/Coartic_Block1_001P00XS1.gazedata"
raw_gaze <- read.delim(gaze_path, na.strings = c("-1.#INF", "1.#INF"), 
                       stringsAsFactors = FALSE)
str(raw_gaze)

We don't need all these columns---which are documented in another doc, btw---so Gazedata() keeps just the ones we care about. The function also computes the monocular averages for each gaze-data variable, combining available data from the left and right eyes.

gazedata <- Gazedata(gaze_path)
str(gazedata)

Sometimes only one of the eyes is tracked, so we use data from the available eye to compute the monocular average, as shown in the example below. Some researchers advise against this kind of interpolation [citation TODO].

xmean_from_right <- subset(gazedata[c("XLeft", "XRight", "XMean")], 
                           is.na(XLeft) & !is.na(XRight))
head(xmean_from_right)

Blocks and Trials

Now that we have the stimdata for each trial and the gazedata from the whole block, we can combine these two together using Block(). This function slices up the gazedata, creating a dataframe for each trial. The stimulus properties for each trial are attached to the trial as attributes. The gazedata dataframe and attached stimdata make up a Trial object. The code below shows the structure of a single trial.

block1 <- Block(gazedata, stimdata)
trial <- block1[[1]]
str(trial)

Block also accepts a character argument when it gives the basename of a block---that is, the path of gazedata file minus the .gazedata extension. This is the second rung in our ladder of convenient abstractions, as Blocks abstract away from Stimdata and Gazedata.

gaze_path2 <- "data/Coartic_WFFArea_2a/001P00XS1/Coartic_Block2_001P00XS1.gazedata"
(block_basename <- tools::file_path_sans_ext(gaze_path2))
# Load stimdata and gazedata and merge in one step
block2 <- Block(block_basename)

ToTarget

When the stimdata and gazedata are combined, six new columns are also produced. The columns all end in ToTarget and they describe the screen location of the gaze in terms of proximity to the target image. That is, plain-old XMean describes the location of the gaze such that 0 is the left side of the screen and 1 is the right side. In the trial above, the target word is on the left side of the screen, so small XMean values are closer to the left side of the screen and hence closer to the target. In XMeanToTarget, the XMean values are flipped so that greater values are closer to the target image. Essentially, the target image becomes the right image for all trials. This kind of normalization is useful if we want to look at the gaze-location with respect to the target image over several trials.

# the added columns
grep("ToTarget", names(trial), value = TRUE)
library(ggplot2)
# default plot
qplot(data = trial, x = Time, y = XMean) + labs(title = "Raw XMean value")
qplot(data = trial, x = Time, y = XMeanToTarget) + 
  labs(title = "XMean flipped towards target")

Working with attributes with `%@%`

A Block is a list of Trial objects. We can access the attributes of multiple trials using the %@% function.

block1 %@% "TargetWord"
block1 %@% "TargetImage"
block1 %@% "TargetOnset"

The attribute-infix function %@% can also be used on single trials to get and set their attribute values.

trial %@% "TargetWord"
trial %@% "TargetImage"
# Setting
trial %@% "SpecialNewAttribute" 
trial %@% "SpecialNewAttribute" <- "Hello!"
trial %@% "SpecialNewAttribute"

Here's how one can use %@% to manually adjust the timing of a Trial so that the TargetOnset occurs at 0ms. You should never have to manually do this, because the AlignTrials functions does this for you.

trial$Time <- trial$Time - (trial %@% "TargetOnset")
trial %@% "TargetOnset" <- 0
qplot(data = trial, x = Time, y = XMeanToTarget, xlim = c(-800, 1500)) + 
  labs(title = "Looking to target with adjusted time values\n(Don't ever do this manually!)")

Sessions and Tasks

We organize our data by task and then by subject. Put another way, block files are nested within subject folders within task folders, as shown in the mock file hierarchy below.

/data/
|-- Task1
|   |-- Subject1
|   |   |-- Task1_Subject1_block1.gazedata
|   |   |-- Task1_Subject1_block1.txt
|   |   |-- Task1_Subject1_block2.gazedata
|   |   |-- Task1_Subject1_block2.txt
|-- Task2
|   |-- Subject2
|   |   |-- Task2_Subject2_block1.gazedata
|   |   |-- Task2_Subject2_block1.txt
|   |   |-- Task2_Subject2_block2.gazedata
|   |   |-- Task2_Subject2_block2.txt
|   |-- Subject3
|   |   |-- Task2_Subject3_block2.gazedata
|   |   |-- Task2_Subject3_block2.txt

Our next level of abstraction is the Session which contains all the blocks in a subject directory.

session <- Session("data/Coartic_WFFArea_2a/001P00XS1/")

A Session is just a list of Trial objects; Session[[1]] is the first Trial object in the list. When we blocks are combined to form the session, the trials are renumbered. We can recover the original block number using Basename or Block attributes.

# Trial numbering of the separate blocks
block_numbering <- c(block1 %@% "TrialNo", block2 %@% "TrialNo")
data.frame(Basename = session %@% "Basename", BlockNo = session %@% "Block",
           OrigTrialNo = block_numbering, TrialNo = session %@% "TrialNo")

Task

The highest level of abstraction is the Task. It contains all the blocks for all the subjects in a task directory. Just like a Session or a Block it is just a list of Trial objects.

coartic <- Task("data/Coartic_WFFArea_2a/")
length(coartic)

Task fails gracefully

Suppose we want to load data from 50 subjects. An idiomatic R approach might be something like lapply(subject_paths, Session) where we apply the Session function to each element in a vector of subject directories. But wait, there's a problem with the data for the 47th subject! After spending a couple minutes loading the data for 46 subjects, the whole thing crashes and we lose everything. Barf! This problem happens occasionally, especially for our eye-tracking tasks involving toddlers who fuss out from time to time.

Task() fails gracefully when it encounters a bad block. Here I have simulated some bad data by truncating the data-files for a block midway through the experiment, which happens when we abort an experiment.

(blocks_to_load <- list.files("data/RWL_WFFArea_Long/", recursive = TRUE, pattern = "gazedata"))
task <- Task("data/RWL_WFFArea_Long/")
# 3 of 4 blocks loaded
unique(task %@% "Basename")