In jashu/itrak: Tools for processing eye-tracking data

This vignette provides a walk-through of how to clean a pupil-measurement time series using itrak and tidyverse. Each section describes the use of a different itrak function. It is important that these functions be applied in the same sequence given here.

library(itrak); library(tidyverse)

`pupil_data`

The sample pupil_data included with the itrak package contains 9 experimental trials + 3 drift checks collected from an EyeLink 1000 Plus, sampling pupil area from a single eye at a frequency of 500 Hz. Each experimental trial includes a 2-second baseline period, followed by the presentation of an affective image from the International Affective Picture System (IAPS) for 6 seconds. pupil_data contains the following columns:

"RECORDING_SESSION_LABEL": a unique index that identifies a single recording session. This will be renamed to id for short.

pupil_data <- rename(pupil_data, id = RECORDING_SESSION_LABEL)

"SAMPLE_INDEX": an integer that numbers the samples of each trial from 1 to the total number of samples per trial. It should reset to 1 with each new trial.
"SAMPLE_MESSAGE": a column containing a "stamp" indicating the onset of the experimental event for each trial, e.g., the transition from the baseline fixation period to the presentation of an image meant to evoke a pupillary response. In this data set, this event is given by the message "IAPSOnset".
"TIMESTAMP": a numeric vector corresponding to the clock time at which each sample was recorded.
"PUPIL_SIZE": a numeric vector containing pupil measurements. If you have a binocular eye tracking system, you may have separate vectors for the left and right pupils. The attached sample data contains a "LEFT_PUPIL_SIZE" and "RIGHT_PUPIL_SIZE" variable.

`get_freq`

After importing your data file into R (e.g., by generating a tab-delimited text file from your eye-tracking software and reading this into R using the read_tsv function), first make note of your sampling frequency. If you know what this is (the sampling frequency of the sample data set was 500 Hz), you can set it explicitly. If you don't know what it is, the get_freq function can estimate it from your TIMESTAMP variable:

samp_freq <- get_freq(pupil_data$TIMESTAMP)
samp_freq

`merge_eyes`

Optionally, use the merge_eyes function if you have a binocular eye tracker and wish to merge readings from the left and right eyes into a single time series. Following this, you can drop the LEFT_PUPIL_SIZE and RIGHT_PUPIL_SIZE variables.

pupil_data <- pupil_data %>%
  mutate(pupil = merge_eyes(LEFT_PUPIL_SIZE, RIGHT_PUPIL_SIZE)) %>%
  select(-LEFT_PUPIL_SIZE, -RIGHT_PUPIL_SIZE)

`label_trial`

Create trial numbers using label_trial. Following this, you can drop the SAMPLE_INDEX variable. Optionally, use the marker and event parameters to identify samples belonging to valid trials (and to ignore samples corresponding to drift checks or other extra-trial events); invalid readings will receive a trial label of NA, and these rows must be removed from the data frame before proceeding.

pupil_data <- pupil_data %>%
  mutate(trial = label_trial(sample = SAMPLE_INDEX,
                             marker = SAMPLE_MESSAGE,
                             event = "IAPSOnset")) %>%
  filter(!is.na(trial)) %>%
  select(-SAMPLE_INDEX)

Verify that you have the correct number of trials. In this sample data set, there should be 9 trials.

n_trials <- max(pupil_data$trial)
n_trials

`zero_onset`

Identify when each trial starts using the function zero_onset. This will set time equal to 0 wherever the marker "IAPS_Onset" appears. Be sure to group_by(trial) before calling this function. You no longer need the SAMPLE_MESSAGE and TIMESTAMP columns after this step, and they can be removed if you wish.

pupil_data <- pupil_data %>%
  group_by(trial) %>%
  mutate(time = zero_onset(time = TIMESTAMP,
                           marker = SAMPLE_MESSAGE,
                           event = "IAPSOnset")) %>%
  ungroup() %>%
  select(-SAMPLE_MESSAGE, -TIMESTAMP)

`clip_trials`

The number of pupil readings from each trial must be the same, and there is usually some variability in this number. (There may be slightly more or less samples for some trials.) It's important to have time samples for participants all be the same length for a couple of reasons: 1) if there are fewer than the standard number of samples and trials, then missing samples/trials need to be explicitly coded as such for the subsequent artifact cleaning to work properly (i.e., missing time points at the beginning and end of trials need to be treated as artifacts); 2) the data cleaning algorithms were engineered for computational speed on the premise that the data set can be parallelized into equal-length vectors--they will outright fail if this assumption isn't met.

For this sample data set, there is a baseline period of roughly 2 seconds before the trial starts, and the trial should stop after another 6 seconds. Ultimately, we will only use the last 100 ms of the baseline period to normalize each time series. However, for the initial artifact-cleaning, we will clip these trials to be between -1 sec and 6 sec. We want this extra buffer of baseline period in case the participant happens to blink during the 100 ms prior to trial onset, so that we can use the preceding data points to interpolate over the blink and still have a valid baseline reference to use for normalization. Use the function clip_trials to clip the trials to the specified length and to verify that all trials reach the specified length. It is possible that a trial may have started late or ended early, in which case clip_trials will issue a warning about how many samples are missing. You will need to then turn these implicit missing samples into explicit missing values (NAs). tidyr::complete can be used to accomplish this.

pupil_data <- pupil_data %>% 
  clip_trials(trial = trial,
              time = time,
              start = -1000,
              stop = 6000)
# The following is only necessary if you are missing one or more samples, but
# it can be safely run regardless. (It will leave the data unchanged if there
# is nothing to fix.)
pupil_data <- pupil_data %>% 
  complete(nesting(id, trial), time) %>%
  arrange(id, trial, time)

At this point, you might consider converting the time units from milliseconds to seconds. Seconds is the more meaningful time scale, and will make the x-axes of subsequent artifact plots more readable.

pupil_data$time <- pupil_data$time / 1000

`get_artifacts`

Identify which samples have artifacts using get_artifacts and store the results in a new variable called artifact. Be sure to group_by trial before calling this function, and also id if you are cleaning more than one recording session at a time.

pupil_data <- pupil_data %>%
  group_by(trial) %>%
  mutate(artifact = get_artifacts(ts = pupil, samp_freq = 500)) %>%
  ungroup()

`get_oor`

Identify which samples exceed drift limits using get_oor and store results in a new variable called oor. The recommended lim argument for pupil area in a psychological experiment (where changes in pupil dilation are evoked by emotional or cognitive stimuli) is lim = c(-0.75, 1.25). (See the help file for this function for more advice on setting this parameter.) The recommended period to use as a baseline reference is the 100 ms preceding stimulus onset. Since we have zeroed the variable time to correspond to stimulus onset and have converted the units to seconds, the baseline period corresponds to readings between -0.1 and 0 seconds.

pupil_data <- pupil_data %>%
  group_by(trial) %>%
  mutate(oor = get_oor(ts = pupil, samp_freq = 500,
                       lim = c(-0.75, 1.25), 
                       baseline = between(time, -.1, 0),
                       artifacts = artifact)) %>%
  ungroup()

`plot_artifacts`

Inspect plots of the algorithm's performance and adjust artifact sensitivity settings as needed.

plot_artifacts(data = pupil_data,
               time = time,
               measure = pupil,
               artifacts = artifact,
               oor = oor,
               trial = trial) +
  xlab("time (s)") +
  ylab("pupil area")

Adjust sensitivity of artifact detection with `max_velocity` argument

In the above plot, the red bands indicate regions of the time series that have been identified as signal artifacts. If there appear to be too many false positives (red bands covering periods of valid signal), try increasing the max_velocity setting in the call to get_artifacts. (The default is 0.9.) If there appear to be too many false negatives (signal spikes not being labeled in red), try decreasing the max_velocity setting.

Calibrate the identification of out-of-range periods with `lim` argument

The blue bands indicate regions of the time series that exceed the limits placed on relative change from baseline, as given by the lim argument in the call to get_oor. In Trial 9 above, this region reflects more than a 75% decrease in pupil area from the start of the trial, which exceeds the amount of change that a psychological stimulus can plausibly evoke. If you feel that valid signal periods are being identified as out-of-range, try widening the lim range; conversely, narrow the lim range if there is an implausible amount of change that is not being identified as out-of-range. Once you decide on a good lim setting, be sure to apply the same setting to all of your time series, and use it when you apply the fix_artifacts function (see next step).

Control merging and buffering of artifacts with `min_cont` argument

The min_cont argument (short for "minimum continuity") is used by both the get_artifacts and get_oor functions. This is used to define the minimum amount of good data that must exist between two artifact periods in order for them not to be merged into a single artifact period, and it also controls the amount of buffer added to either side of the artifact to insure that transition periods between artifactual and valid data are included as part of the artifact. The default setting here is 0.2 seconds, which is approximately the smallest window of time in which a measurable change in pupil diameter can take place.

`fix_artifacts`

Remove the artifacts using fix_artifacts and store the results in a new variable called pupil_corrected. Use the same lim and baseline settings that were used for the get_oor function call above.

pupil_data <- pupil_data %>%
  group_by(trial) %>%
  mutate(pupil_corrected = fix_artifacts(ts = pupil,
                                         samp_freq = 500,
                                         lim = c(-0.75, 1.25),
                                         baseline = between(time, -.1, 0),
                                         artifacts = artifact)) %>%
  ungroup()

`plot_comparison`

Compare the original vs. the cleaned time series using plot_comparison, and make sure you are satisfied with the performance of the cleaning functions.

plot_comparison(data = pupil_data,
                time = time,
                pre = pupil,
                post = pupil_corrected,
                trial = trial)

Note the trials that are being rejected for interpolation (the ones without a green curve). Let's "zoom in" on those by combining a dplyr::filter to plot_artifacts.

bad_trials <- pupil_data %>% filter(is.na(pupil_corrected)) %>%
  select(trial) %>% distinct %>% unlist
pupil_data %>% filter(trial %in% bad_trials) %>%
  plot_artifacts(time, pupil, artifact, oor, trial)

These trials are being dropped because get_artifacts identified artifact periods longer than 1 second (the max_gap default setting). You of course must decide on whether the default max_gap and lim settings are reasonable for your experiment, and, if not, where to draw these lines.

`normalize`

Normalize each pupil_corrected time series by centering and scaling to the first 100 ms prior to the onset of the IAPS images. After the time series has been normalized, discard the baseline period. (Your final time series should start at 0.)

This is done to correct for differences in intercept between participants and between trials within the same participant. Our assumptions here are that the first 100 ms still reflects dilation to the grey screen presented prior to the IAPS image, and this value should be fixed to a constant both within and between subjects because between-subject differences in dilation to the fixation screen likely reflect nuisance variables--e.g., differences in eye size, light reflex, or other irrelevant factors that are not the focus of the experiment--and within-subject differences over time in dilation to the fixation screen likely reflect measurement noise--e.g., equipment imprecision, random drift in the pupil response over time unrelated to the experiment, or carryover effects from previous trials.

pupil_data <- filter(pupil_data, time >= -0.1) %>%
  group_by(trial) %>% 
  mutate(pupil_normed = normalize(ts = pupil_corrected,
                                  baseline = time < 0)) %>%
  filter(time >= 0)

Note that if plot_comparison is used to compare a pre-normed vs. post-normed time series, the difference in scale is automatically detected and the pre time series is rescaled to match the scale of the post time series (to a close approximation). In this case, if there is no post time series (e.g., fix_artifacts returned the time series as missing data because the pre time series contained too many artifacts), the pre time series will not be plotted.

plot_comparison(data = pupil_data,
                time = time,
                pre = pupil,
                post = pupil_normed,
                trial = trial)

Note that all time series now start at 0 and the units reflect relative change from time 0.

`low_pass_filter`

Smooth the pupil_normed time series using a 4 Hz low-pass filter. You can change the frequency of the filter with the optional filter_freq argument. (4 Hz is the default.)

pupil_data <- pupil_data %>%
  group_by(trial) %>% 
  mutate(pupil_smoothed = low_pass_filter(ts = pupil_normed,
                                          samp_freq = 500)) %>%
  ungroup()

To visualize what this is doing to the data, apply plot_comparison to the pre-smoothed vs. post-smoothed data. You may need to zoom in on a single time series (this example selects Trial 1) to see the change in smoothness because plotting multiple trials in the same graph may lower the resolution and make the unfiltered curves look smoother than they actually are.

pupil_data %>% filter(trial == 1) %>%
  plot_comparison(time = time,
                  pre = pupil_normed,
                  post = pupil_smoothed,
                  trial = trial)

jashu/itrak documentation built on May 9, 2020, 1:57 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jashu/itrak
Tools for processing eye-tracking data

In jashu/itrak: Tools for processing eye-tracking data

`pupil_data`

`get_freq`

`merge_eyes`

`label_trial`

`zero_onset`

`clip_trials`

`get_artifacts`

`get_oor`

`plot_artifacts`

Adjust sensitivity of artifact detection with `max_velocity` argument

Calibrate the identification of out-of-range periods with `lim` argument

Control merging and buffering of artifacts with `min_cont` argument

`fix_artifacts`

`plot_comparison`

`normalize`

`low_pass_filter`

R Package Documentation

Browse R Packages

We want your feedback!

jashu/itrak Tools for processing eye-tracking data

In jashu/itrak: Tools for processing eye-tracking data

pupil_data

get_freq

merge_eyes

label_trial

zero_onset

clip_trials

get_artifacts

get_oor

plot_artifacts

Adjust sensitivity of artifact detection with max_velocity argument

Calibrate the identification of out-of-range periods with lim argument

Control merging and buffering of artifacts with min_cont argument

fix_artifacts

plot_comparison

normalize

low_pass_filter

R Package Documentation

Browse R Packages

We want your feedback!

jashu/itrak
Tools for processing eye-tracking data

`pupil_data`

`get_freq`

`merge_eyes`

`label_trial`

`zero_onset`

`clip_trials`

`get_artifacts`

`get_oor`

`plot_artifacts`

Adjust sensitivity of artifact detection with `max_velocity` argument

Calibrate the identification of out-of-range periods with `lim` argument

Control merging and buffering of artifacts with `min_cont` argument

`fix_artifacts`

`plot_comparison`

`normalize`

`low_pass_filter`