Blink and Artifact Cleanup (Detailed)

knitr::opts_chunk$set(fig.width=6, fig.height=4, warning=FALSE)
library(PupilPre)
library(ggplot2)
data(Pupilex3)
dat <- recode_off_screen(data = Pupilex3, ScreenSize = c(1920, 1080))
# This is the same as dat3 within the basic processing

This vignette contains detailed information regarding the automatic and manual cleanup of blinks and artifacts in the pupil size data. The package contains functions which are meant to provide cleaning that is reproducible and that can be done across multiple sessions.

Automatic cleanup

There are two functions (clean_blink and clean_artifact) for detecting and removing artifacts related to blinks and other eye movement. The functions, which can be used independently or sequentially, implement differential and distributional detection algorithms (explained in more detail below). The detected data points are then marked with NA. In both cases, the algorithm can be adjusted using parameters which control the scope and sensitivity of the detection. Note that the two functions can either be used in conjunction with one another sequentially, or completely independently.
The example below shows how to use them sequentially

Here will we will focus on Event 16892.8, seen in the figure below.
There is one marked blink (the first) and one un-marked artifact (the second).

# Take for example Event 16892.8 has one marked blink and one unmarked blink
pac_theme <- function(base_size = 12, base_family = ""){
  theme_bw(base_size = base_size, base_family = base_family) %+replace%
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          plot.title = element_text(hjust = 0.5, vjust = 1)
    )
}
dat %>% filter(Event %in% c("16892.8")) %>% 
  select(Event, Pupil, Time) %>%
  tidyr::gather(Column, PUPIL, -Time, -Event) %>% 
  ggplot(aes(x=Time, y=PUPIL)) + 
  geom_point(na.rm = T) +
  ylab("Pupil Dilation") +
  facet_wrap(. ~ Event) + pac_theme()

Marked blinks

The first function clean_blink operates only on blinks marked by the SR eyetracking software. It first identifies the marked blinks and adds padding around them to create a marked time window within which data may be removed.
This padding is given in BlinkPadding specifying the number of milliseconds to pad on either side of the blink. Within this padded window the data are examined in two passes.

The first pass calculates a difference in pupil size between subsequent data points. If the difference is larger than the value specified in Delta, the data point is marked for removal. Note that if Delta is not specified, the function will attempt to estimate a reasonable value based on the 95th percentile of differences in the data.

The second pass attempts to identify remaining data points or islands of data points (small runs surrounded by NAs) which remain in the window. This is done using MaxValueRun and NAsAroundRun. MaxValueRun controls the longest run (i.e., sequence) of data points flanked by NAs that could be marked for removal NAsAroundRun controls the number of NAs required on either side of the run in order for it to be removed. The argument LogFile specifies the path and name of the file into which information will be saved about cleaning status. Please note, for the purposes of this vignette we are saving the file into the vignette's temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "BlinkCleanupLog.rds".

datblink <- clean_blink(dat, BlinkPadding = c(100, 100), Delta = 5,
                    MaxValueRun = 5, NAsAroundRun = c(2,2),
                    LogFile = paste0(tempdir(),"/BlinkCleanupLog.rds"))

Looking again at Event 16892.8, we can see that the function clean_blink successfully cleaned the marked blink using the default values. The removed data points are marked in red.

# The function successfully cleaned the marked blink
compareNA <- function(v1,v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  return(same)
}
pac_theme <- function(base_size = 12, base_family = ""){
  theme_bw(base_size = base_size, base_family = base_family) %+replace%
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          plot.title = element_text(hjust = 0.5, vjust = 1)
    )
}
datblink %>% filter(Event %in% c("16892.8")) %>% 
  mutate(Compared = !(compareNA(Pupil_Previous, Pupil))) %>% 
  select(Event, Pupil, Pupil_Previous, Time, Compared) %>%
  tidyr::gather(Column, PUPIL, -Time, -Event, -Compared) %>% 
  mutate(Datapoint = ifelse(Compared==F, "Same", "Different")) %>% 
  ggplot(aes(x=Time, y=PUPIL, colour = Datapoint)) + 
  geom_point(na.rm = T) +
  scale_color_manual(values=c("Different" = "red", "Same" = "black")) +
  ylab("Pupil Dilation") +
  facet_wrap(. ~ Event) + pac_theme()

Verifying the cleanup

Because clean_blink is an automatic cleaning function, users may want to verify the effect of the cleaning. This may be helpful in determining if cleaning was effective, or to selective revert events for which cleaning was overly aggressive. The function verify_cleanup_app opens and interactive app and loads the desired log file. The user can quickly scan through the events, and if desired, can click the Revert Event Cleanup button to revert the event back to its most recent previous state (prior to running the cleanup function). Upon clicking the revert button, the status of the event in the log file will be changed and the file will be rewritten into the working directory.

verify_cleanup_app(datblink, LogFile = paste0(tempdir(),"/BlinkCleanupLog.rds"))
Verify Cleanup App

Importantly, this app only modifies the entry in the log file, not the pupil size data in the data frame. This is to provide both a record of changes as well as control over the processing. In order to carry out any modifications to the cleanup, it is necessary to use the function apply_cleanup_change. As the changes are based solely on the log file, the specific filename must be provided.

datblink <- apply_cleanup_change(datblink, LogFile = paste0(tempdir(),"/BlinkCleanupLog.rds"))

Unmarked artifacts

Not all artifacts related to blinks are automatically detected by the SR eyetracking software, see figure above. To detect these unmarked blinks and other artifacts, a second function, clean_artifact, is provided.
It implements a distributional method (described in more detail below) to detect potentially extreme data points.

This algorithm first divides the times series into windows, the size of which is specified in milliseconds using MADWindow. Within each window the median absolute deviation (MAD) of the pupil size data is calculated. This is used to detect which windows contain extreme variability (potentially containing outliers). This is determined based on the value provided in MADConstant, which controls the sensitivity threshold. The higher the constant the more extreme value is needed to trigger cleaning.

Next the identified extreme windows have padding added around them using MADPadding (again in milliseconds). Within this padded window, a multidimensional distributional distance (specifically Mahalanobis distance) is calculated. This distance can be calculated using one of two methods: Basic or Robust.

The Basic method uses the standard Mahalanobis distance and the Robust uses a robust version of the Mahalanobis distance. The latter is based on Minimum Covariance Determinant (as implemented in the package robustbase), which uses a sampling method for determining multivariate location and scatter. Both the basic and robust calculations are based on multiple variables covarying with pupil size. By default, the calcuation uses the following columns: Pupil, Velocity_Y, and Acceleration_Y. However, the parameter XandY can be set to TRUE in which case the calculation will additionally include the X-axis: Velocity_X and Acceleration_X. This is intended to capture potential horizontal eye movement affecting pupil size. N.B., missing values are automatically excluded from the distance calculation.

The function will inform the user if a particular window is skipped as there are safeguards built in which will skip a given window if: 1) there are not enough data points or 2) there are not enough columns with non-zero data to estimate covariance. To determine whether a given pupil size is extreme, the argument MahaConstant is used to set the sensitivity. The default value of the parameter is 2 (standard deviations). The higher the constant, the more extreme value of the parameter is needed to trigger cleaning.

Lastly, this function can optionally perform a second pass (setting Second to TRUE), which is identical to the second pass in clean_blink. This attempts to identify remaining data points or islands of data points (small runs surrounded by NAs) which remain.
The arguments MaxValueRun and NAsAroundRun are identical in function and meaning. As with clean_blink, the argument LogFile specifies the path and name of the file into which information about the cleaning status is written. Again, please note that for the purposes of this vignette we are saving the file into the vignette's temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "ArtifactCleanupLog.rds".

datart <- clean_artifact(datblink, MADWindow = 100, MADConstant = 2,
                      MADPadding = c(200, 200), MahaConstant = 2,
                      Method = "Robust", XandY = TRUE, Second = T, 
                      MaxValueRun = 5, NAsAroundRun = c(2,2),
                      LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))

Looking, yet again, at Event 16892.8, we can see that the function clean_artifact successfully detected and partially cleaned the un-marked artifact, using the default values. Below, we will describe in detail how to manually clean the remainder of the artifact using the functionality provided in the package.

# The function partially cleaned the unmarked blink using default settings
compareNA <- function(v1,v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  return(same)
}
pac_theme <- function(base_size = 12, base_family = ""){
  theme_bw(base_size = base_size, base_family = base_family) %+replace%
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          plot.title = element_text(hjust = 0.5, vjust = 1)
    )
}
datart %>% filter(Event %in% c("16892.8")) %>% 
  mutate(Compared = !(compareNA(Pupil_Previous, Pupil))) %>% 
  select(Event, Pupil, Pupil_Previous, Time, Compared) %>%
  tidyr::gather(Column, PUPIL, -Time, -Event, -Compared) %>% 
  mutate(Datapoint = ifelse(Compared==F, "Same", "Different")) %>% 
  ggplot(aes(x=Time, y=PUPIL, colour = Datapoint)) + 
  geom_point(na.rm = T) +
  scale_color_manual(values=c("Different" = "red", "Same" = "black")) +
  ylab("Pupil Dilation") +
  facet_wrap(. ~ Event) + pac_theme()

Verifying the cleanup, part II

Again, because clean_artifact is an automatic cleaning function, users may want to verify the effect of the cleaning. It is possible that this automated cleaning procedure, depending on the parameters specified, can remove more or less data points that appear to be "good". This is part and parcel of automatic detection and cleaning. However, the algorithm is designed to detect extreme values based on the data in as targeted a way as possible. Based on our experience and testing, we have set default values which perform well in most scenarios.

The function verify_cleanup_app can be used to scan through the events, and if desired, revert an event back to its most recent previous state (in this case to the state of the data in data frame dat4a, i.e., prior to having run clean_artifact, but after having run clean_blink).

verify_cleanup_app(datart, LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))
Verify Cleanup App

Again, any changes made using verify_cleanup_app must be applied to the data using apply_cleanup_change and specifying "ArtifactCleanupLog" as the log file.

datart <- apply_cleanup_change(datart, LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))

Further evaluation of the cleanup

In order to help evaluate the results of the cleanup, we provide a data visualization tool that explicitly displays the difference in the pupil data before and after carrying out the automatic cleanup. The function plot_compare_app opens an interactive Shiny app for viewing the results of the cleanup. It plots each event and shows which data points are now different. Additionally, it states how much of the data (as a percentage) is missing, i.e., was removed by the cleaning procedure.

plot_compare_app(datart)
Plot Compare App

Alternatively, the function compare_summary produces a summary output of the comparison by Event. The data can be returned by setting the argument ReturnData = TRUE.

compare_summary(datart)

Manual cleanup

Automatic cleanup may not capture and clean all artifacts. Thus a manual cleanup function (user_cleanup_app) is provided. This function opens an interactive Shiny app for viewing Events and specifying which data points to remove. Data can be removed either by specifying a point in time (i.e., removing one specific data point) or a range of time (i.e., removing a sequence of data points), or any combination of these. For example, type in 1550 to remove a data point at time 1550 ms; type in 1600:1700 to remove all data points between 1600 ms and 1700 ms (inclusive); or type in 1550, 1600:1700 to remove the data point at 1550 ms and all data points between 1600 ms and 1700 ms (inclusive).

The user-specified data points are saved into a log file. The path and filename are specified in LogFile. Again, please note that for the purposes of this vignette we are saving the file into the vignette's temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "UserCleanupLog.rds".

This allows the user to clean part of the data in one session, and return to cleaning at later point. The function will read the log file from the working directory the next time the app is opened. Additionally, this log file ensures that the manual preprocessing step can be repeated if necessary as long as the log file exists. In the example below we will finish cleaning Event 16892.8.

user_cleanup_app(datart, LogFile = paste0(tempdir(),"/UserCleanupLog.rds"))
User Cleanup App

Here is a brief example of how the Shiny app works.

  1. Open the app specifying the data set you would like to clean.
  2. Navigate to a specific event using the dropdown menu or the "Previous"/"Next"" buttons.
  3. Once, the event is selected you will see it plotted in left-most panel ("Original").
  4. In the Time points box input the data points that require cleaning. In this example, we would like to remove all data points between 1835 ms and 1995 ms, so we will enter a range of values 1835:1995. The selected data points will be highlighted in red in the middle panel ("Preview").
  5. When you are satified with the result, press the "Commit Current Event Cleanup" button to save the cleanup information to the log file.
  6. If you make a mistake or would like to re-do a particular event cleanup, you can press the "Reset Current Event Cleanup" button.
  7. When cleaning is complete, press the "Exit and Save" button.

Note that, while the example above uses the cleanup app to further clean the data already processed with the automatic cleaning functions, the app can be used completely independently (i.e., without first doing automatic cleanup) if the user wishes to manually clean all events.

Once manual cleanup is done, the user must apply the cleanup to the data based on the contents of the log file. This is done with the function apply_user_cleanup. Note that it is also possible to visualize the results of the applied manual cleanup using plot_compare_app.

UserCleanupLog <- vector("list", length = length(unique(datart$Event)))
names(UserCleanupLog) <- unique(datart$Event)
UserCleanupLog[1:length(UserCleanupLog)] <- NA
UserCleanupLog[["16892.8"]] <- c(1835:1995)
saveRDS(UserCleanupLog, file = paste0(tempdir(),"/UserCleanupLog.rds"))
datclean <- apply_user_cleanup(datart, LogFile = "UserCleanupLog.rds")
saveRDS(datclean, file = "Partial_datclean.rds", compress = "xz")
datclean <- apply_user_cleanup(datart, LogFile = paste0(tempdir(),"/UserCleanupLog.rds"))

Looking, one last time, at Event 16892.8, we can see that both the marked blink and the un-marked artifact are now both fully cleaned.

# The event after automatic and manual cleaning
#datclean <- readRDS(file = "Partial_datclean.rds")
pac_theme <- function(base_size = 12, base_family = ""){
  theme_bw(base_size = base_size, base_family = base_family) %+replace%
    theme(panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          plot.title = element_text(hjust = 0.5, vjust = 1)
    )
}
datclean %>% filter(Event %in% c("16892.8")) %>% 
  select(Event, Pupil, Time) %>%
  tidyr::gather(Column, PUPIL, -Time, -Event) %>% 
  ggplot(aes(x=Time, y=PUPIL)) + 
  geom_point(na.rm = T) +
  ylab("Pupil Dilation") +
  facet_wrap(. ~ Event) + pac_theme()

Proceed with preprocessing

At this point it is possible to proceed with preprocessing as usual. Please refer back to the Basic Preprocessing vignette and continue by removing events with sparse data.



Try the PupilPre package in your browser

Any scripts or data that you put into this service are public.

PupilPre documentation built on March 14, 2020, 1:08 a.m.