The goal of fillgaze is to provide helper functions for interpolating missing eyetracking data.
You can install fillgaze from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("tjmahr/fillgaze")
```
This package was created in response to a very strange file of eyetracking data.
```r
df <- readr::read_csv("inst/test-gaze.csv")
```
Here is the problem with this file:
```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

ggplot(head(df, 40)) +
  aes(x = Time - min(Time)) +
  geom_hline(yintercept = 0, size = 2, color = "white") +
  geom_point(aes(y = GazeX, color = "GazeX")) +
  geom_point(aes(y = GazeY, color = "GazeY")) +
  labs(x = "Time (ms)", y = "Screen location (pixels)", color = "Variable")
```
Every second or third point is incorrectly placed offscreen, as indicated by negative pixel values for the gaze locations. It is physiologically impossible for a person's gaze to oscillate so quickly and with such magnitude (the gaze is tracked on a large screen display).
We would like to interpolate spans of missing data using neighboring points. That's the point of this package. The steps to solve the problem involve:
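- Convert offscreen points into `NA` values.
- Find gaps in the data (spans of successive `NA`s).
- Fill in those gaps by interpolating the `NA` values.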
We need to mark offscreen points as properly missing data.
`set_values_to_na()` takes a dataframe and named filtering predicates.
Here's the basic usage.
```r
set_values_to_na(dataframe, {col_name} = {function to determine NA values})
```
Values for which a function returns `TRUE` are replaced with `NA` values.
For example, `set_values_to_na(df, var1 = ~ .x < 0)` would:

- find the column `var1` in the dataframe,
- find the rows where `.x < 0` is true, where `.x` is a placeholder (pronoun) for the values in `df$var1`,
- replace those `TRUE` values with `NA`.

```r
library(fillgaze)

original_df <- df

df <- df %>%
  set_values_to_na(
    GazeX = ~ .x < -100,
    GazeY = ~ .x < -100,
    LEyeCoordX = ~ .x < -.1,
    LEyeCoordY = ~ .x < -.1,
    REyeCoordX = ~ .x < -.1,
    REyeCoordY = ~ .x < -.1
  )

# Before and after on some of the GazeX values
# (tibble() replaces the deprecated data_frame())
tibble(before = head(original_df$GazeX), after = head(df$GazeX))
```
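The rule described above can be sketched in a few lines of base R. `set_na_sketch()` is a hypothetical helper written for illustration, not part of fillgaze, and it assumes each predicate is a one-sided formula that refers to the column's values as `.x`; the package itself may evaluate its predicates differently.

```r
# Hypothetical helper, not part of fillgaze: a base-R sketch of the idea
# behind set_values_to_na(). Each named argument is a one-sided formula
# such as `~ .x < 0`, where `.x` stands for the values of that column.
set_na_sketch <- function(data, ...) {
  predicates <- list(...)
  for (col in names(predicates)) {
    f <- predicates[[col]]
    # f[[2]] is the right-hand side of the formula, e.g. `.x < 0`;
    # evaluate it with `.x` bound to the column's values
    is_bad <- eval(f[[2]], list(.x = data[[col]]), environment(f))
    # which() skips NA results, so values that are already NA stay NA
    data[[col]][which(is_bad)] <- NA
  }
  data
}

toy <- data.frame(var1 = c(5, -3, 8, -1), var2 = c(-2, 4, -6, 1))
set_na_sketch(toy, var1 = ~ .x < 0)
```

Only `var1` changes here; `var2` keeps its negative values because no predicate was supplied for it.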
Now, those offscreen points will not be plotted because they are `NA`.

```r
last_plot() %+% head(df, 40)
```
We can use `find_gaze_gaps()` to locate the gaps in a column of data. This function is mostly used internally. Users are not expected to call it routinely, but I cover it here because the function for filling gaps relies on the data in this dataframe.

```r
find_gaze_gaps(df, GazeX) %>%
  print(width = 120)
```
Each row describes a gap in the column.

- `start_row` and `end_row` contain the row numbers of the nearest non-`NA` values before and after the gap.
- `na_rows` is the number of successive `NA`s in the gap.
- `start_value` and `end_value` contain those nearest non-`NA` values.
- `change_value` is the difference between `start_value` and `end_value`.

The function also measures the duration of the gap (`change_time`). By default, it uses row numbers (`.rowid`) to measure duration. We can instead supply an explicit column to use as the measure of time.
```r
find_gaze_gaps(df, GazeX, time_var = Time) %>%
  print(width = 120)
```
The function also respects dplyr grouping, so that, for example, false gaps are not found between trials.

```r
df %>%
  group_by(Trial) %>%
  find_gaze_gaps(GazeX)
```
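The gap table and the effect of grouping can be sketched in base R. `find_gaps_sketch()` is hypothetical and only mirrors the column names described above. Two details are assumptions rather than documented fillgaze behavior: runs of `NA` at the very start or end of a vector are dropped (they have no neighbor on one side), and `change_value` and `change_time` are computed as "end minus start" between the two bounding non-`NA` rows.

```r
# Hypothetical helper, not fillgaze's implementation: describe each run of
# NA values in `x` by the nearest non-NA rows on either side.
find_gaps_sketch <- function(x, time = seq_along(x)) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Row numbers of the non-NA values just before and just after each NA run
  start_row <- starts[runs$values] - 1
  end_row <- ends[runs$values] + 1
  # Assumption: drop runs at the edges, which lack a neighbor on one side
  keep <- start_row >= 1 & end_row <= length(x)
  start_row <- start_row[keep]
  end_row <- end_row[keep]
  data.frame(
    start_row = start_row,
    end_row = end_row,
    na_rows = end_row - start_row - 1,
    start_value = x[start_row],
    end_value = x[end_row],
    change_value = x[end_row] - x[start_row],
    change_time = time[end_row] - time[start_row]
  )
}

find_gaps_sketch(c(10, NA, NA, 16, 20, NA, 30))

# With an explicit time vector, change_time is in those units instead of rows
find_gaps_sketch(c(10, NA, NA, 16), time = c(0, 17, 33, 50))

# Two trials stored back to back: trial 1 ends in NA and trial 2 starts with NA
trials <- data.frame(Trial = c(1, 1, 1, 2, 2, 2), x = c(5, 6, NA, NA, 7, 8))

# Ungrouped: one false gap spanning rows 3-4, across the trial boundary
find_gaps_sketch(trials$x)

# Split by trial: each NA run sits at a trial edge, so no gaps are found
do.call(rbind, lapply(split(trials$x, trials$Trial), find_gaps_sketch))
```

Splitting by `Trial` before looking for gaps plays the same role as `group_by(Trial)` above: a run of `NA`s can no longer borrow a neighbor from a different trial.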
`fill_gaze_gaps()` will fill in the gaps in selected columns. We can set limits on which gaps are filled:

- `max_na_rows`: don't fill gaps with more than `max_na_rows` successive `NA` rows
- `max_duration`: don't fill gaps with a duration larger than `max_duration`
- `max_sd`: don't fill gaps where the relative change in the variable is more than `max_sd` standard deviations in magnitude

```r
df <- df %>%
  group_by(Trial) %>%
  fill_gaze_gaps(GazeX, time_var = Time, max_na_rows = 5)
```
In this example, only `GazeX` has been interpolated. The median value is used. We can compare the results with the `GazeY` column.

```r
df %>% select(Time:GazeY)
```
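The three limits can be illustrated with a small base-R sketch. `fill_gaps_sketch()` is hypothetical: it fills each gap by linear interpolation between the two bounding values with `stats::approx()`, which may not match how `fill_gaze_gaps()` computes its fill values. It also assumes that `max_sd` compares the change across a gap with the standard deviation of the whole column.

```r
# Hypothetical helper, not fillgaze's code: fill interior runs of NA by
# linear interpolation, skipping any gap that exceeds one of the limits.
fill_gaps_sketch <- function(x, time = seq_along(x), max_na_rows = Inf,
                             max_duration = Inf, max_sd = Inf) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Assumption: max_sd is measured against the SD of the whole column
  x_sd <- sd(x, na.rm = TRUE)
  filled <- x
  for (i in which(runs$values)) {
    before <- starts[i] - 1
    after <- ends[i] + 1
    # A run at either edge has no neighbor on one side, so leave it alone
    if (before < 1 || after > length(x)) next
    too_many_rows <- runs$lengths[i] > max_na_rows
    too_long <- time[after] - time[before] > max_duration
    too_big <- is.finite(max_sd) &&
      abs(x[after] - x[before]) > max_sd * x_sd
    if (too_many_rows || too_long || too_big) next
    rows <- before:after
    # Linear interpolation between the two bounding non-NA values
    filled[rows] <- approx(
      x = time[c(before, after)], y = x[c(before, after)], xout = time[rows]
    )$y
  }
  filled
}

x <- c(10, NA, NA, 16, 20, NA, NA, NA, 40)
fill_gaps_sketch(x)
fill_gaps_sketch(x, max_na_rows = 2)
fill_gaps_sketch(x, max_sd = 1)
```

Here the column SD is 13. The second gap spans three `NA` rows and a change of 20, so `max_na_rows = 2` and `max_sd = 1` each leave it unfilled, while the first gap (two rows, a change of 6) is filled in every call.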
`fill_gaze_gaps()` also works with the variable selection helpers from dplyr/tidyselect.

```r
df <- df %>%
  fill_gaze_gaps(
    GazeX, GazeY, matches("EyeCoord"),
    time_var = Time, max_na_rows = 5, max_sd = 2
  ) %>%
  ungroup()

df
```
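`matches()` selects columns by regular expression. As a rough base-R analogue, `grep()` picks the same names out of a character vector. The vector below lists only the column names that appear in this README, so the real file may contain others. tidyselect's `matches()` ignores case by default, so the sketch passes `ignore.case = TRUE` as well.

```r
# Column names mentioned in this README (an assumed subset of the real file)
gaze_cols <- c("Time", "Trial", "GazeX", "GazeY",
               "LEyeCoordX", "LEyeCoordY", "REyeCoordX", "REyeCoordY")

# Regular-expression match, ignoring case like tidyselect::matches()
grep("EyeCoord", gaze_cols, ignore.case = TRUE, value = TRUE)
```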
```r
was_offscreen <- (original_df$GazeX < -100) %>%
  ifelse("Interpolated", "Original\nRaw Data")

last_plot() %+% head(df, 40) +
  aes(alpha = head(was_offscreen, 40), shape = head(was_offscreen, 40)) +
  scale_alpha_discrete(name = "Point", range = c(.3, 1)) +
  labs(shape = "Point")
```