artifacts: Time Series Artifact Detection and Correction
In jashu/itrak: Tools for processing eye-tracking data


get_artifacts and fix_artifacts identify and repair, respectively, signal artifacts within a time series. They are designed to work with pupil measurements obtained from an eye-tracking system, primarily to identify and interpolate over eye blinks, but also any other type of signal artifact that can appear when continuously measuring pupil diameter or area over time.

get_artifacts(ts, samp_freq, min_cont = 0.2, max_velocity = 0.9)

get_oor(
  ts,
  samp_freq,
  lim,
  ...,
  min_cont = 0.2,
  baseline = NULL,
  artifacts = NULL
)

fix_artifacts(
  ts,
  samp_freq,
  lim = NULL,
  baseline = NULL,
  artifacts = NULL,
  ...,
  min_cont = 0.2,
  max_gap = 1,
  max_loss = 0.5
)

`ts`	A time series, passed as a numeric vector of chronologically ordered, positively valued observations separated by equal intervals of time.
`samp_freq`	Sampling frequency in Hz.
`min_cont`	Minimum continuity in seconds required between artifacts. If the period of "good" data between two artifacts is less than this threshold, the two artifacts are merged into one. Default value is 0.2 (200 ms).
`max_velocity`	Maximum allowable velocity specified in terms of a quantile of the distribution of absolute values of first-order differences of the time series. Velocities that exceed the value of this quantile will be used to identify onset/offset of artifacts. Default value is 0.9 (90th percentile). Lower values will lead to greater sensitivity but less specificity in identifying signal spikes. Higher values will lead to less sensitivity but greater specificity.
`lim`	Two-item numeric vector `c(neg, pos)` specifying the negative and positive limits, respectively, of relative change from baseline that is plausible for your time series. (See section "Advice on setting `lim` argument" for more information and examples.)
`...`	Further arguments passed to `get_artifacts` from `get_oor` or `fix_artifacts`. These are used only when `get_artifacts` has not already been called to create an `artifacts` vector.
`baseline`	Logical vector indicating which parts of the time series correspond to the baseline period. This is used as a reference for determining whether values exceed the limits of relative change set in the `lim` argument. If there is no `baseline` provided, the mean of the entire series (excluding signal loss) will be used for this reference.
`artifacts`	A logical vector of equal length to the time series that provides logical indexing into which entries of the time series correspond to artifacts. This can be obtained by first calling `get_artifacts`. If not supplied by the user, then `fix_artifacts` will call `get_artifacts` itself.
`max_gap`	Maximum gap in seconds that an artifact period is allowed to span. Default value is 1 second, meaning if any single artifact period lasts longer than 1 second, `fix_artifacts` will not perform interpolation/extrapolation and instead will return a vector of all `NA`s.
`max_loss`	Maximum fraction of time series that is allowed to contain dropped signal and/or artifacts. Default value is 0.5, meaning if more than half of the time series consists of artifacts, `fix_artifacts` will not perform interpolation/extrapolation and instead will return a vector of all `NA`s.
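
The merging rule described for `min_cont` can be illustrated with a minimal base-R sketch. The helper `merge_close_artifacts` below is hypothetical and is not part of itrak; it only assumes that artifacts are given as a logical vector and that "good" stretches shorter than `min_cont` seconds between two artifacts should be absorbed.

```r
# Hypothetical helper (not the itrak implementation): merge artifact runs that
# are separated by fewer than min_cont seconds of good data.
merge_close_artifacts <- function(artifacts, samp_freq, min_cont = 0.2) {
  runs <- rle(artifacts)
  n_runs <- length(runs$values)
  min_samples <- min_cont * samp_freq
  # Runs alternate TRUE/FALSE, so an interior FALSE run has artifacts on both sides.
  interior <- seq_len(n_runs) > 1 & seq_len(n_runs) < n_runs
  too_short <- !runs$values & interior & runs$lengths < min_samples
  runs$values[too_short] <- TRUE
  inverse.rle(runs)
}

# 60 Hz example: 6 good samples (100 ms) sit between two 5-sample artifacts.
flags <- c(rep(FALSE, 30), rep(TRUE, 5), rep(FALSE, 6), rep(TRUE, 5), rep(FALSE, 30))
merged <- merge_close_artifacts(flags, samp_freq = 60)
sum(flags)   # 10 samples flagged before merging
sum(merged)  # 16 samples flagged after merging
```

With the default of 0.2 seconds at 60 Hz, any good stretch shorter than 12 samples between two artifacts is treated as part of a single artifact.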

Like similar algorithms, get_artifacts relies primarily on abrupt jumps in signal to identify the onset and offset of blinks. Unlike other algorithms, it does not require the user to prespecify threshold values that define onsets and offsets for all time series; rather, it adaptively determines the best values based on the distribution of the first-order differences (velocities) of each time series. This procedure was designed to mimic the relative way a human observer would visually identify an artifact, i.e., by assessing the pattern of deviation for the candidate artifact relative to that of its nearest neighbors.
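
To make the idea of a per-series velocity threshold concrete, here is a simplified base-R sketch. It is not the get_artifacts algorithm itself: the helper `flag_fast_samples` is hypothetical, and it only shows how a quantile of the absolute first-order differences can serve as an adaptive cutoff, as the `max_velocity` argument describes.

```r
# Hypothetical helper (not the itrak implementation): flag samples whose
# absolute first-order difference exceeds a quantile of that series' own
# velocity distribution.
flag_fast_samples <- function(ts, max_velocity = 0.9) {
  velocity <- abs(diff(ts))
  threshold <- quantile(velocity, probs = max_velocity, na.rm = TRUE)
  # The first sample has no preceding difference, so pad with FALSE.
  c(FALSE, velocity > threshold)
}

set.seed(1)
pupil <- 4 + cumsum(rnorm(600, sd = 0.005))  # simulated slowly varying diameter (mm)
pupil[300:310] <- 0.5                        # simulated blink
fast <- flag_fast_samples(pupil, max_velocity = 0.99)
which(fast)  # includes the blink onset (300) and offset (311)
```

Because the threshold is a quantile of each series' own velocities, a noisier recording automatically gets a higher cutoff than a cleaner one.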

Using the output of get_artifacts, fix_artifacts repairs artifacts using a sequence of linear interpolation for internal artifacts (artifacts sandwiched between good data) followed by matched lag-1 differences for external artifacts (artifacts at the start or end of a time series).
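
The interpolation step for internal artifacts can be sketched with base R's `approx()`. This is an illustration under stated assumptions, not the fix_artifacts implementation: the helper `interpolate_internal` is hypothetical, and it deliberately leaves artifacts at the start or end of the series as `NA` rather than attempting the matched lag-1 extrapolation described above.

```r
# Hypothetical helper (not the itrak implementation): linearly interpolate
# over artifacts that have good data on both sides.
interpolate_internal <- function(ts, artifacts) {
  good <- !artifacts & !is.na(ts)
  idx <- seq_along(ts)
  # With the default rule = 1, approx() returns NA outside the range of good
  # data, so leading and trailing artifacts remain NA in this sketch.
  approx(x = idx[good], y = ts[good], xout = idx)$y
}

ts <- c(4.0, 4.1, 0.2, 0.1, 0.3, 4.5, 4.6)
artifacts <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
fixed <- interpolate_internal(ts, artifacts)
fixed  # 4.0 4.1 4.2 4.3 4.4 4.5 4.6
```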

get_artifacts returns a logical vector that can be used for logical indexing into the time series to identify data artifacts. get_oor returns a logical vector corresponding to elements of the time series that are out of range, as defined by amount of relative change from baseline using the lim and baseline arguments. fix_artifacts returns a copy of the time series with artifacts and missing data replaced by interpolated values, or a copy of the time series with all values changed to NA in the event that the artifacts are too numerous (exceed max_loss) or too continuous (exceed max_gap).

Advice on setting `lim` argument

fix_artifacts uses the lim argument to impose a floor and ceiling on the values that the time series is allowed to take. Values that fall outside this range are then treated like any other artifact. The purpose of the lim argument is to catch time series that contain "slow drift", meaning the signal gradually drifts into implausible values. In setting the lim argument, think about what sort of relative change is plausible for your measurement. For example, if you are measuring pupils, the normal pupil size in adults varies from a minimum diameter of about 2 mm (3 square mm area) in bright light to a maximum diameter of about 8 mm (50 square mm area) in darkness. This means if the pupil starts off at maximum dilation, it can experience at most a 75% decrease in diameter (95% reduction in area). If the pupil starts off at minimum dilation, it can experience at most a 300% increase in diameter (1500% increase in area). Because lim is expressed as a proportion of baseline, this would correspond to lim = c(-0.75, 3) for diameter and lim = c(-0.95, 15) for area, but you should set your limits to be more conservative if you do not expect your measurements to span this range.
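
The arithmetic behind these limits can be checked directly. The sketch below assumes a circular pupil, so area is computed from diameter, and uses a hypothetical helper `rel_change`; the exact area decrease comes out near 94%, which the text above rounds to 95%.

```r
# Relative change from a starting value, expressed as a proportion
# (hypothetical helper for illustration).
rel_change <- function(from, to) (to - from) / from

d_min <- 2                    # minimum diameter in mm, from the text
d_max <- 8                    # maximum diameter in mm, from the text
a_min <- pi * (d_min / 2)^2   # about 3.14 square mm
a_max <- pi * (d_max / 2)^2   # about 50.27 square mm

lim_diameter <- c(rel_change(d_max, d_min), rel_change(d_min, d_max))
lim_area     <- c(rel_change(a_max, a_min), rel_change(a_min, a_max))

lim_diameter  # -0.75  3.00
lim_area      # -0.9375 15.0000
```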

For example, most users will measure pupils under moderate lighting conditions, so baseline pupil readings will start off closer to the center of their physiological range. Even assuming maximum decreases and increases from this point, the range could be narrowed to lim = c(-0.6, 0.6) for diameter and lim = c(-0.85, 1.5) for area. We have found that with psychological stimuli using our eye-tracking setup, lim = c(-0.5, 0.5) and lim = c(-.75, 1.25) appear to provide liberal coverage for plausible changes in pupil diameter and area, respectively, and you want your lim setting to err on the side of being too wide. Note that fix_artifacts calls the get_oor ("oor" for "out of range") function to identify which periods of the time series violate the lim argument. If you wish to know which periods are out of range, you can also call this function directly.
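
As a rough illustration of what an out-of-range check involves, the following base-R sketch flags samples whose relative change from a baseline mean falls outside `lim`. The helper `flag_out_of_range` is hypothetical and is not get_oor itself; in particular, it ignores the artifact handling that get_oor performs through its `artifacts` and `...` arguments.

```r
# Hypothetical helper (not itrak's get_oor): flag values whose relative change
# from the baseline mean falls outside lim = c(neg, pos).
flag_out_of_range <- function(ts, lim, baseline = NULL) {
  # Without a baseline, fall back to the mean of the whole series.
  ref <- if (is.null(baseline)) {
    mean(ts, na.rm = TRUE)
  } else {
    mean(ts[baseline], na.rm = TRUE)
  }
  change <- (ts - ref) / ref
  !is.na(change) & (change < lim[1] | change > lim[2])
}

diam <- c(4.0, 4.0, 4.2, 5.0, 6.5, 7.0, 4.1)   # diameter in mm
base <- c(TRUE, TRUE, rep(FALSE, 5))           # first two samples are baseline
oor <- flag_out_of_range(diam, lim = c(-0.5, 0.5), baseline = base)
oor  # FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
```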

If the time series contains artifact periods that are too long to interpolate, as defined by the max_gap argument, or if the total proportion of artifacts exceeds the limit specified by the max_loss argument, fix_artifacts will silently return a time series consisting of all missing values so that the user can easily identify which trials need to be dropped.
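
The two rejection criteria can be sketched in base R as follows. The helper `too_damaged` is hypothetical and only mirrors the documented meaning of `max_gap` and `max_loss`; it is not the check that fix_artifacts performs internally.

```r
# Hypothetical helper (not the itrak implementation): decide whether a trial
# has an artifact run longer than max_gap seconds or an overall artifact
# fraction above max_loss.
too_damaged <- function(artifacts, samp_freq, max_gap = 1, max_loss = 0.5) {
  runs <- rle(artifacts)
  artifact_runs <- runs$lengths[runs$values]
  longest_gap <- if (length(artifact_runs) > 0) max(artifact_runs) / samp_freq else 0
  longest_gap > max_gap || mean(artifacts) > max_loss
}

n <- 360                                                      # 6 s at 60 Hz
short_blink  <- rep(FALSE, n); short_blink[100:120]  <- TRUE  # 21 samples = 0.35 s
long_dropout <- rep(FALSE, n); long_dropout[100:180] <- TRUE  # 81 samples = 1.35 s

too_damaged(short_blink, samp_freq = 60)   # FALSE
too_damaged(long_dropout, samp_freq = 60)  # TRUE
```

Since rejected trials come back as all `NA`, a caller can identify them afterwards with a test such as `all(is.na(x))` on each returned series.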

This algorithm is intended for the detection of artifacts in relatively short time series corresponding to pupil-dilation measurements. It has only been tested and validated on 6-8 second trials sampled at 60 Hz or 500 Hz. Good artifact detection may or may not generalize to other sampling rates, trial lengths, and types of data. Please use plot_artifacts to inspect the performance of get_artifacts before continuing with fix_artifacts and any subsequent data cleaning and analysis.

See also: plot_artifacts, plot_comparison, normalize, low_pass_filter