In bnicenboim/eeguana: Flexible Manipulation of EEG Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width = 80)

The package eeguana provides a framework for doing simple pre-processing with specialized functions and manipulating EEG data with dplyr verbs (e.g., eeg_mutate, eeg_filter, eeg_summarize) extended to a new class eeg_lst, and ggplot wrapper functions. The new class is inspired by tidyverse principles but it's not really "tidy" (due to space considerations), it's a list of (i) a wide table that contains the signal amplitudes at every sample point of the EEG, (ii) an events table with information about markers (or triggers), blinks and other exported information, and (iii) a long table with experimental information, such as participant (recording), conditions, etc.

While it's possible to transform the eeg_lst to a data.frame, data.table or a tibble (with as.data.frame(), as.data.table() and as_tibble()), the motivation for manipulating the data in the eeg_lst format has to do with size considerations. In this case, the original file was 113 MB, converting it to a long format entails a lot of repetition and generates an object of 556 MB. While this will still work here, a long format quickly becomes prohibitive in real settings with longer recordings.

Working with eeguana

Dplyr-like verbs always return an eeg_lst object, which allows us to use magrittr's pipe, %>% (see ?`dplyr-eeguana`) or the relatively new R pipe |>. In addition, eeg_/ch_ functions will also return an eeg_lst unless they have a suffix _tbl that indicates they return a data frame. In general, we will work with the eeg_lst, unless we want to modify the channels information (with channels_tbl()) or the events table containing markers, artifacts ans custom annotations (with events_tbl).

A practical example: the N170 effect

Here, I exemplify the use of eeguana with raw EEG data exported in the format of BrainVision 1.0. The data belong to a simple experiment where a participant was presented 100 faces and 100 assorted images in random order. The task of the experiment was to mentally count the number of faces.

First we download the data:

# Run the following or just download the files from raw_faces folder in https://osf.io/tbwvz/
library(httr)
GET("https://osf.io/wv5hp//?action=download",
  write_disk("./s1_faces.vhdr", overwrite = TRUE), 
  progress()
)
GET("https://osf.io/c8wea//?action=download",
  write_disk("./s1_faces.vmrk", overwrite = TRUE), 
  progress()
)
GET("https://osf.io/uhsde//?action=download",
  write_disk("./s1_faces.eeg", overwrite = TRUE), 
  progress()
)

BrainVision 1.0 exports three files: s1_faces.vhdr, s1_faces.vmrk, and s1_faces.eeg. The file s1_faces.vhdr contains the metadata and links to the other two files, s1_faces.vmrk contains the triggers and other events in the samples, and s1_faces.eeg contains the signals at every sample for every channel recorded.

We load the relevant packages first. Here, I use tidytable which is a faster alternative to dplyr with a very similar interface.

library(tidytable) #or library(dplyr)
library(ggplot2)
library(stringr)
library(eeguana)
set.seed(123) # ICA will always find the same components

We first need to read the data:

faces <- read_vhdr("s1_faces.vhdr")

The function read_vhdr() creates a list with data frames for the signal, events, segments information, and incorporates in its attributes generic EEG information.

faces
summary(faces)

We see that there is no electrode positions in the object, but since we know that the layout was a standard 10/20, we'll add this layout to the object, using the dataset layout_32_1020. This is required to be able to create topographic plots.

channels_tbl(faces)
## In case the order of the electrodes is different, we do a left_join instead of replacing the table:
channels_tbl(faces) <- select(channels_tbl(faces), .channel) |>
  left_join(layout_32_1020)

The plots that eeguana produce are ggplot objects that can be modified like regular ggplot.

plot(faces) +
  ggtitle("All the experiment")

Pre-processing

Here the preprocessing is done with only one EEG dataset. If the EEG data of different participants is available, one can repeat a similar preprocessing pipeline for each dataset and bind the EEG datasets with bind().

Rereferencing

In this dataset, the signal from all electrodes is monopolar and referenced to the left mastoid (M1). We want the signal (excluding the EOG channels) to be referenced to linked (left and right) mastoids (M1 and M2).

faces <- eeg_rereference(faces, -VEOG, -HEOG, .ref = c("M1", "M2"))

Filtering

We apply a band pass filter of 0.1 to 30 Hz to all the channels except the EOG channels. We don't segment yet, because discontinuities in the signal create artifacts on the edges.

faces_filt <- eeg_filt_band_pass(faces, -HEOG, -VEOG, .freq = c(.1, 30))

ICA

We want to apply ICA to as much data as possible, but to "representative" data: to data of the experiment and not when the participants were moving or reading the instructions. For the same reason, we want to the ignore artifacts that we are sure are not representing brain activity, because of the extreme amplitudes in the signal.

We first cut a large segment that excludes data before and after the experiment was ran using eeg_segment() (that is, only data from the marker "s111" to "s121"), and then we mark differences of 200 microvolts between peaks with eeg_artif_minmax().

faces_ls <- eeg_segment(faces_filt, 
                        .description == "s111", 
                        .end = .description == "s121") |>
  eeg_artif_minmax(-HEOG, -VEOG, 
                   .threshold = 200, 
                   .window = 200, 
                   .unit = "ms")

eeg_artif_minimax() only adds the artifacts in the events table and doesn't modify the signal. We can have a look at the type of artifacts that were detected, by plotting the signal and the events with annotate_events(). It would be a good idea to just check a couple of seconds of the signal, for that we use the dplyr-like function eeg_filter().

faces_ls |>
  eeg_select(-HEOG, -VEOG)|>
  eeg_filter(as_time(.sample, .unit = "s") |> between(0, 90)) |>
  plot() +
  annotate_events() +
  theme(legend.position = "bottom")

Now we can run ICA, removing the EOG and reference electrodes, and ignoring the artifacts.

## By default, it will ignore artifacts
faces_ica <- faces_ls |>
  eeg_ica(-HEOG, -VEOG, -M1, -M2, .method = adapt_fast_ICA, .ignore = .type == "artifact")

We can now check the different topographic plots of the ICAs with plot_components(), and their correlation with the EOG channels looking a the summary of the object.

faces_ica |> plot_components()
eeg_ica_summary_tbl(faces_ica)

We'll look closer at the ICA that are more likely to be related to eye movements, and we'll compare with signals that look like blinks and saccades:

faces_ica <- faces_ica |>
  eeg_artif_step(HEOG, VEOG, 
                 .threshold = 30, 
                 .window = 200, 
                 .unit = "ms", 
                 .freq = c(1, 10)) |>
  eeg_artif_peak(VEOG, .threshold = 100, .freq = c(1, 10))

In order to investigate the ICA, we are using an experimental function not yet exported (that's why we use the :::). This function will probably converted into a shiny app. It's possible to get the same plots by using a combination of plot_topo, eeg_ica_show, eeg_ica_summary_tbl, and some ggplot and cowplot functions.

We'll see how the components explaining the most variance and most correlated to the EOG channels behave:

#samples with blinks
s_peaks <- filter(events_tbl(faces_ica),  str_starts(.description, "peak")) |>
  pull(.initial)
eeguana:::plot_ica.eeg_ica_lst(faces_ica, samples = seq(s_peaks[1]-2000,s_peaks[1]+2000))

The clearest ones is ICA5, and so we'll just remove that one.

faces_icaed <- faces_ica |>
  eeg_ica_keep(-ICA5) |>
  as_eeg_lst()

Now we'll segment the data to appropriate check if there are still artifacts.

events_tbl(faces_icaed) <- events_tbl(faces_icaed) |>
  filter(!.type %in% "artifact")

faces_seg <- faces_icaed |>
  eeg_select(-description, -type) |>
  eeg_segment(.description %in% c("s70", "s71"), .lim = c(-.1, .5))

faces_seg_artif <- faces_seg |>
  eeg_artif_minmax(-HEOG, -VEOG, .threshold = 100, .window = 150, .unit = "ms") |>
  eeg_artif_step(-HEOG, -VEOG, .threshold = 50, .window = 200, .unit = "ms")

## extracts the ids of the segments with artifacts
bad <- filter(events_tbl(faces_seg_artif), .type == "artifact") |> 
  pull(.id) |> 
  unique()
## Show the segment with artifact and one before and after:
faces_seg_artif |>
  eeg_filter(.id %in% c(bad-1, bad, bad+1)) |>
  eeg_select(-VEOG, -HEOG) |>
  plot() +
  annotate_events() +
  theme(legend.position = "bottom")

faces_seg <- faces_seg_artif |>
  eeg_events_to_NA(.type == "artifact", .entire_seg = TRUE, .drop_events = TRUE)

summary(faces_seg)

Finally, we can baseline the segments:

faces_seg <- faces_seg |> 
  eeg_baseline()

Visualization

We edit the segmentation information and add more descriptive labels.

faces_seg <- faces_seg |>
  eeg_mutate(
    condition =
      ifelse(description == "s70", "faces", "non-faces")
  ) |>
  eeg_select(-type)

faces_seg

With some ggplot skills, we can create customized plots. ggplot is overloaded to work on an eeg_lst by first downsampling the signal when necessary, and converting it to a long-format data frame that is feed into ggplot. This object can then be customized. (Notice that the channels, or component names are in a .key column and their amplitude in .value column, and instead of samples there are now .time in seconds).

## ggplot uses internally a table that looks like this:
faces_seg |> eeg_filter(.id %in% 1:3) |>
  as.data.frame()

faces_seg |>
  eeg_select(O1, O2, P7, P8) |>
  ggplot(aes(x = .time, y = .value)) +
  geom_line(alpha = .1, aes(group = .id, color = condition)) +
  stat_summary(
    fun = "mean", geom = "line", alpha = 1, size = 1.5,
    aes(color = condition)
  ) +
  facet_wrap(~.key) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = .17, linetype = "dotted") +
  theme(legend.position = "bottom")

We can see here the N170 component in the faces condition for the average trial on a single participant. We can investigate the signal by averaging the channels of the occipital and parietal lobes using transmute(), and the special function, chs_mean(), a wrapper for rowMeans(), which takes as arguments the relevant channels and whether missing values should be omitted from the calculations.

faces_seg |>
  eeg_transmute(
    Occipital = chs_mean(across(starts_with("O")), na.rm = TRUE), #O1, O2, Oz
    Parietal = chs_mean(across(starts_with("O")), na.rm = TRUE) # P3, P4, P7, P8, Pz
  ) |>
  ggplot(aes(x = .time, y = .value)) +
  geom_line(alpha = .1, aes(group = .id, color = condition)) +
  stat_summary(
    fun = "mean", geom = "line", alpha = 1, size = 1,
    aes(color = condition)
  ) +
  facet_wrap(~.key) +
  theme(legend.position = "bottom")

We can also calculate the ERPs and then plot them in their layout (and we'll add the same theme that plot uses for eeg_lsts):

ERP_faces <- faces_seg |>
  eeg_group_by(.sample, condition) |>
  eeg_summarize(across_ch(mean, na.rm = TRUE))

ERP_plot <- ERP_faces |>
  ggplot(aes(x = .time, y = .value)) +
  geom_line(aes(color = condition)) +
  facet_wrap(~.key) +
  theme(legend.position = "bottom") +
  ggtitle("ERPs for faces vs non-faces") +
  theme_eeguana()

ERP_plot |> plot_in_layout()

Another possibility is to create a topographic plot of the two conditions, by first making segments that include only the interval .1-.2 s after the onset of the stimuli, creating a table with interpolated amplitudes and using the ggplot wrapper plot_topo().

faces_seg |>
  eeg_filter(between(as_time(.sample, .unit = "s"), .1, .2)) |>
  eeg_group_by(condition) |>
  eeg_summarize(across_ch(mean, na.rm = TRUE)) |>
  plot_topo() +
  annotate_head() +
  geom_contour() +
  annotate_electrodes(colour = "black") +
  facet_grid(~condition)

For more specialized plots or analyses, it might be necessary to extract the data in long data frame format first. Even though, we can do this without transforming the object, here, we transform the data first and then we visualize independent t-tests at every electrode and time point.

df <- faces_seg |>
  eeg_select(O1, O2, P7, P8) |>
  as_tidytable() |> #or alternatively as_tibble() if one uses dplyr
  # We can use regular dplyr/tidytable functions now
  group_by(.key, .time) |>
  summarize(
    `t-value` = t.test(
      .value[condition == "faces"],
      .value[condition == "non-faces"]
    )$statistic
  )

df

Then we just load the data frame into ggplot.

ggplot(df, aes(x = .time, y = `t-value`)) + geom_line() +
  facet_wrap(~.key)

However, this can be also done in the without transforming the eeg_lst object:

faces_seg_t <-
  faces_seg |>
  eeg_select(O1, O2, P7, P8) |>
  eeg_group_by(.sample) |>
  eeg_summarize(across_ch(list(t =  ~t.test(
    .[condition == "faces"],
    .[condition == "non-faces"]
  )$statistic)))

faces_seg_t |>
  ggplot(aes(x = .time, y = .value)) +
  geom_line(alpha = .1, aes(group = .id)) +
  stat_summary(fun = "mean", geom = "line", alpha = 1, size = 1) +
  facet_wrap(~.key) +
  theme(legend.position = "bottom")