Analyzing time series data using `permu.test`
In permutes: Permutation Tests for Time Series Data

This vignette models ERP data, an example of time series data, using permutation testing. The permutes package implements two approaches, one based on package lmPerm, which is shown here, and the other based on package buildmer, which is shown in the other vignette. The lmPerm approach can do multivariate responses, but cannot run generalized linear models and only has very limited support for random effects (i.e.:\ the Error() functionality works to run ANOVA with random effects, but only using Type II SS). The buildmer approach can fit any model that function buildmer from package buildmer can (i.e.:\ lm, glm, lmer, and glmer models), but cannot fit multiple responses. Also, the buildmer approach can use the cluster-mass test by @marisoostenveld, using the relevant machinery in package permuco.

Sample data

The dataset we will be working with is called MMN, which is an excerpt from a passive-oddball task conducted by @jager A proper method section with more details should eventually be found in that paper, but a brief and incomplete summary follows here in order to make it easier to work with the data. @jager presented Dutch learners of English with a stream of isolated vowels from English or from Dutch created by means of formant synthesis. Each of her six conditions (summarized in the below table; the package supplies data from two of these conditions) each having used four vowels: three of the same phonetic category (e.g.\ three realizations of the DRESS vowel with slightly different formant frequencies) and one of a different phonetic category (e.g.\ the TRAP vowel). The first set of vowels, termed 'standards', were presented frequently; the fourth vowel, the 'deviant', infrequently interrupted the stream of standards. Participants were instructed not to pay attention to this stream of sounds, and watched a silent movie to keep them occupied. EEG recordings of 30 electrodes (T7 and T8 were excluded) were recorded while the participants were exposed to the vowel stimuli. At the presentation of one of the deviant vowels, we expect to observe a negative deflection in the EEG signal about 200 milliseconds after stimulus onset, originating from frontal and central electrode sites. This effect is called the 'mismatch negativity'. A second effect called the 'P300' may also occur, but is ignored here. The data supplied with permutes are a subset of the vowel pairs tested by @jager consisting of the English vowel in DRESS presented as a standard vs.\ as a deviant, in both cases against the vowel from the Dutch word ZET.

The first few rows of the data look as follows:

library(permutes)
head(MMN)
nrow(MMN) #how many observations?
length(unique(MMN$Time)) #how many timepoints?

The first 30 columns are the 30 sampled EEG electrodes. The Time column denotes the moment from stimulus onset, in milliseconds, of each measurement type. Subject is the participant, and Session is the session of the experiment minus one, to make it a proper dummy indicating whether the datapoint is from the first (0) or the second session (1). Finally, Deviant is a dummy indicating whether the stimulus was a standard (0) or a deviant (1), explained in the next paragraph. Note that, contrary to the results and recommendations in @brysbaert, @jager averaged over all her items belonging to the same condition in this experiment.

Time series data such as these are special because the data consist of multiple, strongly correlated, measurements of the same signal. This generates two problems. The first is a research problem: which portion of this time series do we want to analyze? The permutes package was designed to help researchers answer precisely this question. The second problem is of a statistical nature: how do we handle the strong autocorrelation present throughout this sampled window? Note that in the case of ERP data, these same two problems are additionally encountered in the spatial domain: what combination of electrodes ('region of interest') do we want to analyze, and how do we deal with the spatial correlation present in data measured from neighboring sites? This issue is dealt with in the permutes package by running separate analyses for each site, as is shown below.

Determining the window and ROI

The first problem equates to what is known in the ERP literature as determining the window. The normal way to do this is to run a MANOVA (with the 30 electrodes as the dependent variables) on every individual timepoint, and take as the window the point where a large sequence of significant $p$-values occurs. If we plot time horizontally and the electrode site vertically, we can use this approach to select both a time window and an ROI at the same time.

It should be noted that this approach suffers from an extreme multiple-comparisons problem: we will be running $30\times231$ ANOVAs! When compared to an asymptotic null distribution, the $p$-values will be spuriously significant in, expectedly, 347 random combinations. The permutation testing approach to time series data helps with this problem in two ways. Firstly, and most importantly, the results from a permutation analysis are never interpreted as actual findings; rather, they only serve as a guideline to empirically determine the analysis window and ROI. Secondly, the null distribution is not taken as the asymptotic $F$ distribution, but is rather inferred from the data itself by permutation testing: the null distribution is obtained by randomly permuting the entries of $Y$ and assessing how badly this perturbs the model fit; if a permutation incurs a large deterioration of the fit, then the portions of the design matrix associated with that permutation apparently contributed rather significantly to obtaining the proper fit.

The below code runs a permutation test series on the MMN data and plots the results as a heatmap. Note that permutation testing is not deterministic (the permutations are chosen randomly), so your results may be slightly different from mine.

perms <- permu.test(cbind(Fp1,AF3,F7,F3,FC1,FC5,C3,CP1,CP5,P7,P3,Pz,PO3,O1,Oz,O2,PO4,
    P4,P8,CP6,CP2,C4,FC6,FC2,F4,F8,AF4,Fp2,Fz,Cz) ~ Deviant * Session | Time,
    data=MMN)
## (output not shown)

This takes a few seconds to run. We can speed it up by parallelizing the testing of the individual timepoints:

library(doParallel)
cl <- makeCluster(2) #or more
registerDoParallel(cl)
perms <- permu.test(cbind(Fp1,AF3,F7,F3,FC1,FC5,C3,CP1,CP5,P7,P3,Pz,PO3,O1,Oz,O2,PO4,
    P4,P8,CP6,CP2,C4,FC6,FC2,F4,F8,AF4,Fp2,Fz,Cz) ~ Deviant * Session | Time,data=MMN,
    parallel=TRUE)
## (output not shown)

set.seed(10)
perms <- permu.test(cbind(Fp1,AF3,F7,F3,FC1,FC5,C3,CP1,CP5,P7,P3,Pz,PO3,O1,Oz,O2,PO4,
    P4,P8,CP6,CP2,C4,FC6,FC2,F4,F8,AF4,Fp2,Fz,Cz) ~ Deviant * Session | Time,
    data=MMN)

We can then plot the results:

plot(perms)

Following @marisoostenveld, the plot method from permutes by default plots $F$-values. @marisoostenveld then compute a cluster statistic over a range of temporally and/or spatially adjacent $F$-values that all match an inclusion criterion (e.g.\ $p<.05$). This cluster statistic is not explicitly calculated by permu.test (in the general case, permu.test does not know which responses in any multivariate design should be considered adjacent), but the clusters can be identified by the researcher based on their own cut-off criteria, and the researcher can then run any test they want on the subset of data falling within the cluster. If we want to stick to @marisoostenveld, this test should be another permutation ANOVA over the whole window.

Based on visual inspection, the plot suggests one windows for the factor 'Deviant', located around 200 ms post-stimulus-onset. This window corresponds nicely to the window in which one usually finds MMN effects. The region of interest can be tentatively estimated as frontal and central (which is good news for @oddball, as this is in line with prior literature on the MMN component), but we would like some verification. One option is to follow @marisoostenveld in discarding those results that have failed to achieve significance; we can then check again for contiguous bands that achieve large effect sizes, and base our ROI on their positions. We can do this using the sig option to the plot method.

plot(perms,sig='p')

From this plot, the effect looks to be relatively robust at, indeed, the frontal and central regions. As a next step, we should determine the temporal window with more precision. In absence of a formal criterion, we simply take the narrowest window in which at least ten of the frontal and central electrodes achieve significance. Because permutes objects are also normal data frames, this is easily achieved.

ROI <- c('Fp1','AF3','F7','F3','FC1','FC5','C3','CP1', 'CP5','CP6','CP2','C4',
     'FC6','FC2','F4','F8','AF4','Fp2','Fz')
head(perms) #what do we have?
perms2 <- perms[perms$Factor == 'Deviant' & perms$Measure %in% ROI,]
perms2$sig <- perms2$p < .05
perms2 <- aggregate(sig ~ Time,perms2,sum)
plot(perms2$Time,perms2$sig) #look at all windows
print(perms2[perms2$sig > 10,'Time']) #our arbitrary criterion

For technical reasons, the Time variable was coerced to a factor. We see that contiguous permutation significance was achieved from the 88th to the 136th level of this factor. The below code shows that this corresponds to a window from 171 ms to 265 ms post-stimulus-onset.

print(unique(MMN$Time)[c(88,136)])

Having determined a window and an ROI, we can now proceed to the actual modeling step.

Running the actual analysis

We can now run the actual analysis that we want. We will average our data along the aforementioned window and ROI, and then run a permutation linear mixed-effects model to analyze the data, so that we can easily obtain robust $p$-values. The advantage of these permutation $p$-values is, again, that we could run multiple such models without requiring any Bonferroni correction. For the present single model, however, an ordinary linear mixed-effects model would also suffice.

data <- MMN[MMN$Time > 171 & MMN$Time < 265,]
data$amplitude <- rowMeans(data[,ROI])
data <- aggregate(amplitude ~ Deviant + Session + Subject,data,mean)
model <- perm.lmer(amplitude ~ Deviant * Session + (Deviant + Session | Subject),data)

Finally, we look at the resulting summary:

print(model)

We first observe that the highest random-effects structure that this model could support was (1 + Session | Subject). Given that the data only contain 81 observations---due to Jager's averaging over items---this is not surprising. The main conclusions to be drawn from these results is that there is a significant mismatch negativity of $-1.65$ microvolts. No evidence is found for a change in this MMN over the two sessions of the experiment.

This concludes our expository foray into these data.