In aineito/stats.VWP: Analysing visual world eye-tracking data in R

knitr::opts_chunk$set(echo = TRUE)

Packages

Load the following packages. If you haven't installed them yet, do so first (e.g., install.packages("learnr")). If you haven't installed the stats.VWP package (course materials) yet, install it using this command remotes::install_github("aineito/stats.VWP").

require(learnr)
require(Rmisc)
require(tidyverse)
require(ggplot2)
require(lme4)
require(stats.VWP)

We will also use gazer package for this tutorial. Install and load it using the commands below.

remotes::install_github("dmirman/gazer")

require(gazer)

This is optional: If you run the command below, R will display very large or very small numbers in a plain format (e.g., 100000 instead of 1e+05, 0.000001 instead of 1e-06).
If you like the latter format, skip the command below.

options(scipen=999)

Look at the data

We will use fix.binom.50bin data in the stats.VWP package for this tutorial.
Let's look at the summary.

summary(fix.binom.50bin)

To look at the details of the variables, see the help page.

?fix.binom.50bin

We will use the data from -800 ms to 0 ms relative to the target word onset, and drop the 'Jap' condition from Condition.

Q1:

Subset the data that only contains Time between -800 ms and -1 ms (do not include Time 0) and Condition 'Targ', 'Eng' and 'Unr'.
Then, drop the levels you don't need for the analysis (i.e., drop the 'Jap' level).
Save the new data set as 'Q1' and print the summary of the new data set.

Click Run Code to test your code. Click Solution to see the solution.
Hint: Use droplevels() to drop unused levels from factors

Q1 = fix.binom.50bin %>% filter(Time >= -800, Time < 0, Condition %in% c('Targ','Eng','Unr')) %>% droplevels()
summary(Q1)

fix.binom.pred = fix.binom.50bin %>% filter(Time >= -800, Time < 0, Condition %in% c('Targ','Eng','Unr')) %>% droplevels()

In R, you want to use an informative name for the data set. Let's use fix.binom.pred for this tutorial.

We need to create a second-order polynomial based on the Time values in the range of the selected time.
We will use the code_poly command from the gazer package.

poly.order=2 indicates that we need a second-order polynomial.
orthogonal=T indicates that we want the polynomial to be orthogonal (if you set it to FALSE, you will get natural polynomial).
draw.poly=T indicates that we want to create a graph showing transformed polynomial predictor values.

fix.binom.pred = code_poly(df=fix.binom.pred, predictor="Time", poly.order=2, orthogonal=T, draw.poly=T)

If we check the data set again, we can see that we now have poly1 and poly2 columns added to the original data set.

summary(fix.binom.pred)

poly1 corresponds to the linear term, and poly2 corresponds to the quadratic term.

Now, let's code our categorical variables.

We want to use dummy-coding (or treatment-coding) for Condition because we want to treat the 'Unr' condition as the baseline condition (reference level).
We will use sum-coding for Lang.

First, let's check the default coding for Condition. The default coding scheme for R is dummy-coding. We can check the contrasts by running the command below:

contrasts(fix.binom.pred$Condition)
contrasts(fix.binom.pred$Lang)

R treats the first level ('Targ' and 'L1' in our case) as the reference level.
We want to change the matrix so that the 'Unr' condition becomes the reference level.
For sum-coding, we can use contr.sum function. We put the number 2 because we have 2 levels for Lang.

contrasts(fix.binom.pred$Condition) = matrix(c(1,0,0,0,1,0),ncol=2)
contrasts(fix.binom.pred$Lang) = contr.sum(2)

Now, the contrasts look good.

contrasts(fix.binom.pred$Condition)
contrasts(fix.binom.pred$Lang)

Let's doublecheck the data again.

summary(fix.binom.pred)

Fit the model

Now, we're ready to fit the model.

Let's construct a simple random-effects structure model just with the by-subject and by-item intercepts (no random slopes).

This is a GLMM testing an interaction of Condition by Lang and effects of Condition and Lang. It also includes 2-way interactions of Condition by poly1, Condition by poly2, Lang by poly1 and Lang by poly2, and 3-way interactions of Condition by Lang by poly1 and Condition by Lang by poly2.

Note: Here we are excluding random slopes just to save time. So the results here are partly inconsistent with what is reported in the paper. When you create a model, you would need to choose the best structure for your data.

fix.binom.pred.m.cond.lang = glmer(Count ~ (poly1+poly2) * Condition * Lang + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred)

Plot the model fit

Before looking at the summary of the model, let's plot the model fit so that we can compare the model output with the graph.

We will add the model fit to the data set we used for the analysis.
The command below will create a new column mfit.

fix.binom.pred$mfit = fitted(fix.binom.pred.m.cond.lang)

ggplot(fix.binom.pred, aes(Time, Count, color=Condition, lty=Condition)) + 
  facet_wrap(~Lang) + theme_bw() + 
  stat_summary(fun=mean,geom="point") +
  stat_summary(aes(y=mfit),fun=mean,geom="line",size=1) +
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  labs(y="Fixation Proportion", x="Time relative to target word onset (ms)") + 
  scale_color_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("red","blue","darkgrey")) +
  scale_linetype_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("solid","longdash","twodash")) +
  theme(text=element_text(size=12),legend.key.height=unit(.3,"in"),legend.key.width=unit(.6,"in"))

Good. Now let's check the summary of the model.

summary(fix.binom.pred.m.cond.lang)

At the top, you can find the model's formula and the name of the data set you used.
You can find the effects and interactions of the predictors in the fixed effects section.

Let's look at the effect of Condition first. In the most left column, you can see Condition1 and Condition2. The p-values for both (in the Pr(>|z|) column) are smaller than .05.

Q2:

question_checkbox("What do they suggest? Hint: Do you remember the contrasts for 'Condition'? (see below)",
  answer("The target object was significantly more likely to be fixated than the English competitor object, and the English competitor object was significantly more likely to be fixated than the unrelated object"),
  answer("The target object and the English competitor object were both significantly more likely to be fixated than the unrelated object", correct=T),
  answer("The fixation proportion difference between the target condition and the unrelated condition was significantly larger than the fixation proportion difference between the English competitor condition and the unrelated condition"),
  allow_retry = T
)

contrasts(fix.binom.pred$Condition)

Now, if we look at the 2-way interactions of Condition by poly1 and Condition by poly2, the p-value for poly1:Condition1 is significant.
This suggests that the effect of condition (Target vs. Unrelated) interacted with the linear term.
If you look at the graph, you can see that the fixation proportion for the Target condition increased over time, whereas the fixation proportion for the Unrelated condition stayed around 0.2 (imagine a linear line on these plots).
This time-course difference was similar in L1 and L2 groups (no 3-way interaction, see poly1:Condition1:Lang1).

Additionally, there were significant 2-way interactions of Condition1:Lang1 and Condition2:Lang1 on the intercept term and significant 3-way interactions of poly2:Condition1:Lang1 and poly2:Condition2:Lang1.

We can resolve these interactions by running the model separately for each group.

fix.binom.pred.m.cond.L1 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L1',]) # model for L1 group

summary(fix.binom.pred.m.cond.L1)

fix.binom.pred.m.cond.L2 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L2',]) # model for L2 group

summary(fix.binom.pred.m.cond.L2)

We can see that there is a significant effect of Condition1 in both groups, but the effect is larger in the L1 group than in the L2 group. Thus, both L1 and L2 groups were more likely to look at the target object over the unrelated object, but this effect was larger in the L1 group than in the L2 group

The effect of Condition2 is significant in the L1 group but not in the L2 group. Thus, the L1 group was more likely to look at the English competitor object over the unrelated object, but the L2 group was not.

Now, to the 3-way interactions of poly2:Condition1:Lang1 and poly2:Condition2:Lang1.
The poly2:Condition1 interaction is significant in both groups. That is, the curvature for the Target condition is different from that for the Unrelated condition in both groups. The curvature is shallower for the Unrelated condition in the L1 group, whereas it is shallower for the Target condition in the L2 group.

The poly2:Condition2 interaction is significant in the L1 group. The curvature is shallower for the Unrelated condition compared to the English competitor condition. There was no such difference in the L2 group.

Evaluate the need for higher-order polynomial(s)

In the previous section, we fitted the model including linear and quadratic terms.
Let's check if the model with linear and quadratic terms is better than the model without a quadratic term.

To do so, we need to construct another model without a quadratic term.

For this illustration, we will use the L1 group's data. As we can see, we just dropped poly2 from the original model.

fix.binom.pred.m.cond.lang.linear = glmer(Count ~ poly1 * Condition * Lang + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred)

summary(fix.binom.pred.m.cond.lang.linear)

Let's plot the model fit again.
We will add the model fit with a new name mfit_linear.

fix.binom.pred$mfit_linear = fitted(fix.binom.pred.m.cond.lang.linear)

ggplot(fix.binom.pred, aes(Time, Count, color=Condition, lty=Condition)) + 
  facet_wrap(~Lang) + theme_bw() + 
  stat_summary(fun=mean,geom="point") +
  stat_summary(aes(y=mfit_linear),fun=mean,geom="line",size=1) +
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  labs(y="Fixation Proportion", x="Time relative to target word onset (ms)") + 
  scale_color_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("red","blue","darkgrey")) +
  scale_linetype_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("solid","longdash","twodash")) +
  theme(text=element_text(size=12),legend.key.height=unit(.3,"in"),legend.key.width=unit(.6,"in"))

If you compare this graph with the graph plotting a model fit for the model with linear and quadratic terms, you can see that the model fit for the model without a quadratic term looks worse.

Quick Q: How can you tell?

Let's do a model comparison to doublecheck.
We can use the anova() function to compare two (or more) models.

anova(fix.binom.pred.m.cond.lang, fix.binom.pred.m.cond.lang.linear)

The model comparison result suggests that the model with linear and quadratic terms is significantly better than the model without a quadratic term.

Report the results

Once you get the results from the best model, you can write a summary of the findings.

There are two rules of thumb for reporting growth curve analysis results (from Dan Mirman's website):

Clearly describe each of the three key components of the model.

The functional form (third-order orthogonal polynomial), the fixed effects (effect of Condition on all time terms), and the random effects (effect of Subject on each of the time terms and nested effects of Subject-by-Condition on each of the time terms except the cubic). Depending on the circumstances and complexity of the model, you may want to include additional information about the factors and why they were included or not. It's also a good idea to report which method was used for computing p-values.

For key findings, report parameter estimates and standard errors along with significance tests.

In some cases the model comparison is going to be enough, but for key findings, the readers should want to see the parameter estimates. The parameter estimate standard errors are critical for interpreting the estimates, so those should be reported as well. The t-values are not critical to report (they are just Estimate divided by the Std Error, so they can always be computed from the reported estimates and standard errors). If there are many estimated parameters, it may be a good idea to focus the main text discussion on the most important ones and report the full set in a table or appendix.

See here for an example.

Homework

Identify the best model structure and justify why it is the best model. (Note: In this tutorial, we used the simple model without random slopes, which is probably not the best model.)
Copy and paste the model syntax you used and the fixed effects from the model summary.
Write a one-paragraph report summarising the findings. Report the results from the best model.

aineito/stats.VWP documentation built on March 10, 2023, 4:44 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com