knitr::opts_chunk$set(echo = FALSE)

Packages

Load the following packages. If you haven't installed them yet, do so first (e.g., install.packages("learnr")). If you haven't installed the stats.VWP package (course materials) yet, install it with this command: remotes::install_github("aineito/stats.VWP").
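
For reference, the one-time installation commands would look like this (remotes itself may need to be installed first):

install.packages(c("learnr", "Rmisc", "tidyverse", "lme4", "remotes"))
remotes::install_github("aineito/stats.VWP")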

require(learnr)
require(Rmisc)
require(tidyverse)
require(ggplot2)
require(lme4)
require(stats.VWP)

This is optional: If you run the command below, R will display very large or very small numbers in a plain format (e.g., 100000 instead of 1e+05, 0.000001 instead of 1e-06).
If you prefer the scientific notation, skip the command below.

options(scipen=999)
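
If you want to return to the default display later, reset the option:

options(scipen=0)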

Look at the data

We will use fix.window data in the stats.VWP package for this tutorial.
Let's look at the summary.

summary(fix.window)

This data set contains the data from -800 ms to 0 ms relative to the target word onset.
Let's compute the mean, SD, SE and CI for FixP for each condition for each group.

(fix.window.summary = Rmisc::summarySE(fix.window, measurevar='FixP', groupvars=c('Lang','Condition'), na.rm=T) )

Now, let's plot the data.

ggplot(fix.window.summary, aes(x = Condition, y = FixP, fill = Condition)) +
  ggtitle("Fixation proportion with 95% CI") +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin=FixP-ci, ymax=FixP+ci), width=.3) +
  scale_fill_manual('Condition',values=c('red','royalblue','deeppink','darkgrey')) +
  facet_wrap(~Lang) 

Data preparation

We will compute the empirical logit elogFix (cf. Barr, 2008, JML) and use it as a dependent variable for the LME analysis.
The formula for computing the empirical logit is $\log \left( \frac{Y + .5}{N - Y + .5} \right)$, where Y is the number of samples that fall in the critical interest area and N is the total number of samples in the current bin.

fix.window = fix.window %>% mutate(elogFix = log((Count+.5)/(allSample-Count+.5)))
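
As a quick sanity check with made-up numbers (hypothetical: 20 out of 100 samples falling on the critical object):

log((20+.5)/(100-20+.5))  # ≈ -1.37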

Let's check the data again.

head(fix.window)

For the binomial GLMM analysis, we will need a binomial variable as a dependent variable, so we will create a variable indicating whether the critical object was fixated (=1) or not (=0) in this time window.

fix.window = fix.window %>% mutate(IsFixated = if_else(FixP>0, 1, 0))

Let's check the data again.

head(fix.window)

LME

Before running the model, we will code the categorical variables.
For Condition, we will use dummy-coding (or treatment-coding), treating the Unrelated condition as the baseline condition.
For Lang, we will use sum-coding.

contrasts(fix.window$Condition) = matrix(c(1,0,0,0,0,1,0,0,0,0,1,0),ncol=3) 
contrasts(fix.window$Lang) = contr.sum(2)

We can run just the left-hand side of each assignment to double-check the coding.

contrasts(fix.window$Condition)
contrasts(fix.window$Lang)
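
As an aside, dummy coding with the Unrelated condition as baseline can also be set up with relevel() and R's default treatment contrasts. Here is a sketch on a throwaway copy (relevel() reorders the factor levels and discards the custom contrasts, so we don't apply it to fix.window itself):

tmp = fix.window
tmp$Condition = relevel(tmp$Condition, ref='Unr')
contrasts(tmp$Condition)  # treatment coding with 'Unr' as the baseline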

Ok, we will run the model for the L1 group first.
The model has the maximal random-effects structure (by-subject and by-item random intercepts and slopes for Condition).
Make sure that both Subject and Item are treated as factors. We will save the results as lme.L1.
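
A quick way to check:

is.factor(fix.window$Subject)  # should be TRUE
is.factor(fix.window$Item)     # should be TRUE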

lme.L1 = lmer(elogFix~Condition+(1+Condition|Subject)+(1+Condition|Item), fix.window[fix.window$Lang=='L1',]) 


We can use the summary function to look at the model summary.

summary(lme.L1)

When reporting the results, we need to include the estimate, standard error, and t-value. A common rule of thumb is to regard an effect as significant when the absolute t-value is 2 or larger (roughly the .05 level given a large number of observations).
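
If you want to pull these numbers out of the fitted model rather than reading them off the printed summary, coef(summary()) returns the fixed-effects table of an lmer fit:

round(coef(summary(lme.L1)), 2)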

In this example analysis, we can report, for example:
"In the time window between -800 ms and 0 ms relative to the target word onset, the target object attracted significantly more fixations than the unrelated object, $\beta$ = 2.6, SE = .55, t = 4.8. The English competitor condition and the Japanese competitor condition did not differ significantly from the unrelated condition, |t|s < 2."

Q1:

Run the model for the L2 group with the maximal random-effects structure.
Save it as lme.L2 and print the model summary.

Click Run Code to test your code. Click Solution to see the solution.

require(tidyverse)
require(lme4)
fix.window = fix.window %>% mutate(elogFix = log((Count+.5)/(allSample-Count+.5)))

lme.L2 = lmer(elogFix~Condition+(1+Condition|Subject)+(1+Condition|Item), fix.window[fix.window$Lang=='L2',]) 

summary(lme.L2) 

Quick Q: What did you find?


Now, let's see if the effect of Condition (Target vs. Unrelated) interacted with the Lang group.

We will select the two conditions and drop the unused levels.

fix.window.TU = fix.window %>% filter(Condition%in%c('Targ','Unr')) %>% droplevels()
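
We can double-check that only the two levels remain:

levels(fix.window.TU$Condition)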

Because we subset the data and dropped the unused levels, the contrasts were reset, so we need to define the coding scheme again. Let's use sum-coding for Condition.

contrasts(fix.window.TU$Condition) = contr.sum(2)
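
As before, we can print the contrasts to double-check:

contrasts(fix.window.TU$Condition)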

We will save the model output as lme.Cond.Lang.

lme.Cond.Lang = lmer(elogFix~Condition*Lang+(1+Condition|Subject)+(1+Condition+Lang|Item), fix.window.TU) 
summary(lme.Cond.Lang) 

Ok, the interaction was not significant.


We ran the model with the maximal random-effects structure.

Sometimes, models with maximal random-effects structures do not converge.
If this happens, you will need to simplify the model.
One way of doing this is by dropping the variable(s) that account for little variance.
For example, if we look at the random effects section in the model summary of lme.Cond.Lang, the by-subject random slope for Condition does not account for much variance.
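
You can also print the variance estimates directly with VarCorr() (part of lme4):

VarCorr(lme.Cond.Lang)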

Let's try dropping that from the model and look at the model summary.

lme.Cond.Lang.m2 = lmer(elogFix~Condition*Lang+(1|Subject)+(1+Condition+Lang|Item), fix.window.TU) 

summary(lme.Cond.Lang.m2)

Now, let's compare the two models to see whether one fits better than the other.
To do this, the models need to be fit with maximum-likelihood estimation.
We can do this by adding REML = F to the model call. (The default is TRUE, i.e., lmer uses restricted maximum likelihood to estimate the parameters.)

Notes: Maximum likelihood tends to produce more accurate estimates of the fixed regression parameters, whereas restricted maximum likelihood tends to produce more accurate estimates of the random-effects variances. So there are pros and cons to both methods, but when we compare models, we should use maximum-likelihood estimation.

lme.Cond.Lang.ml = lmer(elogFix~Condition*Lang+(1+Condition|Subject)+(1+Condition+Lang|Item), fix.window.TU, REML = F) 

summary(lme.Cond.Lang.ml)
lme.Cond.Lang.m2.ml = lmer(elogFix~Condition*Lang+(1|Subject)+(1+Condition+Lang|Item), fix.window.TU, REML = F) 

summary(lme.Cond.Lang.m2.ml)
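
Tip: rather than retyping the full model calls, you can refit an existing lmer model with update(), which re-evaluates the original call with the changed argument:

lme.Cond.Lang.ml = update(lme.Cond.Lang, REML = F)
lme.Cond.Lang.m2.ml = update(lme.Cond.Lang.m2, REML = F)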

We can compare the two models using the anova function.

anova(lme.Cond.Lang.ml, lme.Cond.Lang.m2.ml)

The comparison indicates that lme.Cond.Lang.ml fits significantly better than lme.Cond.Lang.m2.ml, so we keep the by-subject random slope for Condition.

If the model comparison suggests a non-significant difference between the two models, you can drop the variable (because it does not significantly improve the model fit).


GLMM

You can build a GLMM in a pretty similar way. We will use the glmer function instead of the lmer function. We will run the model for the L1 group first.

The model below has the maximal random-effects structure (by-subject and by-item random intercepts and slopes for Condition).
Again, make sure that both Subject and Item are treated as factors.

For a GLMM, we need to specify the family of the distribution. Here, we have a binomial dependent variable, so we will add family=binomial. We will save the results as glmm.L1.

glmm.L1 = glmer(IsFixated~Condition+(1+Condition|Subject)+(1+Condition|Item), fix.window[fix.window$Lang=='L1',], family=binomial) 

Ok, so this model did not converge. Let's look at the summary.

summary(glmm.L1) 

Let's simplify the model by dropping the by-item random slope for Condition.

glmm.L1.m2 = glmer(IsFixated~Condition+(1+Condition|Subject)+(1|Item), fix.window[fix.window$Lang=='L1',], family=binomial) 
summary(glmm.L1.m2) 

Ok, this model converged. Let's compare the two models.
GLMMs are always fit with maximum-likelihood estimation (glmer has no REML option), so we can use the models above for the comparison.

anova(glmm.L1, glmm.L1.m2)

The model comparison suggests that the two models do not differ significantly, so we can drop the by-item random slope for Condition.

The results are consistent with the results from the LME analysis.

When reporting the results from the GLMM analysis, we will report the estimate, standard error, z-value and p-value.

For example:
"The target object was more likely to be fixated than the unrelated object, $\beta$ = 1.7, SE = .46, z = 3.7, p <.001."

The z-statistic is analogous to the t-statistic in that it is also used to assess the individual contribution of predictors. Like the t-test in linear regression, the z-statistic tells us whether the $\beta$ coefficient for that predictor is significantly different from zero (i.e., the predictor is making a significant contribution to the prediction of the outcome). It is computed by dividing the regression coefficient by its associated standard error.
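
For example, with the values reported above ($\beta$ = 1.7, SE = .46), we can recompute the z-value and its two-sided p-value by hand:

1.7/.46                  # z = 3.7
2*pnorm(-abs(1.7/.46))   # p < .001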


Now, let's run the model for the L2 group.

glmm.L2 = glmer(IsFixated~Condition+(1+Condition|Subject)+(1+Condition|Item), fix.window[fix.window$Lang=='L2',], family=binomial) 
summary(glmm.L2)

The results are not consistent with the LME analysis. The GLMM results suggest that the target object was not fixated more than the unrelated object. Let's see why.

Q2: Make a graph with Condition on the x-axis and IsFixated on the y-axis for the L2 group. Include 95% CIs.

Click Run Code to test your code. Click Solution to see the solution.

require(tidyverse)
fix.window = fix.window %>% mutate(elogFix = log((Count+.5)/(allSample-Count+.5)), IsFixated = if_else(FixP>0, 1, 0))

fix.window.L2.summary = Rmisc::summarySE(fix.window[fix.window$Lang=='L2',], measurevar='IsFixated', groupvars='Condition', na.rm=T) 

ggplot(fix.window.L2.summary, aes(x = Condition, y = IsFixated, fill = Condition)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin=IsFixated-ci, ymax=IsFixated+ci), width=.3) +
  scale_fill_manual('Condition',values=c('red','royalblue','deeppink','darkgrey')) 

Now do you see why?

Quick Q: Which analysis do you think we should use in this case? Why?

Homework

  1. Write a one-paragraph summary reporting the results of the LME analysis separately for L1 and L2 groups. Report the results from the best model.

  2. Write a one-paragraph summary reporting the results of the GLMM analysis separately for L1 and L2 groups. Report the results from the best model.

For both tasks, remember to report the model structure (what you included in the final model).
If you dropped some variables, make sure to report which variables you dropped and how you decided to drop them.


