rm(list = ls())
knitr::opts_chunk$set(echo = TRUE)

Set-up

Load the following packages. If you haven't installed them yet, do so first (e.g. install.packages("learnr")). If you haven't installed the stats.VWP package (course materials) yet, install it using this command: remotes::install_github("aineito/stats.VWP").

require(learnr)
require(Rmisc)
require(tidyverse)
require(ggplot2)
require(mgcv)
require(itsadug)
require(stats.VWP)

This is optional: If you run the command below, R will display very large or very small numbers in a plain format (e.g. 100000 instead of 1e+05, 0.000001 instead of 1e-06).
If you like the indices format, skip the command below.

options(scipen=999)

Look at the data

We will use fix.50bin data in the stats.VWP package for this tutorial.
The data is from Ito, Pickering & Corley (2018, JML). Let's look at the summary.

summary(fix.50bin)

This data set contains the following data.

Column |Description :-------------|:---------------------------------------------------------- Subject |Subject ID Trial |Trial number Time |Time relative to the target word onset (Time -1000 contains 50 ms from the time -1000 ms) allSample |The sum of all samples in the corresponding time bin Count |Right-eye sample count on the critical object BlinkCount |The total number of right-eye samples that were in a blink event OffScreenCount|The total number of right-eye samples that fall outside of the display boundary (off screen) FixP |Fixation proportion Condition |Condition (Targ=target, Eng=English competitor, Jap=Japanese competitor, Unr=unrelated) Item |Item ID Lang |Language group (L1=native English speakers, L2=native Japanese, non-native English speakers)

The details/description of the variables can also be found in the help page.

?fix.50bin

We will use the target and unrelated conditions. We will drop the rest of the conditions.

fix.50bin = fix.50bin %>% filter(Condition %in% c('Targ','Unr')) %>% droplevels()

We will use the empirical logit elogFix (cf. Barr, 2008, JML) as a dependent variable.

The formula to compute the empirical logit is: log( (Y+.5) / (N-Y+.5) ) where Y is the total number of samples that fall in the critical interest area, and N is the total number of samples for the current bin.

For this data, we will exclude blink samples and off-screen samples (i.e. N = allSample - BlinkCount - OffScreenCount).

We will also mark the first time bin for each trial. We create a new column Is_start and set the value to TRUE for the first time bin.

fix.50bin = fix.50bin %>% mutate(elogFix = log((Count+.5)/(allSample-BlinkCount-OffScreenCount-Count+.5)), Is_start = (Time == min(Time))) 

Let's plot a time-course graph with 'FixP' on the y-axis.

fix.50bin.summary = Rmisc::summarySE(fix.50bin, measurevar='FixP', groupvars=c('Lang','Condition','Time'))
ggplot() + 
  facet_wrap(~Lang) + theme_bw() + 
  geom_line(data=fix.50bin.summary,aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition), lwd=1.5) +
  geom_ribbon(data=fix.50bin.summary,aes(x=Time, ymin=FixP-se, ymax=FixP+se, color=Condition, fill=Condition), lwd=.2, alpha=.3, lty="dashed", show.legend=F)  +
  labs(y="Fixation Proportion", x="Time relative to the target word onset (ms)") + 

  scale_x_continuous(limits=c(-1000,950),expand=c(0,0),breaks=seq(-500, 500, 500)) +
  scale_y_continuous(limits=c(0,1),expand=c(0,0)) +
  scale_color_manual('Condition', values=c("red","darkgrey")) +
  scale_fill_manual('Condition', values=c("red","darkgrey")) +
  scale_linetype_manual('Condition', values=c("solid","dotted")) +

  theme(text=element_text(size=20), legend.key.height=unit(.3,"in"), legend.key.width=unit(.6,"in")) 

GAMM

Here we will run an analysis testing a difference between the target and unrelated conditions for each group.

Subset the data:

fix.L1dat = fix.50bin %>% filter(Lang=="L1") %>% droplevels()
fix.L2dat = fix.50bin %>% filter(Lang=="L2") %>% droplevels()

Code the categorical variable (Condition):

contrasts(fix.L1dat$Condition) = contr.sum(2)
contrasts(fix.L2dat$Condition) = contr.sum(2)

Fit the models

To model non-linear curves, we will use a smooth function s(). The model below includes a random smooth (which adjusts the trend of a numeric predictor (Time) in a non-linear way; "fs" = factor smooth) for Time by Condition, smooth interactions for Time by Subject and Time by Item. by=Condition is used to model potentially different trends over time for different conditions. The random smooths include random intercepts and random slope effects.

A bit more about random effects: s(Time, by=Condition) indicates you want to model different trends over time for different conditions. A by-subject random intercept can be specified using s(Subject, bs="re"), and a by-subject random slope can be specified using s(Subject, Time, bs="re").

We will run the model below first to determine an appropriate value for the AR1 correlation parameter (the parameter to account for autocorrelated residuals).

gamm.base.L1 = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Time, Subject, by=Condition, bs="fs", m=1) + s(Time, Item, by=Condition, bs="fs", m=1), data = fix.L1dat)

gamm.base.L2 = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Time, Subject, by=Condition, bs="fs", m=1) + s(Time, Item, by=Condition, bs="fs", m=1), data = fix.L2dat)

The warning "model has repeated 1-d smooths of same variable" tells us that we have smooths over time both in random effects and fixed effects. We can ignore this warning.

summary(gamm.base.L1)
summary(gamm.base.L2)
( rho.L1 = itsadug::start_value_rho(gamm.base.L1) )
( rho.L2 = itsadug::start_value_rho(gamm.base.L2) )

The above function gives us a value we can use for the AR1 parameter. We will add these values to the models below (rho=VALUE).

gamm.main.L1 = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Time, Subject, by=Condition, bs="fs", m=1) + s(Time, Item, by=Condition, bs="fs", m=1), data=fix.L1dat, rho=rho.L1, AR.start=Is_start)

gamm.main.L2 = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Time, Subject, by=Condition, bs="fs", m=1) + s(Time, Item, by=Condition, bs="fs", m=1), data=fix.L2dat, rho=rho.L2, AR.start=Is_start)

Autocorrelation

We can check if accounting for autocorrelation improves the models. The value on the y-axis of the second line (from the left) indicates the amount of autocorrelation at lag 1.

L1 group

itsadug::acf_resid(gamm.base.L1, split_pred=c('Subject', 'Item')) # autocorrelation is not taken into account
itsadug::acf_resid(gamm.main.L1, split_pred=c('Subject', 'Item')) # autocorrelation is taken into account

L2 group

itsadug::acf_resid(gamm.base.L2, split_pred=c('Subject', 'Item')) # autocorrelation is not taken into account
itsadug::acf_resid(gamm.main.L2, split_pred=c('Subject', 'Item')) # autocorrelation is taken into account

Model comparison

itsadug::compareML(gamm.base.L1, gamm.main.L1)
itsadug::compareML(gamm.base.L2, gamm.main.L2)

For both groups, the model with the autocorrelation parameter was better.

Let's get the summaries of the main models.

summary(gamm.main.L1)
summary(gamm.main.L2)

Note: The explained deviance has dropped slightly because the main models are accounting for the autocorrelation and so predicting the observed values slightly less well than the models without the autocorrelation parameter.

Plot results

L1 group

itsadug::plot_smooth(gamm.main.L1, view="Time", plot_all="Condition", v0=0, col=c('red','darkgrey'))

L2 group

itsadug::plot_smooth(gamm.main.L2, view="Time", plot_all="Condition", v0=0, col=c('red','darkgrey'))

We can estimate the time when the effect occurred using the codes below.

itsadug::plot_diff(gamm.main.L1, view="Time", comp=list(Condition=c("Targ", "Unr")))
itsadug::plot_diff(gamm.main.L2, view="Time", comp=list(Condition=c("Targ", "Unr")))

Exercise

In this tutorial, we compared the target vs. unrelated conditions in the L1 group using the code below. This model included a random smooth (which incorporates both random intercepts and random slopes).

Modify the so that you will have a model testing the same effect but with a different random-effects structure. Build a model that only includes by-subject and by-item random intercepts, and another model that includes by-subject and by-item random slope for Time (but no random intercepts).

Remember, random effects are not specified in the same way as in LME. Check the Fit the models section again if you need to review how to specify random effects in GAMM.

Click Solution to see the solution. Don't run the code.

gamm.main.L1 = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Time, Subject, by=Condition, bs="fs", m=1) + s(Time, Item, by=Condition, bs="fs", m=1), data=fix.L1dat, rho=rho.L1, AR.start=Is_start)
gamm.main.L1.intercept = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Subject, bs="re") + s(Item, bs="re"), data=fix.L1dat, rho=rho.L1, AR.start=Is_start)

gamm.main.L1.slope = mgcv::bam(elogFix ~ Condition + s(Time, by=Condition) + s(Subject, Time, bs="re") + s(Item, Time, bs="re"), data=fix.L1dat, rho=rho.L1, AR.start=Is_start)

Solution

For this exercise, you should have changed the following from the tutorial code:



aineito/stats.VWP documentation built on March 10, 2023, 4:44 p.m.