rm(list = ls()) knitr::opts_chunk$set(echo = TRUE)
Load the following packages. If you haven't installed them yet, do so first (e.g. install.packages("learnr")
). If you haven't installed the stats.VWP
package (course materials) yet, install it using this command remotes::install_github("aineito/stats.VWP")
.
require(learnr) require(Rmisc) require(tidyverse) require(ggplot2) require(lme4) require(stats.VWP)
We will also use gazer
package for this tutorial. Install and load it using the commands below.
remotes::install_github("dmirman/gazer")
require(gazer)
This is optional: If you run the command below, R will display very large or very small numbers in a plain format (e.g. 100000 instead of 1e+05, 0.000001 instead of 1e-06).
If you like the indices format, skip the command below.
options(scipen=999)
We will use fix.binom.50bin
data in the stats.VWP
package for this tutorial.
The data is from Ito, Pickering & Corley (2018, JML). Let's look at the summary.
summary(fix.binom.50bin)
This data set contains the following data.
Column |Description :-------------|:---------------------------------------------------------- Subject |Subject ID Trial |Trial number Bin |Time bin ID Time |Time relative to the target word onset (Time -1000 contains 50 ms from the time -1000 ms) Count |Right-eye sample count on the critical object Condition |Condition (Targ=target, Eng=English competitor, Jap=Japanese competitor, Unr=unrelated) Item |Item ID Lang |Language group (L1=native English speakers, L2=native Japanese, non-native English speakers)
The details/description of the variables can also be found in the help page.
?fix.binom.50bin
We will use the data from -800 ms to 0 ms relative to the target word onset, and drop the 'Jap' condition from Condition
.
fix.binom.pred = fix.binom.50bin %>% filter(Time >= -800, Time < 0, Condition %in% c('Targ','Eng','Unr')) %>% droplevels()
In R, you want to use an informative name for the data set. Let's use fix.binom.pred
for this tutorial.
We need to create a second-order polynomial based on the Time
values in the range of the selected time.
We will use the code_poly
command from the gazer
package.
poly.order=2
indicates that we need a second-order polynomial. orthogonal=T
indicates that we want the polynomial to be orthogonal (if you set it to FALSE
, you will get natural polynomial). draw.poly=T
indicates that we want to create a graph showing transformed polynomial predictor values. fix.binom.pred = code_poly(df=fix.binom.pred, predictor="Time", poly.order=2, orthogonal=T, draw.poly=T)
If we check the data set again, we can see that we now have poly1
and poly2
columns added to the original data set.
summary(fix.binom.pred)
poly1
corresponds to the linear term, and poly2
corresponds to the quadratic term.
Now, let's code our categorical variables.
We want to use dummy-coding (or treatment-coding) for Condition
because we want to treat the 'Unr' condition as the baseline condition (reference level).
We will use sum-coding for Lang
.
First, let's check the default coding for Condition
. The default coding scheme for R is dummy-coding.
We can check the contrasts by running the command below:
contrasts(fix.binom.pred$Condition) contrasts(fix.binom.pred$Lang)
R treats the first level (i.e. 'Targ' and 'L1') as the reference level.
We want to change the matrix so that the 'Unr' condition becomes the reference level.
For sum-coding, we can use contr.sum
function. We put the number 2 because we have 2 levels for Lang
.
contrasts(fix.binom.pred$Condition) = matrix(c(1,0,0,0,1,0), ncol=2) contrasts(fix.binom.pred$Lang) = contr.sum(2)
Now, the contrasts look good.
contrasts(fix.binom.pred$Condition) contrasts(fix.binom.pred$Lang)
Let's doublecheck the data again.
summary(fix.binom.pred)
Now, we're ready to fit the model.
Let's construct a simple random-effects structure model just with the by-subject and by-item intercepts (no random slopes).
This is a GLMM testing an interaction of Condition
by Lang
and effects of Condition
and Lang
. It also includes 2-way interactions of Condition
by poly1
, Condition
by poly2
, Lang
by poly1
and Lang
by poly2
, and 3-way interactions of Condition
by Lang
by poly1
and Condition
by Lang
by poly2
.
Note: We will be excluding random slopes here to save time. Hence, the results will be partly inconsistent with what is reported in the paper. When you create a model, you need to choose the best random-effects structure for your design/data.
fix.binom.pred.m.cond.lang = glmer(Count ~ (poly1+poly2) * Condition * Lang + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred)
Before looking at the summary of the model, let's plot the model fit so that we can compare the model output with the graph.
We will add the model fit to the data set we used for the analysis.
The command below will create a new column mfit
.
fix.binom.pred$mfit = fitted(fix.binom.pred.m.cond.lang)
ggplot(fix.binom.pred, aes(Time, Count, color=Condition, lty=Condition)) + facet_wrap(~Lang) + theme_bw() + stat_summary(fun=mean,geom="point") + stat_summary(aes(y=mfit),fun=mean,geom="line") + stat_summary(fun.data=mean_se, geom="pointrange") + labs(y="Fixation Proportion", x="Time relative to target word onset (ms)") + scale_color_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("red","blue","darkgrey")) + scale_linetype_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("solid","longdash","twodash")) + theme(text=element_text(size=12),legend.key.height=unit(.3,"in"),legend.key.width=unit(.6,"in"))
Good. Now let's check the summary of the model.
summary(fix.binom.pred.m.cond.lang)
At the top, you can find the model's formula and the name of the data set you used.
You can find the effects and interactions of the predictors in the fixed effects section.
Let's look at the effect of Condition
first. In the most left column, you can see Condition1
and Condition2
. The p-values for both (in the Pr(>|z|)
column) are smaller than .05.
Quiz:
question_checkbox("What do they suggest? Hint: Do you remember the contrasts for 'Condition'? (see below)", answer("The target object was significantly more likely to be fixated than the English competitor object, and the English competitor object was significantly more likely to be fixated than the unrelated object"), answer("The target object and the English competitor object were both significantly more likely to be fixated than the unrelated object", correct=T), answer("The fixation proportion difference between the target condition and the unrelated condition was significantly larger than the fixation proportion difference between the English competitor condition and the unrelated condition"), allow_retry = T )
contrasts(fix.binom.pred$Condition)
Now, if we look at the 2-way interactions of Condition
by poly1
and Condition
by poly2
, the p-value for poly1:Condition1
is significant.
This suggests that the effect of condition (Target vs. Unrelated) interacts with the linear term.
If you look at the graph, you can see that the fixation proportion for the Target condition increases over time, whereas the fixation proportion for the Unrelated condition stays around 0.2 (imagine a linear line on these plots).
This time-course difference is similar in L1 and L2 groups (no 3-way interaction, see poly1:Condition1:Lang1
).
Additionally, there are significant 2-way interactions of Condition1:Lang1
and Condition2:Lang1
on the intercept term and significant 3-way interactions of poly2:Condition1:Lang1
and poly2:Condition2:Lang1
.
We can resolve these interactions by running the model separately for each group.
fix.binom.pred.m.cond.L1 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L1',]) # model for L1 group summary(fix.binom.pred.m.cond.L1) fix.binom.pred.m.cond.L2 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L2',]) # model for L2 group summary(fix.binom.pred.m.cond.L2)
We can see that there is a significant effect of Condition1
in both groups, but the effect is larger in the L1 group than in the L2 group. Thus, both L1 and L2 groups were more likely to look at the target object over the unrelated object, but this difference was larger in the L1 group than in the L2 group.
The effect of Condition2
is significant in the L1 group but not in the L2 group. Thus, the L1 group was more likely to look at the English competitor object over the unrelated object, but the L2 group was not.
Moving on to the 3-way interactions of poly2:Condition1:Lang1
and poly2:Condition2:Lang1
, we see that the poly2:Condition1
interaction is significant in both groups. That is, the curvature for the Target condition is different from that for the Unrelated condition in both groups. The curvature is shallower for the Unrelated condition in the L1 group, whereas it is shallower for the Target condition in the L2 group.
The poly2:Condition2
interaction is significant in the L1 group. The curvature is shallower for the Unrelated condition compared to the English competitor condition. There is no such difference in the L2 group.
In the previous section, we fitted the model with linear and quadratic terms.
Let's check if the model with linear and quadratic terms is better than the model without a quadratic term.
To do so, we need to construct another model without a quadratic term.
For this illustration, we will use the L1 group's data. As we can see, we just dropped poly2
from the original model.
fix.binom.pred.m.cond.lang.linear = glmer(Count ~ poly1 * Condition * Lang + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred) summary(fix.binom.pred.m.cond.lang.linear)
Let's plot the model fit again.
We will add the model fit with a new name mfit_linear
.
fix.binom.pred$mfit_linear = fitted(fix.binom.pred.m.cond.lang.linear) ggplot(fix.binom.pred, aes(Time, Count, color=Condition, lty=Condition)) + facet_wrap(~Lang) + theme_bw() + stat_summary(fun=mean,geom="point") + stat_summary(aes(y=mfit_linear),fun=mean,geom="line") + stat_summary(fun.data=mean_se, geom="pointrange") + labs(y="Fixation Proportion", x="Time relative to target word onset (ms)") + scale_color_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("red","blue","darkgrey")) + scale_linetype_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("solid","longdash","twodash")) + theme(text=element_text(size=12),legend.key.height=unit(.3,"in"),legend.key.width=unit(.6,"in"))
If you compare this graph with the graph plotting a model fit for the model with linear and quadratic terms, you can see that the model fit for the model without a quadratic term looks worse.
Quick Q: How can you tell?
Let's do a model comparison to doublecheck.
We can use the anova()
function to compare two (or more) models.
anova(fix.binom.pred.m.cond.lang, fix.binom.pred.m.cond.lang.linear)
The model comparison result suggests that the model with linear and quadratic terms is significantly better than the model without a quadratic term.
Once you get the results from the best model, you can write a summary of the findings.
There are two rules of thumb for reporting growth curve analysis results (from Dan Mirman's website):
The functional form (third-order orthogonal polynomial), the fixed effects (effect of Condition on all time terms), and the random effects (effect of Subject on each of the time terms and nested effects of Subject-by-Condition on each of the time terms except the cubic). Depending on the circumstances and complexity of the model, you may want to include additional information about the factors and why they were included or not. It's also a good idea to report which method was used for computing p-values.
In some cases the model comparison is going to be enough, but for key findings, the readers should want to see the parameter estimates. The parameter estimate standard errors are critical for interpreting the estimates, so those should be reported as well. The t-values are not critical to report (they are just Estimate divided by the Std Error, so they can always be computed from the reported estimates and standard errors). If there are many estimated parameters, it may be a good idea to focus the main text discussion on the most important ones and report the full set in a table or appendix.
See here for an example.
In this tutorial, our model included linear and quadratic terms, and it looked like this:
fix.binom.pred.m.cond.L1 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L1',]) # model for L1 group
Let's assume (hypothetically) you have another dependent variable logFix
in the fix.binom.50bin
dataset, which is a log-transformed fixation proportion, and you want to test whether logFix
is significantly different between the target vs. unrelated conditions and between the English competitor vs. unrelated conditions in L1 speakers in the time window from -800 to 200 ms relative to the target word onset. You also want to include a cubic term in the model, in addition to the linear and quadratic terms.
Write a code for this analysis (but don't run it). Click Solution
to see the solution.
fix.pred = fix.binom.50bin %>% filter(Time >= -800, Time < 200, Condition %in% c('Targ','Eng','Unr')) %>% droplevels() fix.pred = code_poly(df=fix.pred, predictor="Time", poly.order=3, orthogonal=T, draw.poly=T) fix.pred.m.cond.L1 = lmer(logFix ~ (poly1+poly2+poly3) * Condition + (1|Subject) + (1|Item), data=fix.pred[fix.pred$Lang=='L1',]) summary(fix.pred.m.cond.L1)
For this exercise, you should have changed the following from the tutorial code:
filter(Time >= -800, Time < 200,
poly.order
specification to include a cubic term poly.order=3
logFix ~ (poly1+poly2+poly3)
glmer
to lmer
and remove family=binomial
because your dependent variable is no longer binomiallogFix
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.