knitr::opts_chunk$set(echo = TRUE)
Load the following packages. If you haven't installed them yet, do so first (e.g., install.packages("learnr")
). If you haven't installed the stats.VWP
package (course materials) yet, install it using this command remotes::install_github("aineito/stats.VWP")
.
require(learnr) require(Rmisc) require(tidyverse) require(ggplot2) require(lme4) require(stats.VWP)
We will also use gazer
package for this tutorial. Install and load it using the commands below.
remotes::install_github("dmirman/gazer")
require(gazer)
This is optional: If you run the command below, R will display very large or very small numbers in a plain format (e.g., 100000 instead of 1e+05, 0.000001 instead of 1e-06).
If you like the latter format, skip the command below.
options(scipen=999)
We will use fix.binom.50bin
data in the stats.VWP
package for this tutorial.
Let's look at the summary.
summary(fix.binom.50bin)
To look at the details of the variables, see the help page.
?fix.binom.50bin
We will use the data from -800 ms to 0 ms relative to the target word onset, and drop the 'Jap' condition from Condition
.
Time
between -800 ms and -1 ms (do not include Time 0) and Condition
'Targ', 'Eng' and 'Unr'. Click Run Code
to test your code. Click Solution
to see the solution.
Hint: Use droplevels()
to drop unused levels from factors
Q1 = fix.binom.50bin %>% filter(Time >= -800, Time < 0, Condition %in% c('Targ','Eng','Unr')) %>% droplevels() summary(Q1)
fix.binom.pred = fix.binom.50bin %>% filter(Time >= -800, Time < 0, Condition %in% c('Targ','Eng','Unr')) %>% droplevels()
In R, you want to use an informative name for the data set. Let's use fix.binom.pred
for this tutorial.
We need to create a second-order polynomial based on the Time
values in the range of the selected time.
We will use the code_poly
command from the gazer
package.
poly.order=2
indicates that we need a second-order polynomial. orthogonal=T
indicates that we want the polynomial to be orthogonal (if you set it to FALSE
, you will get natural polynomial). draw.poly=T
indicates that we want to create a graph showing transformed polynomial predictor values. fix.binom.pred = code_poly(df=fix.binom.pred, predictor="Time", poly.order=2, orthogonal=T, draw.poly=T)
If we check the data set again, we can see that we now have poly1
and poly2
columns added to the original data set.
summary(fix.binom.pred)
poly1
corresponds to the linear term, and poly2
corresponds to the quadratic term.
Now, let's code our categorical variables.
We want to use dummy-coding (or treatment-coding) for Condition
because we want to treat the 'Unr' condition as the baseline condition (reference level).
We will use sum-coding for Lang
.
First, let's check the default coding for Condition
. The default coding scheme for R is dummy-coding.
We can check the contrasts by running the command below:
contrasts(fix.binom.pred$Condition) contrasts(fix.binom.pred$Lang)
R treats the first level ('Targ' and 'L1' in our case) as the reference level.
We want to change the matrix so that the 'Unr' condition becomes the reference level.
For sum-coding, we can use contr.sum
function. We put the number 2 because we have 2 levels for Lang
.
contrasts(fix.binom.pred$Condition) = matrix(c(1,0,0,0,1,0),ncol=2) contrasts(fix.binom.pred$Lang) = contr.sum(2)
Now, the contrasts look good.
contrasts(fix.binom.pred$Condition) contrasts(fix.binom.pred$Lang)
Let's doublecheck the data again.
summary(fix.binom.pred)
Now, we're ready to fit the model.
Let's construct a simple random-effects structure model just with the by-subject and by-item intercepts (no random slopes).
This is a GLMM testing an interaction of Condition
by Lang
and effects of Condition
and Lang
. It also includes 2-way interactions of Condition
by poly1
, Condition
by poly2
, Lang
by poly1
and Lang
by poly2
, and 3-way interactions of Condition
by Lang
by poly1
and Condition
by Lang
by poly2
.
Note: Here we are excluding random slopes just to save time. So the results here are partly inconsistent with what is reported in the paper. When you create a model, you would need to choose the best structure for your data.
fix.binom.pred.m.cond.lang = glmer(Count ~ (poly1+poly2) * Condition * Lang + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred)
Before looking at the summary of the model, let's plot the model fit so that we can compare the model output with the graph.
We will add the model fit to the data set we used for the analysis.
The command below will create a new column mfit
.
fix.binom.pred$mfit = fitted(fix.binom.pred.m.cond.lang)
ggplot(fix.binom.pred, aes(Time, Count, color=Condition, lty=Condition)) + facet_wrap(~Lang) + theme_bw() + stat_summary(fun=mean,geom="point") + stat_summary(aes(y=mfit),fun=mean,geom="line",size=1) + stat_summary(fun.data=mean_se, geom="pointrange") + labs(y="Fixation Proportion", x="Time relative to target word onset (ms)") + scale_color_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("red","blue","darkgrey")) + scale_linetype_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("solid","longdash","twodash")) + theme(text=element_text(size=12),legend.key.height=unit(.3,"in"),legend.key.width=unit(.6,"in"))
Good. Now let's check the summary of the model.
summary(fix.binom.pred.m.cond.lang)
At the top, you can find the model's formula and the name of the data set you used.
You can find the effects and interactions of the predictors in the fixed effects section.
Let's look at the effect of Condition
first. In the most left column, you can see Condition1
and Condition2
. The p-values for both (in the Pr(>|z|)
column) are smaller than .05.
question_checkbox("What do they suggest? Hint: Do you remember the contrasts for 'Condition'? (see below)", answer("The target object was significantly more likely to be fixated than the English competitor object, and the English competitor object was significantly more likely to be fixated than the unrelated object"), answer("The target object and the English competitor object were both significantly more likely to be fixated than the unrelated object", correct=T), answer("The fixation proportion difference between the target condition and the unrelated condition was significantly larger than the fixation proportion difference between the English competitor condition and the unrelated condition"), allow_retry = T )
contrasts(fix.binom.pred$Condition)
Now, if we look at the 2-way interactions of Condition
by poly1
and Condition
by poly2
, the p-value for poly1:Condition1
is significant.
This suggests that the effect of condition (Target vs. Unrelated) interacted with the linear term.
If you look at the graph, you can see that the fixation proportion for the Target condition increased over time, whereas the fixation proportion for the Unrelated condition stayed around 0.2 (imagine a linear line on these plots).
This time-course difference was similar in L1 and L2 groups (no 3-way interaction, see poly1:Condition1:Lang1
).
Additionally, there were significant 2-way interactions of Condition1:Lang1
and Condition2:Lang1
on the intercept term and significant 3-way interactions of poly2:Condition1:Lang1
and poly2:Condition2:Lang1
.
We can resolve these interactions by running the model separately for each group.
fix.binom.pred.m.cond.L1 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L1',]) # model for L1 group summary(fix.binom.pred.m.cond.L1) fix.binom.pred.m.cond.L2 = glmer(Count ~ (poly1+poly2) * Condition + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred[fix.binom.pred$Lang=='L2',]) # model for L2 group summary(fix.binom.pred.m.cond.L2)
We can see that there is a significant effect of Condition1
in both groups, but the effect is larger in the L1 group than in the L2 group. Thus, both L1 and L2 groups were more likely to look at the target object over the unrelated object, but this effect was larger in the L1 group than in the L2 group
The effect of Condition2
is significant in the L1 group but not in the L2 group. Thus, the L1 group was more likely to look at the English competitor object over the unrelated object, but the L2 group was not.
Now, to the 3-way interactions of poly2:Condition1:Lang1
and poly2:Condition2:Lang1
.
The poly2:Condition1
interaction is significant in both groups. That is, the curvature for the Target condition is different from that for the Unrelated condition in both groups. The curvature is shallower for the Unrelated condition in the L1 group, whereas it is shallower for the Target condition in the L2 group.
The poly2:Condition2
interaction is significant in the L1 group. The curvature is shallower for the Unrelated condition compared to the English competitor condition. There was no such difference in the L2 group.
In the previous section, we fitted the model including linear and quadratic terms.
Let's check if the model with linear and quadratic terms is better than the model without a quadratic term.
To do so, we need to construct another model without a quadratic term.
For this illustration, we will use the L1 group's data. As we can see, we just dropped poly2
from the original model.
fix.binom.pred.m.cond.lang.linear = glmer(Count ~ poly1 * Condition * Lang + (1|Subject) + (1|Item), family=binomial, data=fix.binom.pred) summary(fix.binom.pred.m.cond.lang.linear)
Let's plot the model fit again.
We will add the model fit with a new name mfit_linear
.
fix.binom.pred$mfit_linear = fitted(fix.binom.pred.m.cond.lang.linear) ggplot(fix.binom.pred, aes(Time, Count, color=Condition, lty=Condition)) + facet_wrap(~Lang) + theme_bw() + stat_summary(fun=mean,geom="point") + stat_summary(aes(y=mfit_linear),fun=mean,geom="line",size=1) + stat_summary(fun.data=mean_se, geom="pointrange") + labs(y="Fixation Proportion", x="Time relative to target word onset (ms)") + scale_color_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("red","blue","darkgrey")) + scale_linetype_manual('Condition',breaks=c('Targ','Eng','Unr'),labels=c("Target","English competitor","Unrelated"),values=c("solid","longdash","twodash")) + theme(text=element_text(size=12),legend.key.height=unit(.3,"in"),legend.key.width=unit(.6,"in"))
If you compare this graph with the graph plotting a model fit for the model with linear and quadratic terms, you can see that the model fit for the model without a quadratic term looks worse.
Quick Q: How can you tell?
Let's do a model comparison to doublecheck.
We can use the anova()
function to compare two (or more) models.
anova(fix.binom.pred.m.cond.lang, fix.binom.pred.m.cond.lang.linear)
The model comparison result suggests that the model with linear and quadratic terms is significantly better than the model without a quadratic term.
Once you get the results from the best model, you can write a summary of the findings.
There are two rules of thumb for reporting growth curve analysis results (from Dan Mirman's website):
The functional form (third-order orthogonal polynomial), the fixed effects (effect of Condition on all time terms), and the random effects (effect of Subject on each of the time terms and nested effects of Subject-by-Condition on each of the time terms except the cubic). Depending on the circumstances and complexity of the model, you may want to include additional information about the factors and why they were included or not. It's also a good idea to report which method was used for computing p-values.
In some cases the model comparison is going to be enough, but for key findings, the readers should want to see the parameter estimates. The parameter estimate standard errors are critical for interpreting the estimates, so those should be reported as well. The t-values are not critical to report (they are just Estimate divided by the Std Error, so they can always be computed from the reported estimates and standard errors). If there are many estimated parameters, it may be a good idea to focus the main text discussion on the most important ones and report the full set in a table or appendix.
See here for an example.
Identify the best model structure and justify why it is the best model. (Note: In this tutorial, we used the simple model without random slopes, which is probably not the best model.)
Copy and paste the model syntax you used and the fixed effects from the model summary.
Write a one-paragraph report summarising the findings. Report the results from the best model.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.