knitr::opts_chunk$set(echo = TRUE)
Load the following packages. If you haven't installed them yet, do so first. (e.g., install.packages("learnr")
)
require(learnr) require(Rmisc) require(tidyverse) require(ggplot2)
You can view the help page of a loaded package (or its function) by typing the following:
?tidyverse # help page of the 'tidyverse' package ?read_delim # help page of the 'read_delim' function
Note: Everything after #
is a comment (not a command)
Install and load a package for this class:
remotes::install_github("aineito/stats.VWP")
require(stats.VWP)
The codes below assume that the data files are in the 'Data' folder, which is in the folder where your current R script is located. If you have the R script and the data files in the same folder, remove 'Data/'.
fix.window.raw = read_delim("Data/ET_fix_-800_to_0.txt", delim="\t") fix.50bin.raw = read_delim("Data/ET_fix.txt", delim="\t")
fix.window.raw
data: fixation proportion in a single window from -800 ms to 0 ms relative to the target word onsetfix.50bin.raw
data: fixation proportion in every 50 ms bin from -1000 ms to 1000 ms relative to the target word onsetWe will first look at the fix.window.raw
data.
Have a look at the first 6 rows:
head(fix.window.raw)
Have a look at the data structure:
str(fix.window.raw)
Have a look at the summary:
summary(fix.window.raw)
You can view the entire data by clicking fix.window.raw
in the Environment.
The dataset fix.window.raw
contains the following information:
question_checkbox("Which variables need to be a factor (categorical variable)? Select all that apply:", answer("Subject", correct=T), answer("Trial"), answer("allSample"), answer("Count"), answer("FixP"), answer("Condition", correct=T), answer("Item", correct=T), answer("Lang", correct=T), allow_retry = T )
To change Subject
, Condition
, Item
and Lang
to factor, you can run the following:
fix.window.raw = fix.window.raw %>% mutate_at(vars(Subject, Condition, Item, Lang), as.factor)
Note: a shortcut for %>%
is 'Ctrl/Cmd + Shift + m'
Item
to character and then to numeric. Item
column. Click Run Code
to test your code. Click Solution
to see the solution.
Hint: Use as.character
and as.numeric
fix.window.raw = fix.window.raw %>% mutate_at(vars(Item), as.character) %>% mutate_at(vars(Item), as.numeric) summary(fix.window.raw$Item)
Let's look at the Condition
labels.
summary(fix.window.raw$Condition)
The labels are ordered alphabetically.
Let's reorder them as Targ
, Eng
, Jap
and then Unr
.
fix.window.raw = fix.window.raw %>% mutate(Condition = fct_relevel(Condition,c('Targ','Eng','Jap','Unr')))
Note: mutate
creates a new variable or modifies an existing variable
mutate_at
modifies variables selected with a character vector or vars()
Now, the data summary should look like this:
summary(fix.window.raw)
There are different ways to subset data in R. We will first look at subsetting commands using base R.
To get the data from the 3rd row and 2nd column, you can run this:
fix.window.raw[3,2]
Have a look at the first 3 rows of the data to see if the above output matches.
head(fix.window.raw, n=3)
To get the data from the 1st row, you can run this:
fix.window.raw[1,]
To get the data from all but the 1st row, you can run this:
fix.window.raw[-1,]
To get the data from the 1st and 2nd columns, you can run this:
fix.window.raw[,c(1,2)]
You can use column names for subsetting like this:
fix.window.raw[3,c('Subject','Trial')]
You can select the data that meet certain conditions.
For example, to select only the data where Subject
column is 'p2':
fix.window.raw[fix.window.raw$Subject == 'p2',]
Quick Q: What is the difference between =
and ==
?
To select only the data where Subject
column is 'p2', 'j2' or 'j7':
fix.window.raw[fix.window.raw$Subject %in% c('p2','j2','j4'),]
Subject
column is 'j2' or Item
column is 10Trial
is 5 and Lang
is 'L1'Click Run Code
to test your code. Click Solution
to see the solution.
Hint: Use |
for OR and &
for AND
fix.window.raw[fix.window.raw$Subject=='j2'|fix.window.raw$Item==10,] # Q3.1 fix.window.raw[fix.window.raw$Trial==5&fix.window.raw$Lang=='L1',] # Q3.2
Now we will look at subsetting commands using the tidyverse
package.
To select the first 3 rows, you can run this:
fix.window.raw %>% slice(1, 2, 3)
To select data where FixP
>.9, you can run this:
fix.window.raw %>% filter(FixP > .9)
To select data from the columns Condition
and FixP
, you can run this:
fix.window.raw %>% select(Condition,FixP)
To select a range of consecutive variables (e.g., from Subject
and FixP
), you can run this:
fix.window.raw %>% select(Subject:FixP)
You can exclude the above data by running this:
fix.window.raw %>% select(!(Subject:FixP))
You can also select columns by a character match. The command below will select columns whose name starts with 'C'
fix.window.raw %>% select(starts_with('C'))
You can select the data from all but the column Lang
by running this:
fix.window.raw %>% select(-Lang)
The advantage of using this method is that you can apply multiple commands to the same data in one go.
For example:
fix.window.raw %>% select(Condition,FixP) %>% filter(FixP > .9)
Subject
column is 'j2', 'p2' or 'p9' and Condition
is 'Targ'.Condition
Click Run Code
to test your code. Click Solution
to see the solution.
fix.window.raw %>% filter(Subject %in% c('j2','p2','p9'), Condition == 'Targ') # Q4.1 fix.window.raw %>% filter(Subject %in% c('j2','p2','p9'), Condition == 'Targ') %>% select(-Condition) # Q4.2
Note: You can make a line break after (but not before) %>%
.
First, let's summarise the data for plotting.
To compute mean, SD, SE, and 95% CI for each condition, you can use summarySE
function from Rmisc
package:
summarySE(fix.window.raw, measurevar = 'FixP', groupvars = 'Condition')
To compute the mean, SD, SE, and 95% CI for each condition and for each Lang
group, you can run this:
summarySE(fix.window.raw, measurevar = 'FixP', groupvars = c('Lang','Condition'))
Let's save the second summary as fix.window.raw.summary
:
fix.window.raw.summary = summarySE(fix.window.raw, measurevar = 'FixP', groupvars = c('Lang','Condition'))
Now, we can use this summary to plot a graph.
Let's plot a simple bar graph using ggplot
from ggplot2
package.
The command below will plot a bar graph showing the mean FixP
(on y-axis) for each Condition
(on x-axis) and each Lang
group. Different colours are assigned for each condition using the fill = Condition
command.
ggplot(fix.window.raw.summary, aes(x = Condition, y = FixP, fill = Condition)) + geom_bar(stat = "identity") + facet_wrap(~Lang)
We can add error bars representing 95% CIs using geom_errorbar
. The width
in geom_errorbar
specifies the width of the error bar.
We can use ggtitle
to add a title.
ggplot(fix.window.raw.summary, aes(x = Condition, y = FixP, fill = Condition)) + ggtitle("Fixation proportion with 95% CI") + geom_bar(stat = "identity") + geom_errorbar(aes(ymin=FixP-ci, ymax=FixP+ci), width=.3) + facet_wrap(~Lang)
Plot a bar graph showing the mean FixP
(on y-axis) with error bars representing ±1SE for each Condition
(on x-axis) just for L1 speakers. Include the title "L1 speakers" in the plot.
Click Run Code
to test your code. Click Solution
to see the solution.
fix.window.raw.summary = summarySE(fix.window.raw, measurevar = 'FixP', groupvars = c('Lang','Condition'))
ggplot(fix.window.raw.summary[fix.window.raw.summary$Lang=='L1',], aes(x = Condition, y = FixP, fill = Condition)) + ggtitle("L1 speakers") + geom_bar(stat = "identity") + geom_errorbar(aes(ymin=FixP-se, ymax=FixP+se), width=.3)
You can customise many more things. Take a look at this website.
Now, let's look at the fix.50bin.raw
data.
This dataset contains the following information:
Have a look at the summary of this dataset:
summary(fix.50bin.raw)
Condition
. Reorder them as Targ
, Eng
, Jap
and then Unr
.fix.50bin.new
and print the summary of the new dataset.Click Run Code
to test your code. Click Solution
to see the solution.
fix.50bin.new = fix.50bin.raw %>% mutate_at(vars(Subject, Condition, Item, Lang), as.factor) %>% mutate(Condition = fct_relevel(Condition,c('Targ','Eng','Jap','Unr'))) # Q6.1, Q6.2 summary(fix.50bin.new) # Q6.3
Now, the summary should look like this (the already-modified file is named fix.50bin
):
summary(fix.50bin)
Let's plot a time-course graph to look at a fixation proportion change over time.
Like we did in the previous section, we first need to create a summary. We now need to include Time
to the grouping variables.
Compute the mean, SD, SE, and 95% CI for each condition, for each Lang
group, and for each time window:
summarySE(fix.50bin, measurevar = 'FixP', groupvars = c('Lang','Condition','Time'))
Let's save the summary as fix.50bin.summary
:
fix.50bin.summary = summarySE(fix.50bin, measurevar = 'FixP', groupvars = c('Lang','Condition','Time'))
We can use the summary to plot a time-course plot.
We will use a line graph. We want Time
on the x-axis and FixP
on the y-axis. We want one line per Condition
, and we want to use different line colours and line types for each Condition
.
ggplot(fix.50bin.summary) + geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) + facet_wrap(~Lang)
Ok, now let's add error bars representing ±1SE. We use geom_ribbon
for that. We can additionally specify its size (size
), transparency (alpha
) and line type (lty
). show.legend=F
is added to hide a legend (for the error bar).
ggplot(fix.50bin.summary) + theme_light() + # use the light theme (so that the background is white) geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) + geom_ribbon(aes(x=Time,ymin=FixP-se,ymax=FixP+se,color=Condition,fill=Condition), size=.2, alpha=.3, lty="dashed", show.legend=F) + geom_vline(xintercept = 0, linetype = "solid") + # add a vertical like at time 0 facet_wrap(~Lang, nrow = 2) # this will place the first plot on top of the other plot
Hint:
scale_colour_manual
(for modifying line colours), scale_fill_manual
(for modifying error bar colours) and scale_linetype_manual
(for modifying line types).scale_y_continuous
to adjust the y-axis.theme
and element_text
to adjust the text size. I'm using size 14.ggplot(fix.50bin.summary) + theme_light() + xlab("Time relative to target word onset (ms)") + ylab('Fixation proportion') + geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) + geom_ribbon(aes(x=Time,ymin=FixP-se,ymax=FixP+se,color=Condition,fill=Condition), size=.2, alpha=.3, lty="dashed", show.legend=F) + scale_colour_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) + scale_fill_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) + scale_linetype_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('solid','longdash','dotdash','dotted')) + scale_y_continuous(limits=c(0,1),expand=c(0,0),breaks=seq(0,1,.25)) + theme(text=element_text(size=14)) + facet_wrap(~Lang, nrow = 2)
Click Run Code
to test your code. Click Solution
to see the solution.
fix.50bin.summary = summarySE(fix.50bin, measurevar = 'FixP', groupvars = c('Lang','Condition','Time'))
ggplot(fix.50bin.summary) + theme_light() + xlab("Time relative to target word onset (ms)") + ylab('Fixation proportion') + geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) + geom_ribbon(aes(x=Time,ymin=FixP-se,ymax=FixP+se,color=Condition,fill=Condition), size=.2, alpha=.3, lty="dashed", show.legend=F) + scale_colour_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) + scale_fill_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) + scale_linetype_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('solid','longdash','dotdash','dotted')) + scale_y_continuous(limits=c(0,1),expand=c(0,0),breaks=seq(0,1,.25)) + theme(text=element_text(size=14)) + facet_wrap(~Lang, nrow = 2)
fix.L2.late.window.summary = summarySE(fix.50bin[fix.50bin$Lang=='L2'&fix.50bin$Time>=500,], measurevar = 'FixP', groupvars = c('Condition')) # make a summary first ggplot(fix.L2.late.window.summary, aes(x = Condition, y = FixP, fill = Condition)) + geom_bar(stat = "identity") + geom_errorbar(aes(ymin=FixP-se, ymax=FixP+se), width=.3)
That's it! We will be using these datasets in the upcoming lab sessions.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.