knitr::opts_chunk$set(echo = TRUE)

Packages

Load the following packages. If you haven't installed them yet, do so first (e.g., install.packages("learnr")).

require(learnr)
require(Rmisc)
require(tidyverse)
require(ggplot2)

You can view the help page of a loaded package (or its function) by typing the following:

?tidyverse  # help page of the 'tidyverse' package
?read_delim  # help page of the 'read_delim' function

Note: Everything after # is a comment (not a command).

Install and load a package for this class:

remotes::install_github("aineito/stats.VWP")
require(stats.VWP)

Read data

The code below assumes that the data files are in the 'Data' folder, which is in the folder where your current R script is located. If the R script and the data files are in the same folder, remove 'Data/' from the file paths.

fix.window.raw = read_delim("Data/ET_fix_-800_to_0.txt", delim="\t")
fix.50bin.raw = read_delim("Data/ET_fix.txt", delim="\t")
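
Relative paths such as "Data/ET_fix.txt" are resolved against R's working directory, which is not always the folder that contains the script. A minimal sketch for checking this before reading the files (the object names data_path and file_found are only for illustration):

```r
# Show the folder that relative paths are resolved against
getwd()

# Build the relative path used above and check whether R can see the file
data_path <- file.path("Data", "ET_fix.txt")   # illustrative name
file_found <- file.exists(data_path)
file_found   # TRUE if the file is reachable from the working directory
```

If this prints FALSE, either change the working directory with setwd() or adjust the path passed to read_delim.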

We will first look at the fix.window.raw data.

Have a look at the first 6 rows:

head(fix.window.raw)

Have a look at the data structure:

str(fix.window.raw)

Have a look at the summary:

summary(fix.window.raw)

You can view the entire data by clicking fix.window.raw in the Environment.

The dataset fix.window.raw contains the following information:

Q1:

question_checkbox("Which variables need to be a factor (categorical variable)? Select all that apply:",
  answer("Subject", correct=T),
  answer("Trial"),
  answer("allSample"),
  answer("Count"),
  answer("FixP"),
  answer("Condition", correct=T),
  answer("Item", correct=T),
  answer("Lang", correct=T),
  allow_retry = T
)

Modify data

To change Subject, Condition, Item and Lang to factor, you can run the following:

fix.window.raw = fix.window.raw %>% mutate_at(vars(Subject, Condition, Item, Lang), as.factor)

Note: a shortcut for %>% is 'Ctrl/Cmd + Shift + m'

Q2:

  1. Change Item to character and then to numeric.
  2. Check the summary of the Item column.

Click Run Code to test your code. Click Solution to see the solution.
Hint: Use as.character and as.numeric


fix.window.raw = fix.window.raw %>% mutate_at(vars(Item), as.character) %>% mutate_at(vars(Item), as.numeric)
summary(fix.window.raw$Item)
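
The detour through as.character matters because as.numeric applied directly to a factor returns the internal level codes rather than the labels. A small sketch with a made-up factor (item_demo is not part of the dataset):

```r
# A made-up factor whose labels are numbers stored as text
item_demo <- factor(c("33", "10", "2"))

levels(item_demo)                     # levels are sorted as text: "10" "2" "33"
as.numeric(item_demo)                 # level codes: 3 1 2
as.numeric(as.character(item_demo))   # the intended values: 33 10 2
```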

Let's look at the Condition labels.

summary(fix.window.raw$Condition)

The labels are ordered alphabetically.
Let's reorder them as Targ, Eng, Jap and then Unr.

fix.window.raw = fix.window.raw %>% mutate(Condition = fct_relevel(Condition,c('Targ','Eng','Jap','Unr')))

Note: mutate creates a new variable or modifies an existing variable. mutate_at modifies variables selected with a character vector or vars().
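
In current versions of dplyr, mutate_at is labelled as superseded, and the same step can be written with mutate combined with across. A hedged sketch on a small made-up tibble (toy is hypothetical and only mimics a few columns of the real data):

```r
library(dplyr)
library(forcats)

# Made-up data with the same kind of columns as fix.window.raw
toy <- tibble(
  Subject   = c("p1", "p1", "j1", "j1"),
  Condition = c("Unr", "Targ", "Jap", "Eng"),
  FixP      = c(.10, .60, .20, .30)
)

toy <- toy %>%
  mutate(across(c(Subject, Condition), as.factor)) %>%   # like mutate_at(vars(...), as.factor)
  mutate(Condition = fct_relevel(Condition, "Targ", "Eng", "Jap", "Unr"))

levels(toy$Condition)   # "Targ" "Eng" "Jap" "Unr"
```

Either spelling gives the same result here; mutate_at still works, so the commands in this tutorial do not need to be changed.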

Now, the data summary should look like this:

summary(fix.window.raw)

Subset data (base R)

There are different ways to subset data in R. We will first look at subsetting commands using base R.

To get the data from the 3rd row and 2nd column, you can run this:

fix.window.raw[3,2]

Have a look at the first 3 rows of the data to see if the above output matches.

head(fix.window.raw, n=3)

To get the data from the 1st row, you can run this:

fix.window.raw[1,]

To get the data from all but the 1st row, you can run this:

fix.window.raw[-1,]

To get the data from the 1st and 2nd columns, you can run this:

fix.window.raw[,c(1,2)]

You can use column names for subsetting like this:

fix.window.raw[3,c('Subject','Trial')]

You can select the data that meet certain conditions.
For example, to select only the data where Subject column is 'p2':

fix.window.raw[fix.window.raw$Subject == 'p2',]

Quick Q: What is the difference between = and ==?
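
As a short illustration of the question above (x is just a throwaway variable):

```r
x = 5      # a single = assigns the value 5 to x (same as x <- 5)
x == 5     # a double == asks whether x equals 5: TRUE
x == 3     # FALSE

# Inside a function call, a single = names an argument instead
head(letters, n = 3)   # "a" "b" "c"
```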

To select only the data where Subject column is 'p2', 'j2' or 'j4':

fix.window.raw[fix.window.raw$Subject %in% c('p2','j2','j4'),]
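
%in% is used here rather than == because == compares element by element and silently recycles the shorter vector. A sketch with a made-up vector of subject IDs (subject_demo is hypothetical):

```r
subject_demo <- c("j2", "p2", "j4", "p9")   # made-up IDs

# %in% asks, for each element, whether it appears anywhere in the set
subject_demo %in% c("p2", "j2")   # TRUE TRUE FALSE FALSE

# == pairs elements up by position (recycling the shorter vector),
# so "j2" is compared with "p2", "p2" with "j2", and so on
subject_demo == c("p2", "j2")     # FALSE FALSE FALSE FALSE
```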

Q3:

  1. Select only the data where Subject column is 'j2' or Item column is 10
  2. Select only the data where Trial is 5 and Lang is 'L1'

Click Run Code to test your code. Click Solution to see the solution.
Hint: Use | for OR and & for AND


fix.window.raw[fix.window.raw$Subject=='j2'|fix.window.raw$Item==10,] # Q3.1

fix.window.raw[fix.window.raw$Trial==5&fix.window.raw$Lang=='L1',] # Q3.2

Subset data (tidyverse)

Now we will look at subsetting commands using the tidyverse package.

To select the first 3 rows, you can run this:

fix.window.raw %>% slice(1, 2, 3)

To select data where FixP >.9, you can run this:

fix.window.raw %>% filter(FixP > .9)

To select data from the columns Condition and FixP, you can run this:

fix.window.raw %>% select(Condition,FixP)

To select a range of consecutive variables (e.g., from Subject to FixP), you can run this:

fix.window.raw %>% select(Subject:FixP)

You can exclude that range of columns by running this:

fix.window.raw %>% select(!(Subject:FixP))

You can also select columns by a character match. The command below will select columns whose name starts with 'C'

fix.window.raw %>% select(starts_with('C'))

You can select the data from all but the column Lang by running this:

fix.window.raw %>% select(-Lang)

The advantage of using the pipe (%>%) is that you can apply multiple commands to the same data in one go.
For example:

fix.window.raw %>% select(Condition,FixP) %>% filter(FixP > .9)

Q4:

  1. Select the data where Subject column is 'j2', 'p2' or 'p9' and Condition is 'Targ'.
  2. Select the above data and then drop the column Condition

Click Run Code to test your code. Click Solution to see the solution.


fix.window.raw %>% filter(Subject %in% c('j2','p2','p9'), Condition == 'Targ') # Q4.1

fix.window.raw %>% filter(Subject %in% c('j2','p2','p9'), Condition == 'Targ') %>% select(-Condition) # Q4.2

Note: You can make a line break after (but not before) %>%.
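
For instance, the Q4.2 solution can be spread over several lines as long as each line ends with %>%. A sketch on made-up data (toy and toy_subset are hypothetical):

```r
library(dplyr)

toy <- tibble(
  Subject   = c("j2", "p2", "p9", "j5"),
  Condition = c("Targ", "Targ", "Unr", "Targ"),
  FixP      = c(.7, .8, .1, .6)
)

# Each line ends with %>%, so R knows the command continues
toy_subset <- toy %>%
  filter(Subject %in% c("j2", "p2", "p9"), Condition == "Targ") %>%
  select(-Condition)

toy_subset   # 2 rows (j2 and p2), columns Subject and FixP
```

Starting a line with %>% instead would make R treat the previous line as a finished command.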

Plot data

First, let's summarise the data for plotting.
To compute the mean, SD, SE, and 95% CI for each condition, you can use the summarySE function from the Rmisc package:

summarySE(fix.window.raw, measurevar = 'FixP', groupvars = 'Condition')

To compute the mean, SD, SE, and 95% CI for each condition and for each Lang group, you can run this:

summarySE(fix.window.raw, measurevar = 'FixP', groupvars = c('Lang','Condition'))

Let's save the second summary as fix.window.raw.summary:

fix.window.raw.summary = summarySE(fix.window.raw, measurevar = 'FixP', groupvars = c('Lang','Condition'))
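
To make the summary columns less of a black box, the same quantities can be computed by hand. The sketch below assumes that se is the SD divided by the square root of N and that ci is the half-width of a t-based 95% interval, which is, to my understanding, what summarySE reports; the data and the names toy and toy_summary are made up.

```r
library(dplyr)

toy <- tibble(
  Condition = rep(c("Targ", "Unr"), each = 4),
  FixP      = c(.6, .7, .8, .9, .1, .2, .2, .3)
)

toy_summary <- toy %>%
  group_by(Condition) %>%
  summarise(
    N         = n(),
    mean_FixP = mean(FixP),
    sd        = sd(FixP),
    se        = sd / sqrt(N),               # standard error of the mean
    ci        = se * qt(.975, df = N - 1)   # half-width of the 95% CI
  )

toy_summary
```

Here ci is a half-width, which is why the plotting code adds and subtracts it from the mean to draw the interval.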

Now, we can use this summary to plot a graph.
Let's plot a simple bar graph using ggplot from the ggplot2 package.
The command below will plot a bar graph showing the mean FixP (on the y-axis) for each Condition (on the x-axis) and each Lang group. Different colours are assigned to each condition using the fill = Condition argument.

ggplot(fix.window.raw.summary, aes(x = Condition, y = FixP, fill = Condition)) +
  geom_bar(stat = "identity") +
  facet_wrap(~Lang) 

We can add error bars representing 95% CIs using geom_errorbar. The width in geom_errorbar specifies the width of the error bar.
We can use ggtitle to add a title.

ggplot(fix.window.raw.summary, aes(x = Condition, y = FixP, fill = Condition)) +
  ggtitle("Fixation proportion with 95% CI") +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin=FixP-ci, ymax=FixP+ci), width=.3) +
  facet_wrap(~Lang) 
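
geom_bar(stat = "identity") can also be written as geom_col(), which ggplot2 provides for bars whose heights are already computed. A sketch using a made-up summary table (toy_summary and its values are invented, and there is no Lang column, so no facets):

```r
library(ggplot2)

toy_summary <- data.frame(
  Condition = factor(c("Targ", "Eng", "Jap", "Unr"),
                     levels = c("Targ", "Eng", "Jap", "Unr")),
  FixP = c(.55, .20, .15, .10),
  ci   = c(.05, .04, .04, .03)
)

p <- ggplot(toy_summary, aes(x = Condition, y = FixP, fill = Condition)) +
  geom_col() +   # same as geom_bar(stat = "identity")
  geom_errorbar(aes(ymin = FixP - ci, ymax = FixP + ci), width = .3)

p
```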

Q5:

Plot a bar graph showing the mean FixP (on y-axis) with error bars representing ±1SE for each Condition (on x-axis) just for L1 speakers. Include the title "L1 speakers" in the plot.

Click Run Code to test your code. Click Solution to see the solution.

fix.window.raw.summary = summarySE(fix.window.raw, measurevar = 'FixP', groupvars = c('Lang','Condition'))

ggplot(fix.window.raw.summary[fix.window.raw.summary$Lang=='L1',], aes(x = Condition, y = FixP, fill = Condition)) +
  ggtitle("L1 speakers") +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin=FixP-se, ymax=FixP+se), width=.3)

You can customise many more things. Take a look at this website.

Time-course data

Now, let's look at the fix.50bin.raw data.

This dataset contains the following information:

Have a look at the summary of this dataset:

summary(fix.50bin.raw)

Q6:

  1. Some categorical variables are treated as characters. Change them to factor.
  2. Change the levels of factor Condition. Reorder them as Targ, Eng, Jap and then Unr.
  3. Save the dataset with the above changes as fix.50bin.new and print the summary of the new dataset.

Click Run Code to test your code. Click Solution to see the solution.


fix.50bin.new = fix.50bin.raw %>% mutate_at(vars(Subject, Condition, Item, Lang), as.factor) %>% mutate(Condition = fct_relevel(Condition,c('Targ','Eng','Jap','Unr'))) # Q6.1, Q6.2

summary(fix.50bin.new) # Q6.3

Now, the summary should look like this (the already-modified file is named fix.50bin):

summary(fix.50bin)

Let's plot a time-course graph to look at how the fixation proportion changes over time.
As in the previous section, we first need to create a summary. We now need to include Time in the grouping variables.

Compute the mean, SD, SE, and 95% CI for each condition, for each Lang group, and for each time window:

summarySE(fix.50bin, measurevar = 'FixP', groupvars = c('Lang','Condition','Time'))

Let's save the summary as fix.50bin.summary:

fix.50bin.summary = summarySE(fix.50bin, measurevar = 'FixP', groupvars = c('Lang','Condition','Time'))

We can use the summary to plot a time-course plot.
We will use a line graph. We want Time on the x-axis and FixP on the y-axis. We want one line per Condition, and we want to use different line colours and line types for each Condition.

ggplot(fix.50bin.summary) +
  geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) +
  facet_wrap(~Lang) 

Ok, now let's add error bands representing ±1SE. We use geom_ribbon for that. We can additionally specify its outline size (size), transparency (alpha) and line type (lty). show.legend=F is added to hide the legend for the error band.

ggplot(fix.50bin.summary) +
  theme_light() + # use the light theme (so that the background is white)
  geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) +
  geom_ribbon(aes(x=Time,ymin=FixP-se,ymax=FixP+se,color=Condition,fill=Condition), size=.2, alpha=.3, lty="dashed", show.legend=F)  +
  geom_vline(xintercept = 0, linetype = "solid") + # add a vertical line at time 0
  facet_wrap(~Lang, nrow = 2) # this will place the first plot on top of the other plot
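
In ggplot2 3.4.0 and later, the thickness of lines is set with linewidth, and using size for lines triggers a deprecation warning. A sketch of the ribbon layer written with linewidth, assuming a recent ggplot2 and using made-up time-course values (toy_tc and p_tc are hypothetical):

```r
library(ggplot2)

toy_tc <- data.frame(
  Time      = rep(c(-200, 0, 200, 400), times = 2),
  Condition = rep(c("Targ", "Unr"), each = 4),
  FixP      = c(.25, .30, .50, .70, .25, .24, .20, .15),
  se        = .03
)

p_tc <- ggplot(toy_tc) +
  theme_light() +
  geom_line(aes(x = Time, y = FixP, group = Condition, colour = Condition, lty = Condition)) +
  geom_ribbon(aes(x = Time, ymin = FixP - se, ymax = FixP + se,
                  colour = Condition, fill = Condition),
              linewidth = .2, alpha = .3, lty = "dashed", show.legend = FALSE) +
  geom_vline(xintercept = 0, linetype = "solid")   # vertical line at time 0

p_tc
```

On older ggplot2 versions, size = .2 remains the way to set the outline thickness.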

Q7:

  1. Plot a graph like the one below. Make your graph as similar as possible to this :)

Hint:

ggplot(fix.50bin.summary) +
  theme_light() + 
  xlab("Time relative to target word onset (ms)") +
  ylab('Fixation proportion') +
  geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) +
  geom_ribbon(aes(x=Time,ymin=FixP-se,ymax=FixP+se,color=Condition,fill=Condition), size=.2, alpha=.3, lty="dashed", show.legend=F)  +
  scale_colour_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) +
  scale_fill_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) +
  scale_linetype_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('solid','longdash','dotdash','dotted')) +
  scale_y_continuous(limits=c(0,1),expand=c(0,0),breaks=seq(0,1,.25)) +
  theme(text=element_text(size=14)) +
  facet_wrap(~Lang, nrow = 2) 

Click Run Code to test your code. Click Solution to see the solution.

fix.50bin.summary = summarySE(fix.50bin, measurevar = 'FixP', groupvars = c('Lang','Condition','Time'))

ggplot(fix.50bin.summary) +
  theme_light() + 
  xlab("Time relative to target word onset (ms)") +
  ylab('Fixation proportion') +
  geom_line(aes(x=Time, y=FixP, group=Condition, colour=Condition, lty=Condition)) +
  geom_ribbon(aes(x=Time,ymin=FixP-se,ymax=FixP+se,color=Condition,fill=Condition), size=.2, alpha=.3, lty="dashed", show.legend=F)  +
  scale_colour_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) +
  scale_fill_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('red','blue','deeppink','darkgrey')) +
  scale_linetype_manual('Condition',labels=c("Target","English competitor","Japanese competitor","Unrelated"),values=c('solid','longdash','dotdash','dotted')) +
  scale_y_continuous(limits=c(0,1),expand=c(0,0),breaks=seq(0,1,.25)) +
  theme(text=element_text(size=14)) +
  facet_wrap(~Lang, nrow = 2) 

  2. The time-course graph suggests that L2 speakers were more likely to fixate the English competitor object over the unrelated object in the window from around 500 ms to 1000 ms relative to the target word onset. Plot a simple bar graph showing the mean fixation proportion with error bars representing ±1SE.

fix.L2.late.window.summary = summarySE(fix.50bin[fix.50bin$Lang=='L2'&fix.50bin$Time>=500,], measurevar = 'FixP', groupvars = c('Condition')) # make a summary first

ggplot(fix.L2.late.window.summary, aes(x = Condition, y = FixP, fill = Condition)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin=FixP-se, ymax=FixP+se), width=.3)
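
The solution above keeps every sample from 500 ms onwards. If the recording extends beyond 1000 ms, an upper bound is needed to match the window described in the question. A sketch with dplyr on made-up data (toy_tc and toy_window are hypothetical; whether the real Time column goes past 1000 ms is not checked here):

```r
library(dplyr)

toy_tc <- tibble(
  Lang = rep(c("L1", "L2"), each = 4),
  Time = rep(c(450, 500, 1000, 1050), times = 2),
  FixP = c(.2, .3, .4, .5, .1, .2, .3, .4)
)

toy_window <- toy_tc %>%
  filter(Lang == "L2", Time >= 500, Time <= 1000)   # both ends of the window

toy_window   # 2 rows: Time 500 and 1000 for L2
```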

That's it! We will be using these datasets in the upcoming lab sessions.



aineito/stats.VWP documentation built on March 10, 2023, 4:44 p.m.