In JackEdTaylor/LexOPS: A Package and Shiny App for Generating Matched Stimuli

The built-in variables of LexOPS are useful but not exhaustive. Thankfully, LexOPS can work with any suitable list of features. For this example, we will join the Lancaster Sensorimotor norms to Engelthaler and Hills' humour ratings, and to the built-in LexOPS dataset (LexOPS::lexops). We can then use the joined data to generate stimuli for a visual rating by humour design, controlling for length and frequency.

Packages

library(readr)
library(dplyr)
library(ggplot2)
library(LexOPS)

theme_set(theme_minimal())
set.seed(1)

Importing Datasets

Importing Sensorimotor Norms

The Lancaster Sensorimotor Norms are available from the OSF page.

sensorimotor <- read_csv("https://osf.io/48wsc/download")

Let's have a quick peek at the data.

sensorimotor |>
  head(5)

Importing Humour Norms

The Humour Norms are available from the GitHub page.

humour <- read_csv("https://raw.githubusercontent.com/tomasengelthaler/HumorNorms/master/humor_dataset.csv")

Let's have a look at this data too.

humour |>
  head(5)

Joining Data Together

First, we'll rename the Word column in the sensorimotor norms to have a lowercase "w", so it's consistent with the humour norms. Then, since all the Lancaster norms' words are in uppercase (whereas the humour norms' words are in lowercase), we'll convert the Lancaster norms' words to lowercase.

sensorimotor <- sensorimotor |>
  rename(word = Word) |>
  mutate(word = tolower(word))

Next, we will prefix all the features from the humour norms with "Humour.", so they will be easily identifiable in the final dataset. We can use rename_at() and vars(-word) to add this prefix to all columns except the word column.

humour <- humour |>
  rename_at(vars(-word), ~paste("Humour", .x, sep="."))
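rename_at() still works, but it is marked as superseded in dplyr 1.0.0 and later. As a minimal sketch, the same prefixing can be written with rename_with(). The toy tibble below (toy_humour) is made up for illustration and only mimics the shape of the humour norms: a word column plus some rating columns.

```r
library(dplyr)

# Hypothetical miniature of the humour norms: one id column plus ratings
toy_humour <- tibble(
  word = c("apple", "banana"),
  mean = c(2.1, 3.4),
  n    = c(20L, 18L)
)

# Prefix every column except `word` with "Humour."
toy_prefixed <- toy_humour |>
  rename_with(~ paste("Humour", .x, sep = "."), .cols = -word)

names(toy_prefixed)
```

Either form should give the same column names; which one to use is mostly a matter of which dplyr version you are targeting.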

Joining the data together is then easy with the dplyr join functions. Here we use full_join(), joining by the common column "word". Finally, we join the data to the lexops in-built dataset, as this contains features we can use to control for length and frequency. Since the words are stored in lexops in the string column, we tell left_join() that these columns should be treated as the same thing, with c("word"="string").

sens_hum <- full_join(sensorimotor, humour, by="word") |>
  left_join(lexops, by=c("word"="string"))
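To see what this combination of joins does to words that are missing from one of the sources, here is a small sketch with made-up tibbles. The names toy_sens, toy_hum and toy_lex are hypothetical stand-ins for the three datasets, and the values are invented.

```r
library(dplyr)

# Hypothetical stand-ins for the three datasets
toy_sens <- tibble(word = c("apple", "brick"), Visual.mean = c(4.2, 3.9))
toy_hum  <- tibble(word = c("apple", "clown"), Humour.mean = c(2.1, 3.8))
toy_lex  <- tibble(string = c("apple", "clown"), Length = c(5L, 5L))

toy_joined <- full_join(toy_sens, toy_hum, by = "word") |>
  left_join(toy_lex, by = c("word" = "string"))

# Words present in only one norm set are kept, with NA in the other's columns
toy_joined

# Rows that have everything needed for a Visual x Humour design
# that also controls for Length
toy_complete <- toy_joined |>
  filter(!is.na(Visual.mean), !is.na(Humour.mean), !is.na(Length))
```

full_join() keeps every word from either norm set, and left_join() then adds the lexops columns where a match exists. A word can therefore appear in the joined data with NA for some of the variables used later, so it can be worth checking how many rows are complete.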

Generating Stimuli

Before we choose boundaries for our splits, we want to check the distributions of our independent variables.

sens_hum |> ggplot(aes(Visual.mean)) + geom_density()
sens_hum |> ggplot(aes(Humour.mean)) + geom_density()
sens_hum |> ggplot(aes(Visual.mean, Humour.mean)) + geom_point(alpha=0.5)
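Alongside the plots, it can help to count how many words would fall into each cell of the design for a given set of candidate boundaries. The sketch below uses made-up ratings (toy) in place of sens_hum, uses the boundaries chosen in this example, and treats the boundaries as inclusive, which is an assumption of the sketch rather than a statement about how LexOPS bins values.

```r
library(dplyr)

# Made-up ratings standing in for sens_hum
toy <- tibble(
  word        = c("a", "b", "c", "d", "e", "f"),
  Visual.mean = c(1.0, 1.5, 4.0, 4.5, 2.8, 4.8),
  Humour.mean = c(2.2, 3.5, 2.4, 3.2, 2.3, 1.2)
)

toy_cells <- toy |>
  mutate(
    visual = case_when(
      between(Visual.mean, 0, 2)   ~ "low",
      between(Visual.mean, 3.5, 5) ~ "high"
    ),
    humour = case_when(
      between(Humour.mean, 2, 2.5) ~ "neutral",
      between(Humour.mean, 3, 5)   ~ "high"
    )
  ) |>
  # Words falling between or outside the bins get NA and are dropped
  filter(!is.na(visual), !is.na(humour)) |>
  count(visual, humour)

toy_cells
```

If one cell has far fewer candidates than the others, that cell is likely to limit how many matched items can be generated.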

Finally, we can generate stimuli with our new words. We will create two levels of Visual ratings: 0:2 (low) and 3.5:5 (high), and two levels of Humour ratings: 2:2.5 (neutral, as words with consistently low humour ratings are often taboo) and 3:5 (high). We'll control for word length exactly, and word frequency within a tolerance of -0.2:0.2.

Since we're using our own data, we need to use the set_options() function to tell LexOPS which column contains our unique identifier, i.e., our words (id_col = "word").

stim <- sens_hum |>
  set_options(id_col = "word") |>
  split_by(Visual.mean, 0:2 ~ 3.5:5) |>
  split_by(Humour.mean, 2:2.5 ~ 3:5) |>
  control_for(Length, 0:0) |>
  control_for(Zipf.SUBTLEX_UK, -0.2:0.2) |>
  generate(25)
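To make the two tolerances concrete, here is a base R sketch of what "length exactly, frequency within -0.2:0.2" means for a single reference word. This is not the LexOPS implementation, only an illustration of the filtering logic; the pool data frame, its values, and the matches_for() helper are all hypothetical.

```r
# Made-up candidate pool; Length and Zipf values are invented
pool <- data.frame(
  word   = c("table", "chair", "spoon", "window", "plate"),
  Length = c(5L, 5L, 5L, 6L, 5L),
  Zipf   = c(4.50, 4.65, 4.10, 4.55, 4.31),
  stringsAsFactors = FALSE
)

# Hypothetical helper: candidates that would count as matches to `ref_word`
matches_for <- function(pool, ref_word, zipf_tol = 0.2) {
  ref <- pool[pool$word == ref_word, ]
  ok <- pool$word != ref_word &
    pool$Length == ref$Length &             # exact length (tolerance 0:0)
    abs(pool$Zipf - ref$Zipf) <= zipf_tol   # frequency within -0.2:0.2
  pool$word[ok]
}

matches_for(pool, "table")
```

In this toy pool, "spoon" is excluded for "table" because its frequency differs by more than 0.2, and "window" is excluded because its length differs. Tighter tolerances give closer matches but leave fewer candidates per item.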

We can view a quick summary of our stimuli with the plot_design() function.

plot_design(stim)

Here is the list of stimuli generated for the design of visual sensorimotor ratings (A: A1 low, A2 high) by humour ratings (B: B1 neutral, B2 high), controlling for word length and frequency.

print(stim)

Citing Sources

The cite_design() function suggests papers that you should cite for the stimuli you have generated. Note that for variables LexOPS does not know, the variable will still be flagged as something that needs citing, but you will have to find the citation yourself.