df_jaeger14: Self-Paced Reading Dataset on Chinese Relative Clauses
In pangoling: Access to Large Language Model Predictions

Description

This dataset contains data from a self-paced reading experiment on Chinese relative clause comprehension. It is structured to support analysis of reaction times, comprehension accuracy, and surprisal values across the four conditions of a 2x2 fully crossed factorial design (modification type by relative clause type; see Details).

Usage

data(df_jaeger14)

Format

A tibble with 8,624 rows and 15 variables:

subject: Participant identifier, a character vector.
item: Trial item number, an integer.
cond: Experimental condition, a character vector indicating variations in sentence structure (e.g., "a", "b", "c", "d").
word: Chinese word presented in each trial, a character vector.
wordn: Position of the word within the sentence, an integer.
rt: Reaction time in milliseconds for reading each word, an integer.
region: Sentence region or phrase type (e.g., "hd1", "Det+CL"), a character vector.
question: Comprehension question associated with the trial, a character vector.
accuracy: Binary accuracy score for the comprehension question (1 = correct, 0 = incorrect).
correct_answer: Expected correct answer for the comprehension question, a character vector ("Y" or "N").
question_type: Type of comprehension question, a character vector.
experiment: Name of the experiment, indicating self-paced reading, a character vector.
list: Experimental list number, for counterbalancing item presentation, an integer.
sentence: Full sentence used in the trial with words marked for analysis, a character vector.
surprisal: Model-derived surprisal values for each word, a numeric vector.

Region codes in the dataset (column region):

N: Main clause subject (in object-modifications only)
V: Main clause verb (in object-modifications only)
Det+CL: Determiner+classifier
Adv: Adverb
VN: RC-verb+RC-object (subject relatives) or RC-subject+RC-verb (object relatives)
- Note: These two words were merged into one region after the experiment; they were presented as separate regions during the experiment.
FreqP: Frequency phrase/durational phrase
DE: Relativizer "de"
head: Relative clause head noun
hd1: First word after the head noun
hd2: Second word after the head noun
hd3: Third word after the head noun
hd4: Fourth word after the head noun (only in subject-modifications)
hd5: Fifth word after the head noun (only in subject-modifications)

Notes on reading times (column rt):

The reading time of the relative clause region ("VN": verb-noun order in subject relatives, noun-verb order in object relatives) was computed by summing the reading times of the relative clause verb and noun, which were presented as two separate regions during the experiment.
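
The merging step described above can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the data frame `raw`, the labels "RC-verb" and "RC-noun", and the reading times are all made up for illustration, and base R `aggregate()` is just one way to do the summation.

```r
# Toy word-level data with hypothetical labels and made-up reading times.
raw <- data.frame(
  subject = c("s1", "s1", "s1"),
  item    = c(1L, 1L, 1L),
  region  = c("Adv", "RC-verb", "RC-noun"),
  rt      = c(300L, 410L, 390L),
  stringsAsFactors = FALSE
)

# Relabel the two relative clause words as a single "VN" region ...
is_rc <- raw$region %in% c("RC-verb", "RC-noun")
raw$region[is_rc] <- "VN"

# ... and sum reading times within subject, item, and region.
merged <- aggregate(rt ~ subject + item + region, data = raw, FUN = sum)
merged
```

In the distributed dataset this step has already been applied, so the rt value of a "VN" row is the summed reading time of both words.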

Details

Factor I: Modification type (subject modification; object modification)
Factor II: Relative clause type (subject relative; object relative)

Condition labels:

a) subject modification; subject relative
b) subject modification; object relative
c) object modification; subject relative
d) object modification; object relative
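
For analysis, it is often convenient to expand the single cond column into the two design factors. The sketch below does this with a lookup table built from the condition labels above. The helper `add_design_factors()` and the column names `modification` and `rc_type` are hypothetical and not part of pangoling; the toy input only stands in for the real data.

```r
# Lookup table transcribed from the condition labels above.
cond_key <- data.frame(
  cond         = c("a", "b", "c", "d"),
  modification = c("subject", "subject", "object", "object"),
  rc_type      = c("subject relative", "object relative",
                   "subject relative", "object relative"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds one column per design factor to a data frame
# that has a `cond` column with values "a" to "d".
add_design_factors <- function(df) {
  idx <- match(df$cond, cond_key$cond)
  df$modification <- cond_key$modification[idx]
  df$rc_type <- cond_key$rc_type[idx]
  df
}

# Toy input for illustration only.
toy <- data.frame(cond = c("a", "d", "b"), stringsAsFactors = FALSE)
add_design_factors(toy)
```

With the package data loaded, the same helper could presumably be applied as `add_design_factors(df_jaeger14)`, since it relies only on the cond column documented above.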

Source

Jäger, L., Chen, Z., Li, Q., Lin, C.-J. C., & Vasishth, S. (2015). The subject-relative advantage in Chinese: Evidence for expectation-based processing. Journal of Memory and Language, 79–80, 97–120. doi:10.1016/j.jml.2014.10.005

Examples

# Basic exploration
head(df_jaeger14)

# Summarize reaction times by region
library(tidytable)
df_jaeger14 |>
  group_by(region) |>
  summarize(mean_rt = mean(rt, na.rm = TRUE))

pangoling documentation built on April 11, 2025, 6:16 p.m.