poems: Baayen and Milin
In dmbates/RePsychLing: Data sets from Psychology and Linguistics experiments


Data described in Baayen and Milin (2010).

A data frame with 275996 observations on the following 24 variables.

ReadingTime: a numeric vector of self-paced reading times
Subject: a factor with participant identifiers
Sex: a factor with levels m (male) and f (female)
Age: a numeric vector specifying the participant's age
NPoems: a numeric vector of the self-reported maximum number of poems read annually, according to a four-choice question
MultipleChoiceRT: a numeric vector with the response latency to the four-choice question
Trial: a numeric vector specifying the rank of the item in the subject's experimental list
NumberOfWordsIntoLine: a numeric vector specifying the position of the item in the line of poetry being read
PositionBegMidEnd: a factor specifying whether the word was initial (beg), medial (mid) or final (end) in the sentence
SentenceLength: a numeric vector specifying sentence length
Poem: a factor whose levels are identifiers for the poems
Word: a factor whose levels are identifiers for the words
WordFrequencyInPoem: a numeric vector specifying the frequency of the word in the poem
RhymeFreqInPoem: a numeric vector specifying the frequency of the word's rhyme in the poem
OnsetFreqInPoem: a numeric vector specifying the frequency of the word's onset in the poem
WordLength: a numeric vector specifying the length of the word in letters
FamilySize: a numeric vector specifying the count of morphological family members
InflectionalEntropy: a numeric vector specifying Shannon's entropy calculated over the probability distribution of a word's inflected variants
LemmaFrequency: a numeric vector specifying the frequency of occurrence of the word in the lemma subsection of the CELEX lexical database
WordFormFrequency: a numeric vector specifying the frequency of occurrence of the word's inflected form in the word form subsection of the CELEX lexical database
NumberOfMeanings: a numeric vector specifying the number of synsets in WordNet in which the word is listed
IsFunctionWord: a factor specifying whether the word is a function word, levels TRUE (function word) and FALSE (not a function word)
HasPunctuationMark: a factor specifying whether the word is followed by a punctuation mark, levels FALSE (absent) and TRUE (present)
NumberOfMorphemes: a numeric vector specifying the scaled number of morphemes in a word
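
Because IsFunctionWord and HasPunctuationMark are stored as factors rather than logical vectors, it can help to be explicit when converting them. The following is a minimal sketch on a toy factor (the name `has_mark` and its values are made up for illustration and are not taken from the data set):

```r
# Toy factor mimicking the FALSE/TRUE level coding (hypothetical values).
has_mark <- factor(c("FALSE", "TRUE", "TRUE"), levels = c("FALSE", "TRUE"))

# Comparing against the level label yields a logical vector.
is_marked <- has_mark == "TRUE"

# The underlying integer codes are 1 and 2, not 0 and 1.
codes <- as.integer(has_mark)
```

Comparing against the label avoids accidentally treating the integer codes as 0/1 indicators.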

Baayen, R. H. and Milin, P. (2010) Analyzing reaction times. International Journal of Psychological Research, 3(2), pp. 12-28.

data(poems)
# Normal Q-Q plots of the untransformed variables, arranged in a 2 x 4 grid
par(mfrow=c(2,4))
qqnorm(poems$ReadingTime)
qqnorm(poems$WordFormFrequency)
qqnorm(poems$LemmaFrequency)
qqnorm(poems$FamilySize)
qqnorm(poems$MultipleChoiceRT)
qqnorm(poems$NPoems)
qqnorm(poems$NumberOfMeanings)
# Log and negative-reciprocal transformations of the variables plotted above
poems$LogReadingTime        = log(poems$ReadingTime)
poems$LogWordFormFrequency  = log(poems$WordFormFrequency+1)
poems$LogLemmaFrequency     = log(poems$LemmaFrequency+1)
poems$RecFamilySize         = -100/(poems$FamilySize+1)
poems$LogMultipleChoiceRT   = log(poems$MultipleChoiceRT)
poems$LogNPoems             = log(poems$NPoems)
poems$LogNumberOfMeanings   = log(poems$NumberOfMeanings+1)

## Not run: 

p = poems[,c("Age", "LogNPoems", "LogMultipleChoiceRT", "NumberOfWordsIntoLine", "SentenceLength",
                     "WordFrequencyInPoem", "RhymeFreqInPoem", "OnsetFreqInPoem", "WordLength", 
                     "NumberOfMorphemes",
                     "RecFamilySize", "InflectionalEntropy", "LogLemmaFrequency", "LogWordFormFrequency",
                     "LogNumberOfMeanings")]
pc = prcomp(p, center=TRUE, scale.=TRUE)
round(pc$rotation[,1:7],2)
#                        PC1   PC2   PC3   PC4   PC5   PC6   PC7
#Age                    0.00  0.01  0.00  0.03  0.61  0.49 -0.01
#LogNPoems              0.00 -0.01  0.01 -0.01 -0.70 -0.02  0.00
#LogMultipleChoiceRT    0.00  0.00  0.00  0.01 -0.37  0.87 -0.02
#NumberOfWordsIntoLine  0.03 -0.19 -0.39 -0.56  0.01  0.02 -0.05
#SentenceLength        -0.09 -0.20 -0.40 -0.52  0.01  0.01 -0.11
#WordFrequencyInPoem   -0.30 -0.36  0.14  0.11  0.00 -0.01 -0.06
#RhymeFreqInPoem       -0.24 -0.54  0.15  0.07  0.01  0.00  0.11
#OnsetFreqInPoem       -0.20 -0.56  0.14  0.06  0.01  0.00  0.13
#WordLength             0.41 -0.16  0.18 -0.08  0.00  0.00  0.15
#NumberOfMorphemes      0.17 -0.13  0.24 -0.03  0.01 -0.01 -0.83
#RecFamilySize         -0.35  0.20 -0.02 -0.11  0.00  0.01  0.34
#InflectionalEntropy    0.30 -0.19 -0.42  0.36 -0.01 -0.01 -0.02
#LogLemmaFrequency     -0.43  0.13 -0.21  0.18 -0.01 -0.01 -0.27
#LogWordFormFrequency  -0.45  0.16 -0.12  0.10  0.00 -0.01 -0.25
#LogNumberOfMeanings    0.11 -0.15 -0.55  0.44 -0.01 -0.01  0.01


poems$PC1 = pc$x[,1]
poems$PC2 = pc$x[,2]
poems$PC3 = pc$x[,3]
poems$PC4 = pc$x[,4]
poems$PC5 = pc$x[,5]
poems$PC6 = pc$x[,6]
poems$PC7 = pc$x[,7]

library(lme4)
poems.lmer = lmer(LogReadingTime ~ 
  PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 +
  HasPunctuationMark*Sex + Trial + PositionBegMidEnd +
  (1|Poem) + (1|Word) + (1|Subject),
  #(1+LogWordFormFrequency+NumberOfMorphemes|Subject) ,
  data=poems, REML=FALSE)
print(summary(poems.lmer), corr=FALSE)

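# Block-diagonal matrix of the relative covariance factors, one block per
# random-effects term. This also works when a term has random slopes, e.g.
# the commented-out (1+LogWordFormFrequency+NumberOfMorphemes|Subject).
chf <- as.matrix(Matrix::bdiag(getME(poems.lmer, "Tlist")))

# Percentage of the relative random-effects variance per singular value
sv <- svd(chf)
round(sv$d^2/sum(sv$d^2)*100, 1)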

## End(Not run)
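
The example above takes loadings from `pc$rotation` and scores from `pc$x`. As a minimal sketch of how the two relate, the following uses simulated data (the data frame `toy` and its columns are hypothetical stand-ins for the predictor matrix `p`, not the poems data) to check that the scores are the standardized data projected onto the loadings:

```r
# Simulated stand-in for the predictor matrix; not the poems data.
set.seed(1)
n <- 200
shared <- rnorm(n)
toy <- data.frame(
  a = shared + rnorm(n, sd = 0.3),  # a and b are strongly correlated
  b = shared + rnorm(n, sd = 0.3),
  c = rnorm(n)                      # c is independent of a and b
)

pc_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# Scores are the centered and scaled data multiplied by the loadings.
scores_by_hand <- scale(toy) %*% pc_toy$rotation
scores_match <- isTRUE(all.equal(unname(scores_by_hand), unname(pc_toy$x)))

# Percentage of variance per component, from the component standard deviations.
pct_var <- pc_toy$sdev^2 / sum(pc_toy$sdev^2) * 100
round(pct_var, 1)
```

The last line uses the same squared-and-normalized calculation that the example applies to the singular values in `sv$d`.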