poems: Baayen and Milin


Description

Self-paced reading times for words in poems, from the data described in Baayen and Milin (2010).

Format

A data frame with 275996 observations on the following 24 variables.

ReadingTime

a numeric vector of self-paced reading times

Subject

a factor with participant identifiers

Sex

a factor with levels m (male) and f (female)

Age

a numeric vector specifying the participant's age

NPoems

a numeric vector of the self-reported maximum number of poems read annually, according to a four-choice question

MultipleChoiceRT

a numeric vector with the response latency to the four-choice question

Trial

a numeric vector specifying the rank of the item in the subject's experimental list

NumberOfWordsIntoLine

a numeric vector specifying the position of the item in the line of poetry being read

PositionBegMidEnd

a factor specifying whether the word was initial (beg), medial (mid), or final (end) in the sentence

SentenceLength

a numeric vector specifying sentence length

Poem

a factor whose levels are identifiers for the poems

Word

a factor whose levels are identifiers for the words

WordFrequencyInPoem

a numeric vector specifying the frequency of the word in the poem

RhymeFreqInPoem

a numeric vector specifying the frequency of the word's rhyme in the poem

OnsetFreqInPoem

a numeric vector specifying the frequency of the word's onset in the poem

WordLength

a numeric vector specifying the length of the word in letters

FamilySize

a numeric vector specifying the count of morphological family members

InflectionalEntropy

a numeric vector specifying Shannon's entropy calculated over the probability distribution of a word's inflected variants

LemmaFrequency

a numeric vector specifying the frequency of occurrence of the word in the lemma subsection of the CELEX lexical database

WordFormFrequency

a numeric vector specifying the frequency of occurrence of the word's inflected form in the word form subsection of the CELEX lexical database

NumberOfMeanings

a numeric vector specifying the number of synsets in WordNet in which the word is listed

IsFunctionWord

a factor specifying whether the word is a function word, levels TRUE (function word) and FALSE (not a function word)

HasPunctuationMark

a factor specifying whether the word is followed by a punctuation mark, levels FALSE (absent) and TRUE (present)

NumberOfMorphemes

a numeric vector specifying the scaled number of morphemes in a word
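
As an illustration of the InflectionalEntropy measure described above, the following minimal sketch computes Shannon's entropy from a vector of frequency counts for a word's inflected variants. The function name shannon_entropy, the example counts, and the use of base-2 logarithms are assumptions made for illustration; this page does not state how the values in poems were computed.

```r
# Hypothetical helper: Shannon entropy (in bits) of a frequency distribution.
shannon_entropy <- function(counts) {
  p <- counts / sum(counts)   # relative frequencies of the variants
  p <- p[p > 0]               # drop empty cells, treating 0 * log(0) as 0
  -sum(p * log2(p))
}

# Made-up counts for the inflected variants of one word
variant_counts <- c(singular = 50, plural = 50)
shannon_entropy(variant_counts)      # two equiprobable variants give 1 bit

# A word with a single attested variant carries no uncertainty
shannon_entropy(c(only_form = 120))
```

Entropy grows as the number of variants grows and as their probabilities become more even, which is why it is often preferred over a simple count of inflected forms.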

Source

Baayen, R. H. and Milin, P. (2010) Analyzing reaction times. International Journal of Psychological Research, 3.2, pp. 12-28.

References

Baayen, R. H. and Milin, P. (2010) Analyzing reaction times. International Journal of Psychological Research, 3.2, pp. 12-28.

Examples

data(poems)
par(mfrow=c(2,4))
qqnorm(poems$ReadingTime)
qqnorm(poems$WordFormFrequency)
qqnorm(poems$LemmaFrequency)
qqnorm(poems$FamilySize)
qqnorm(poems$MultipleChoiceRT)
qqnorm(poems$NPoems)
qqnorm(poems$NumberOfMeanings)
poems$LogReadingTime        = log(poems$ReadingTime)
poems$LogWordFormFrequency  = log(poems$WordFormFrequency+1)
poems$LogLemmaFrequency     = log(poems$LemmaFrequency+1)
poems$RecFamilySize         = -100/(poems$FamilySize+1)
poems$LogMultipleChoiceRT   = log(poems$MultipleChoiceRT)
poems$LogNPoems             = log(poems$NPoems)
poems$LogNumberOfMeanings   = log(poems$NumberOfMeanings+1)

## Not run: 

p = poems[,c("Age", "LogNPoems", "LogMultipleChoiceRT", "NumberOfWordsIntoLine", "SentenceLength",
                     "WordFrequencyInPoem", "RhymeFreqInPoem", "OnsetFreqInPoem", "WordLength", 
                     "NumberOfMorphemes",
                     "RecFamilySize", "InflectionalEntropy", "LogLemmaFrequency", "LogWordFormFrequency",
                     "LogNumberOfMeanings")]
pc = prcomp(p, center=TRUE, scale.=TRUE)  # prcomp's argument is named 'scale.'
round(pc$rotation[,1:7],2)
#                        PC1   PC2   PC3   PC4   PC5   PC6   PC7
#Age                    0.00  0.01  0.00  0.03  0.61  0.49 -0.01
#LogNPoems              0.00 -0.01  0.01 -0.01 -0.70 -0.02  0.00
#LogMultipleChoiceRT    0.00  0.00  0.00  0.01 -0.37  0.87 -0.02
#NumberOfWordsIntoLine  0.03 -0.19 -0.39 -0.56  0.01  0.02 -0.05
#SentenceLength        -0.09 -0.20 -0.40 -0.52  0.01  0.01 -0.11
#WordFrequencyInPoem   -0.30 -0.36  0.14  0.11  0.00 -0.01 -0.06
#RhymeFreqInPoem       -0.24 -0.54  0.15  0.07  0.01  0.00  0.11
#OnsetFreqInPoem       -0.20 -0.56  0.14  0.06  0.01  0.00  0.13
#WordLength             0.41 -0.16  0.18 -0.08  0.00  0.00  0.15
#NumberOfMorphemes      0.17 -0.13  0.24 -0.03  0.01 -0.01 -0.83
#RecFamilySize         -0.35  0.20 -0.02 -0.11  0.00  0.01  0.34
#InflectionalEntropy    0.30 -0.19 -0.42  0.36 -0.01 -0.01 -0.02
#LogLemmaFrequency     -0.43  0.13 -0.21  0.18 -0.01 -0.01 -0.27
#LogWordFormFrequency  -0.45  0.16 -0.12  0.10  0.00 -0.01 -0.25
#LogNumberOfMeanings    0.11 -0.15 -0.55  0.44 -0.01 -0.01  0.01


poems$PC1 = pc$x[,1]
poems$PC2 = pc$x[,2]
poems$PC3 = pc$x[,3]
poems$PC4 = pc$x[,4]
poems$PC5 = pc$x[,5]
poems$PC6 = pc$x[,6]
poems$PC7 = pc$x[,7]

library(lme4)
poems.lmer = lmer(LogReadingTime ~ 
  PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 +
  HasPunctuationMark*Sex + Trial + PositionBegMidEnd +
  (1|Poem) + (1|Word) + (1|Subject),
  #(1+LogWordFormFrequency+NumberOfMorphemes|Subject) ,
  data=poems, REML=FALSE)
print(summary(poems.lmer), corr=FALSE)

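# Note: the indexing below appears to assume the random-slopes model in the
# commented-out line above, where the second element of Tlist is a 3 x 3 matrix.
chf <- diag(c(diag(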
  getME(poems.lmer, "Tlist")[[2]]), 
  getME(poems.lmer, "Tlist")[[1]], 
  getME(poems.lmer, "Tlist")[[3]]))
chf[1:3, 1:3] <- getME(poems.lmer, "Tlist")[[2]]             

sv <- svd(chf)
round(sv$d^2/sum(sv$d^2)*100, 1)

## End(Not run)
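
The example above replaces correlated predictors with principal component scores before fitting the mixed model. The sketch below shows that step on simulated data, so it runs without the poems data; the variable names (freq, lemma, wlen) and the simulated values are illustrative assumptions, not part of the data set.

```r
set.seed(1)
n <- 200

# Simulated predictors: freq and lemma are strongly correlated, wlen is unrelated
freq  <- rnorm(n)
lemma <- freq + rnorm(n, sd = 0.3)
wlen  <- rnorm(n)
x <- data.frame(freq = freq, lemma = lemma, wlen = wlen)

round(cor(x), 2)                 # raw predictors are collinear

# Centre and scale each column, then rotate to principal components
pc <- prcomp(x, center = TRUE, scale. = TRUE)
round(pc$rotation, 2)            # loadings of each predictor on each component
summary(pc)$importance           # proportion of variance per component

# The component scores in pc$x are mutually uncorrelated
round(cor(pc$x), 2)
```

Using the scores in pc$x as predictors removes collinearity among the original variables, at the cost of coefficients that have to be interpreted through the loadings.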

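
The last lines of the example take the singular value decomposition of a block-diagonal matrix built from the model's relative covariance factors and report each squared singular value as a percentage of the total. A minimal sketch of that calculation follows, using a made-up 3 x 3 lower-triangular factor rather than one extracted from a fitted model; the matrix L and its entries are assumptions chosen so that one dimension is redundant.

```r
# Made-up lower-triangular factor with a zero on the last diagonal entry,
# so the implied covariance matrix L %*% t(L) has rank 2
L <- matrix(c(1.0, 0, 0,
              0.5, 1, 0,
              1.5, 1, 0), nrow = 3, byrow = TRUE)
Sigma <- L %*% t(L)

sv <- svd(L)

# Squared singular values of L are the eigenvalues of Sigma
pct <- round(sv$d^2 / sum(sv$d^2) * 100, 1)
pct   # percentage of variance along each principal direction
```

A percentage at or near zero, as for the last component here, suggests that the random-effects structure spans fewer dimensions than it has parameters for, which is one way of detecting an overparameterized model.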
dmbates/RePsychLing documentation built on May 15, 2019, 9:19 a.m.