serbianLex: Serbian lexicon with 1187 prime-target pairs.
In ndl: Naive Discriminative Learning

Description Usage Format References Examples

The 1187 prime-target pairs and their lexical properties used in the simulation study of Experiment 1 of Baayen et al. (2011).

1	data(serbianLex)

A data frame with 1187 observations on the following 14 variables:

Target: A factor specifying the target noun form
Prime: A factor specifying the prime noun form
PrimeLemma: A factor specifying the lemma of the prime
TargetLemma: A factor specifying the target lemma
Length: A numeric vector with the length in letters of the target
WeightedRE: A numeric vector with the weighted relative entropy of the prime and target inflectional paradigms
NormLevenshteinDist: A numeric vector with the normalized Levenshtein distance of prime and target forms
TargetLemmaFreq: A numeric vector with log frequency of the target lemma
PrimeSurfFreq: A numeric vector with log frequency of the prime form
PrimeCondition: A factor with prime conditions, levels: DD, DSSD, SS
CosineSim: A numeric vector with the cosine similarity of prime and target vector space semantics
IsMasc: A vector of logicals, TRUE if the noun is masculine.
TargetGender: A factor with the gender of the target, levels: f, m, and n
TargetCase: A factor specifying the case of the target noun, levels: acc, dat, nom
MeanLogObsRT: Mean log-transformed observed reaction time

Baayen, R. H., Milin, P., Filipovic Durdevic, D., Hendrix, P. and Marelli, M. (2011), An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118, 438-482.

# calculate the weight matrix for the full set of Serbian nouns
data(serbian)
serbian$Cues <- orthoCoding(serbian$WordForm, grams=2)
serbian$Outcomes <- serbian$LemmaCase
sw <- estimateWeights(cuesOutcomes=serbian)

# calculate the meaning activations for all unique word forms

desiredItems <- unique(serbian["Cues"])
desiredItems$Outcomes <- ""
activations <- estimateActivations(desiredItems, sw)$activationMatrix
rownames(activations) <- unique(serbian[["WordForm"]])
activations <- activations + abs(min(activations))
activations[1:5,1:6]

# calculate simulated latencies for the experimental materials

data(serbianLex)
syntax <- c("acc", "dat", "gen", "ins", "loc", "nom", "Pl", "Sg")
we <- 0.4 # compound cue weight
strengths <- rep(0, nrow(serbianLex))
for(i in 1:nrow(serbianLex)) {
   target <- serbianLex$Target[i]
   prime <- serbianLex$Prime[i]
   targetLemma <- as.character(serbianLex$TargetLemma[i])
   primeLemma <- as.character(serbianLex$PrimeLemma[i])
   targetOutcomes <- c(targetLemma, primeLemma, syntax)
   primeOutcomes <- c(targetLemma, primeLemma, syntax)
   p <- activations[target, targetOutcomes]
   q <- activations[prime, primeOutcomes]
   strengths[i] <- sum((q^we)*(p^(1-we)))
}
serbianLex$SimRT <- -strengths
lengthPenalty <- 0.3
serbianLex$SimRT2 <- serbianLex$SimRT + 
  (lengthPenalty * (serbianLex$Length>5))

cor.test(serbianLex$SimRT, serbianLex$MeanLogObsRT)
cor.test(serbianLex$SimRT2, serbianLex$MeanLogObsRT)

serbianLex.lm <- lm(SimRT2 ~ Length +  WeightedRE*IsMasc + 
      NormLevenshteinDist + TargetLemmaFreq + 
      PrimeSurfFreq + PrimeCondition, data=serbianLex)
summary(serbianLex.lm)