Description
Estimated etymological age for regular and irregular monomorphemic Dutch verbs, together with other distributional predictors of regularity.
Usage

data(etymology)
Format

A data frame with 285 observations on the following 14 variables.
Verb
a factor with the verbs as levels.
WrittenFrequency
a numeric vector of logarithmically transformed frequencies in written Dutch (as available in the CELEX lexical database).
NcountStem
a numeric vector for the number of orthographic neighbors.
MeanBigramFrequency
a numeric vector for mean log bigram frequency.
InflectionalEntropy
a numeric vector for Shannon's entropy calculated for the word's inflectional variants.
Auxiliary
a factor with levels hebben, zijn and zijnheb, specifying the verb's auxiliary in the perfect tenses.
Regularity
a factor with levels irregular and regular.
LengthInLetters
a numeric vector of the word's orthographic length.
Denominative
a factor with levels Den and N, specifying whether the verb is derived from a noun according to the CELEX lexical database.
FamilySize
a numeric vector for the number of types in the word's morphological family.
EtymAge
an ordered factor with levels Dutch, DutchGerman, WestGermanic, Germanic and IndoEuropean, running from the youngest to the oldest estimated etymological age.
Valency
a numeric vector for the verb's valency, estimated by its number of argument structures.
NVratio
a numeric vector for the log-transformed ratio of the nominal and verbal frequencies of use.
WrittenSpokenRatio
a numeric vector for the log-transformed ratio of the frequencies in written and spoken Dutch.
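For reference, the entropy and ratio measures above can be written out as follows. This is a sketch under stated assumptions, not the exact computation used to build the data set: the symbols are introduced here for illustration only ($f_i$ is the frequency of the $i$-th of $n$ inflectional variants of a verb, $f_{\text{noun}}$ and $f_{\text{verb}}$ its nominal and verbal frequencies, $f_{\text{written}}$ and $f_{\text{spoken}}$ its frequencies in written and spoken Dutch), a base-2 logarithm is assumed for the entropy, and the page does not state the logarithm base or any smoothing of zero counts for the ratios.

```latex
\[
\text{InflectionalEntropy} = H = -\sum_{i=1}^{n} p_i \log_2 p_i,
\qquad
p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}
\]
\[
\text{NVratio} = \log \frac{f_{\text{noun}}}{f_{\text{verb}}},
\qquad
\text{WrittenSpokenRatio} = \log \frac{f_{\text{written}}}{f_{\text{spoken}}}
\]
```

Under these definitions, $H = 0$ when a verb occurs in only one inflectional form and reaches its maximum $\log_2 n$ when all $n$ forms are equally frequent; a ratio of 0 means the two frequencies are equal, and positive values mean nominal (or written) use dominates.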
References

Baayen, R. H. and Moscoso del Prado Martin, F. (2005) Semantic density and past-tense formation in three Germanic languages, Language, 81, 666-698.
Tabak, W., Schreuder, R. and Baayen, R. H. (2005) Lexical statistics and lexical processing: semantic density, information complexity, sex, and irregularity in Dutch, in Kepser, S. and Reis, M. (eds.), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, Berlin: Mouton de Gruyter, pp. 529-555.
Examples

## Not run:
library(languageR)  # package that provides the etymology data set
data(etymology)
# ---- EtymAge should be an ordered factor, set contrasts accordingly
etymology$EtymAge = ordered(etymology$EtymAge, levels = c("Dutch",
    "DutchGerman", "WestGermanic", "Germanic", "IndoEuropean"))
options(contrasts=c("contr.treatment","contr.treatment"))
library(rms)
etymology.dd = datadist(etymology)
options(datadist = 'etymology.dd')
# ---- EtymAge as additional predictor for regularity
etymology.lrm = lrm(Regularity ~ WrittenFrequency +
    rcs(FamilySize, 3) + NcountStem + InflectionalEntropy +
    Auxiliary + Valency + NVratio + WrittenSpokenRatio + EtymAge,
    data = etymology, x = TRUE, y = TRUE)
anova(etymology.lrm)
# ---- EtymAge as dependent variable
etymology.lrm = lrm(EtymAge ~ WrittenFrequency + NcountStem +
    MeanBigramFrequency + InflectionalEntropy + Auxiliary +
    Regularity + LengthInLetters + Denominative + FamilySize + Valency +
    NVratio + WrittenSpokenRatio, data = etymology, x = TRUE, y = TRUE)
# ---- model simplification
etymology.lrm = lrm(EtymAge ~ NcountStem + Regularity + Denominative,
    data = etymology, x = TRUE, y = TRUE)
validate(etymology.lrm, bw=TRUE, B=200)
# ---- plot partial effects and check assumptions ordinal regression
plot(Predict(etymology.lrm))
plot(etymology.lrm)
resid(etymology.lrm, 'score.binary', pl = TRUE)
plot.xmean.ordinaly(EtymAge ~ NcountStem, data = etymology)
## End(Not run)
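When the response is the ordered factor EtymAge, `lrm` fits a proportional-odds (cumulative logit) model. The formula below is a sketch assuming the parameterization described in the rms documentation for `lrm`; the coding $Y = 1$ (Dutch) through $Y = 5$ (IndoEuropean), the intercepts $\alpha_j$ and the coefficient vector $\boldsymbol{\beta}$ are notation introduced here for illustration.

```latex
\[
\Pr(Y \ge j \mid \mathbf{x})
  = \frac{1}{1 + \exp\!\bigl(-(\alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta})\bigr)},
\qquad j = 2, \dots, 5
\]
```

Each cutpoint $j$ has its own intercept $\alpha_j$, but all cutpoints share the same $\boldsymbol{\beta}$; this is the proportional-odds assumption that `resid(etymology.lrm, 'score.binary', pl = TRUE)` and `plot.xmean.ordinaly()` are used to check. Under this coding, a positive coefficient shifts a verb towards older etymological strata. For the binary response Regularity (levels irregular and regular) the same formula reduces to ordinary logistic regression with the single cutpoint $j = 2$, that is, the probability that the verb is regular.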