| regularity | R Documentation |
Regular and irregular Dutch verbs and selected lexical and distributional properties.
data(regularity)
A data frame with 700 observations on the following 13 variables.
Verb: a factor with the verbs as levels.
WrittenFrequency: a numeric vector of logarithmically transformed frequencies in written Dutch (as available in the CELEX lexical database).
NcountStem: a numeric vector for the number of orthographic neighbors.
VerbalSynsets: a numeric vector for the number of verbal synsets in WordNet.
MeanBigramFrequency: a numeric vector for mean log bigram frequency.
InflectionalEntropy: a numeric vector for Shannon's entropy calculated for the word's inflectional variants.
Auxiliary: a factor with levels hebben, zijn and zijnheb for the verb's auxiliary in the perfect tenses.
Regularity: a factor with levels irregular and regular.
LengthInLetters: a numeric vector of the word's orthographic length.
FamilySize: a numeric vector for the number of types in the word's morphological family.
Valency: a numeric vector for the verb's valency, estimated by its number of argument structures.
NVratio: a numeric vector for the log-transformed ratio of the nominal and verbal frequencies of use.
WrittenSpokenRatio: a numeric vector for the log-transformed ratio of the frequencies in written and spoken Dutch.
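As a rough illustration of what an entropy measure over inflectional variants involves, the sketch below computes Shannon's entropy from a set of token counts. The counts are invented for illustration and are not taken from CELEX. The function name `shannon_entropy` is hypothetical, and the base-2 logarithm is an assumption, since the description above does not state the base used for the data set.

```r
# Hypothetical token counts for the inflectional variants of one verb
# (illustrative values only, not taken from CELEX).
counts <- c(lopen = 120, loop = 300, loopt = 250,
            liep = 90, liepen = 40, gelopen = 200)

# Shannon entropy of a frequency vector.
# Assumption: base-2 logarithm, so the result is in bits.
shannon_entropy <- function(freq) {
  p <- freq / sum(freq)   # relative frequencies of the variants
  p <- p[p > 0]           # drop zero counts: 0 * log(0) is taken as 0
  -sum(p * log2(p))
}

inflectional_entropy <- shannon_entropy(counts)
inflectional_entropy
```

Under this definition the value is largest when all variants are equally frequent and drops to zero when a single variant accounts for every token.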
Baayen, R. H. and Moscoso del Prado Martin, F. (2005) Semantic density and past-tense formation in three Germanic languages, Language, 81, 666-698.
Tabak, W., Schreuder, R. and Baayen, R. H. (2005) Lexical statistics and lexical processing: semantic density, information complexity, sex, and irregularity in Dutch, in Kepser, S. and Reis, M., Linguistic Evidence - Empirical, Theoretical, and Computational Perspectives, Berlin: Mouton de Gruyter, pp. 529-555.
## Not run:
library(languageR)  # the regularity data set is distributed with this package
data(regularity)
# ---- predicting regularity with a logistic regression model
library(rms)
regularity.dd = datadist(regularity)
options(datadist = 'regularity.dd')
regularity.lrm = lrm(Regularity ~ WrittenFrequency +
rcs(FamilySize, 3) + NcountStem + InflectionalEntropy +
Auxiliary + Valency + NVratio + WrittenSpokenRatio,
data = regularity, x = TRUE, y = TRUE)
anova(regularity.lrm)
# ---- model validation
validate(regularity.lrm, bw = TRUE, B = 200)
pentrace(regularity.lrm, seq(0, 0.8, by = 0.05))
regularity.lrm.pen = update(regularity.lrm, penalty = 0.6)
regularity.lrm.pen
# ---- a plot of the partial effects
plot(Predict(regularity.lrm.pen))
# ---- predicting regularity with a support vector machine
library(e1071)
regularity$AuxNum = as.numeric(regularity$Auxiliary)
regularity.svm = svm(regularity[, -c(1,8,10)], regularity$Regularity, cross=10)
summary(regularity.svm)
## End(Not run)