ndl.package: Naive Discriminative Learning

Description

Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.

This package provides three kinds of functionality: (1) discriminative learning based directly on the Rescorla-Wagner equations, (2) a function implementing the naive discriminative reader, a model for silent (single-word) reading, and (3) a classifier based on the equilibrium equations of the Rescorla-Wagner equations. The functions and datasets for the naive discriminative reader model make it possible to replicate the simulation results for Experiment 1 of Baayen et al. (2011). The classifier is provided to allow for comparisons between machine learning methods (SVM, TiMBL, GLM, random forests, etc.) and discrimination learning. Compared to standard classification algorithms, naive discriminative learning may overfit the data, albeit gracefully.

Details

For more detailed information on the core Rescorla-Wagner equations, see the functions RescorlaWagner and plot.RescorlaWagner, as well as the data sets danks, numbers (data courtesy of Michael Ramscar), and lexample (an example discussed in Baayen et al. 2011).
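To make the learning rule concrete, here is a minimal base-R sketch of a single Rescorla-Wagner update, applied repeatedly to the letter cues of one word. It is not the package's RescorlaWagner function: the helper name rw_update and the parameter values alpha, beta, and lambda are assumptions chosen for illustration.

```r
# Minimal Rescorla-Wagner sketch in base R (not the package's implementation).
# rw_update and its default parameter values are illustrative assumptions.
rw_update <- function(weights, cues, outcome_present,
                      alpha = 0.1, beta = 0.1, lambda = 1) {
  # Summed association strength of the cues present on this trial
  v_total <- sum(weights[cues])
  # Learning target: lambda when the outcome occurs, 0 otherwise
  target <- if (outcome_present) lambda else 0
  # Every present cue moves by the same share of the prediction error
  weights[cues] <- weights[cues] + alpha * beta * (target - v_total)
  weights
}

# Association strengths of the letter cues h, a, n, d with one outcome
w <- c(h = 0, a = 0, n = 0, d = 0)
trace_h <- numeric(200)
for (trial in seq_along(trace_h)) {
  w <- rw_update(w, cues = c("h", "a", "n", "d"), outcome_present = TRUE)
  trace_h[trial] <- w[["h"]]
}
round(w, 3)
```

Because the four cues always occur together with the outcome, they share the available association strength equally, and their summed weight approaches lambda as training proceeds.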

The user-level functions for naive discriminative learning are estimateWeights and estimateActivations. The relevant data sets are serbian, serbianUniCyr, serbianUniLat, and serbianLex. The examples for serbianLex present the full simulation for Experiment 1 of Baayen et al. (2011).

Key functionality for the user is provided by the functions orthoCoding, estimateWeights, and estimateActivations. orthoCoding calculates the letter n-grams for character strings, to be used as cues. It is assumed that the meaning or meanings (separated by underscores if there is more than one) are available as outcomes. The frequency with which each (unique) combination of cues and outcomes occurs is also required. For some example input data sets, see: danks, plurals, serbian, serbianUniCyr and serbianUniLat.
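The idea of using letter n-grams as cues can be sketched in base R as follows. The helper letter_ngrams is hypothetical and is not the package's orthoCoding; the boundary marker "#" and the underscore separator are assumptions made for this illustration.

```r
# Hypothetical helper for illustration; not the package's orthoCoding().
letter_ngrams <- function(word, n = 2, boundary = "#", sep = "_") {
  # Pad with boundary markers so word edges become part of the n-grams
  chars <- strsplit(paste0(boundary, word, boundary), "")[[1]]
  if (length(chars) < n) return("")
  starts <- seq_len(length(chars) - n + 1)
  grams <- vapply(starts,
                  function(i) paste(chars[i:(i + n - 1)], collapse = ""),
                  character(1))
  paste(grams, collapse = sep)
}

# One row per unique cue-outcome combination, with its frequency
# (the Frequency column name is an assumption of this sketch)
toy <- data.frame(Word      = c("hand", "hands"),
                  Outcomes  = c("hand", "hand_PLURAL"),
                  Frequency = c(10, 20),
                  stringsAsFactors = FALSE)
toy$Cues <- vapply(toy$Word, letter_ngrams, character(1), USE.NAMES = FALSE)
toy
```

For the word "hand" this sketch yields the cue string "#h_ha_an_nd_d#".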

The function estimateWeights estimates the association strengths of cues to outcomes, using the equilibrium equations presented in Danks (2003). The function estimateActivations estimates the activations of outcomes (meanings) given cues (n-grams).
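The following base-R sketch illustrates the general idea behind equilibrium weights and activations on a toy event table. It is not the package's estimateWeights or estimateActivations: the data are invented, the column names follow the conventions used in the examples (Frequency is an assumption), and the sketch assumes the cue-by-cue matrix is invertible, which real data need not satisfy.

```r
# Toy event table (hypothetical data): cues, one outcome per event, frequency
events <- data.frame(
  Cues      = c("a_b", "a_c", "b_c"),
  Outcomes  = c("X", "Y", "X"),
  Frequency = c(10, 5, 5),
  stringsAsFactors = FALSE
)

cue_list <- strsplit(events$Cues, "_", fixed = TRUE)
cues     <- sort(unique(unlist(cue_list)))
outcomes <- sort(unique(events$Outcomes))

# Binary event-by-cue and event-by-outcome indicator matrices
cue_mat <- t(vapply(cue_list, function(x) as.numeric(cues %in% x),
                    numeric(length(cues))))
colnames(cue_mat) <- cues
out_mat <- outer(events$Outcomes, outcomes, "==") * 1
colnames(out_mat) <- outcomes

# Frequency-weighted co-occurrence counts
cc <- t(cue_mat) %*% (cue_mat * events$Frequency)  # cue by cue
co <- t(cue_mat) %*% (out_mat * events$Frequency)  # cue by outcome

# Condition on the row cue: P(cue_j | cue_i) and P(outcome | cue_i)
cue_freq <- diag(cc)
p_cc <- cc / cue_freq
p_co <- co / cue_freq

# Equilibrium weights solve p_cc %*% W = p_co (assumes p_cc is invertible)
W <- solve(p_cc, p_co)
dimnames(W) <- list(cues, outcomes)

# Activation of each outcome: sum of the weights of the cues present
activation <- function(present, W) colSums(W[present, , drop = FALSE])
activation(c("a", "b"), W)
```

In this toy example the cues a and b together fully activate outcome X and leave Y at zero, matching the event in which they co-occur. When the cue-by-cue matrix is singular, a generalized inverse would be needed in place of solve.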

The Rcpp-based learn and learnLegacy functions use a C++ function to compute the conditional co-occurrence matrices required in the equilibrium equations. These are internally used by estimateWeights and should not be used directly by users of the package.

The key function for naive discriminative classification is ndlClassify; see data sets think and dative for examples.
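For classification, predictor values have to be expressed as cues. The row names of the weight matrix in the example output (such as AgentGroup or NumberPlural) suggest that each cue combines a predictor name with one of its levels. The sketch below shows one way to build such cue strings in base R; the helper predictors_to_cues and the toy data are hypothetical and do not reproduce the internals of ndlClassify.

```r
# Hypothetical helper: turn each row's predictor values into a cue string
predictors_to_cues <- function(data, predictors, sep = "_") {
  # Prefix every value with its predictor name, e.g. "Agent" + "Group"
  parts <- lapply(predictors,
                  function(p) paste0(p, as.character(data[[p]])))
  # Join the per-predictor cues of each row with the separator
  do.call(paste, c(parts, sep = sep))
}

# Invented toy rows; not taken from the think data set
toy_rows <- data.frame(Agent  = c("Group", "Individual"),
                       Number = c("Plural", "Other"),
                       stringsAsFactors = FALSE)
predictors_to_cues(toy_rows, c("Agent", "Number"))
```

Each row then has one cue per predictor, and the outcome is the value of the response variable for that row.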

Author(s)

Antti Arppe [aut], Peter Hendrix [aut], Petar Milin [aut], R. Harald Baayen [aut], Tino Sering [aut, cre], Cyrus Shaoul [aut]

Maintainer: Tino Sering <konstantin.sering@uni-tuebingen.de>

Author Contributions: Initial concept by R. Harald Baayen with contributions from Petar Milin and Peter Hendrix. First R coding done by R. Harald Baayen.

Initial R package development until version 0.1.6 by Antti Arppe. Initial documentation by Antti Arppe. Initial optimizations in C by Petar Milin and Antti Arppe.

Classification functionality developed further by Antti Arppe.

From version 0.2.14 to version 0.2.16, improvements to the NDL algorithm by Petar Milin and Cyrus Shaoul, and performance optimizations (C++ and Rcpp) by Cyrus Shaoul.

From version 0.2.17 onwards, bug fixes and CRAN compliance by Tino Sering.

References

Baayen, R. H., Milin, P., Filipovic Durdevic, D., Hendrix, P. and Marelli, M. (2011) An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118, 438-482.

Baayen, R. H. (2011) Corpus linguistics and naive discriminative learning. Brazilian Journal of Applied Linguistics, 11, 295-328.

Arppe, A. and Baayen, R. H. (in prep.) Statistical classification and principles of human learning.

Examples

## Not run: 
# Rescorla-Wagner
data(lexample)

lexample$Cues <- orthoCoding(lexample$Word, grams=1)
lexample.rw <- RescorlaWagner(lexample, nruns=25, traceCue="h",
   traceOutcome="hand")
plot(lexample.rw)
mtext("h - hand", 3, 1)

data(numbers)

traceCues <- c( "exactly1", "exactly2", "exactly3", "exactly4", "exactly5",
   "exactly6", "exactly7", "exactly10", "exactly15")
traceOutcomes <- c("1", "2", "3", "4", "5", "6", "7", "10", "15")

ylimit <- c(0,1)
par(mfrow=c(3,3), mar=c(4,4,1,1))

for (i in seq_along(traceCues)) {
  numbers.rw <- RescorlaWagner(numbers, nruns=1, traceCue=traceCues[i],
     traceOutcome=traceOutcomes[i])
  plot(numbers.rw, ylimit=ylimit)
  mtext(paste(traceCues[i], " - ", traceOutcomes[i], sep=""), side=3, line=-1,
    cex=0.7)
}
par(mfrow=c(1,1))

# naive discriminative learning (for complete example, see serbianLex)
# This function uses a Unicode dataset.
data(serbianUniCyr)
serbianUniCyr$Cues <- orthoCoding(serbianUniCyr$WordForm, grams=2)
serbianUniCyr$Outcomes <- serbianUniCyr$LemmaCase
sw <- estimateWeights(cuesOutcomes=serbianUniCyr, hasUnicode=TRUE)

desiredItems <- unique(serbianUniCyr["Cues"])
desiredItems$Outcomes <- ""
activations <- estimateActivations(desiredItems, sw)$activationMatrix
rownames(activations) <- unique(serbianUniCyr[["WordForm"]])

syntax <- c("acc", "dat", "gen", "ins", "loc", "nom", "Pl",  "Sg") 
activations2 <- activations[,!is.element(colnames(activations), syntax)]
head(rownames(activations2),50)
head(colnames(activations2),8)

image(activations2, xlab="word forms", ylab="meanings", xaxt="n", yaxt="n")
mtext(c("yena", "...", "zvuke"), side=1, line=1, at=c(0, 0.5, 1),  adj=c(0,0,1))
mtext(c("yena", "...", "zvuk"), side=2, line=1, at=c(0, 0.5, 1),   adj=c(0,0,1))

# naive discriminative classification
data(think)
think.ndl <- ndlClassify(Lexeme ~ Person + Number + Agent + Patient + Register,
   data=think)
summary(think.ndl)
plot(think.ndl, values="weights", type="hist", panes="multiple")
plot(think.ndl, values="probabilities", type="density")

## End(Not run)

Example output

This is ndl version 0.2.18. 
For an overview of the package, type 'help("ndl.package")'.
 [1] "жена"
 [2] "жене"
 [3] "жени"
 [4] "жену"
 [5] "женом"
 [6] "женама"
 [7] "жеља"
 [8] "жеље"
 [9] "жељи"
[10] "жељу"
[11] "жељом"
[12] "жељама"
[13] "живот"
[14] "живота"
[15] "животу"
[16] "животом"
[17] "животи"
[18] "животима"
[19] "животе"
[20] "шетња"
[21] "шетње"
[22] "шетњи"
[23] "шетњу"
[24] "шетњом"
[25] "шетњама"
[26] "ширина"
[27] "ширине"
[28] "ширини"
[29] "ширину"
[30] "ширином"
[31] "ширинама"
[32] "школа"
[33] "школе"
[34] "школи"
[35] "школу"
[36] "школом"
[37] "школама"
[38] "шума"
[39] "шуме"
[40] "шуми"
[41] "шуму"
[42] "шумом"
[43] "шумама"
[44] "чамац"
[45] "чамца"
[46] "чамцу"
[47] "чамцем"
[48] "чамци"
[49] "чамаца"
[50] "чамцима"
[1] "академија"
[2] "апарат"
[3] "битка"
[4] "бог"
[5] "бол"
[6] "бор"
[7] "борац"
[8] "боја"

Call:
ndlClassify(formula = Lexeme ~ Person + Number + Agent + Patient + 
    Register, data = think)

Formula:
Lexeme ~ Person + Number + Agent + Patient + Register

Weights:
                       ajatella    harkita    miettia     pohtia
AgentGroup           -0.0024238  0.0484437 -0.0195531  0.1726282
AgentIndividual       0.1377378  0.0011873  0.0730958 -0.0129260
AgentNone             0.1481698  0.0115980  0.0540880 -0.0147608
NumberOther           0.1240703  0.0326252  0.0936144  0.0483326
NumberPlural          0.1594136  0.0286038  0.0140163  0.0966088
PatientAbstraction   -0.1868402 -0.0124441  0.0874109  0.1661720
PatientActivity      -0.2692613  0.3313997 -0.0140782  0.0062384
PatientCommunication -0.3390196  0.0820339  0.2322344  0.0790500
PatientDirectQuote   -0.3940583 -0.1073144  0.1906831  0.3649883
PatientEvent          0.2320365 -0.0692502  0.0007568 -0.1092445
... [ omitted 12 rows ] ...

Null deviance:              8701  on  13616  degrees of freedom
Residual (model) deviance:  7335  on  13528  degrees of freedom

R2.likelihood:  0.157             
AIC:            7511              
BIC:            8051             

ndl documentation built on May 2, 2019, 10:28 a.m.