dutchSpeakersDist: Cross-entropy based distances between speakers

Description Usage Format Source References Examples

Description

A distance matrix for the conversations of 165 speakers in the Spoken Dutch Corpus. Metadata on the speakers are available in a separate dataset, dutchSpeakersDistMeta.

Usage

1

Format

A data frame for a 165 by 165 matrix of between-speaker differences.

Source

http://lands.let.kun.nl/cgn/ data collected and analyzed in collaboration with Patrick Juola

References

Juola, P. (2003) The time course of language change, Computers and the Humanities, 37, 77-96.

Juola, P. and Baayen, R. H. (2005) A Controlled-corpus Experiment in Authorship Identification by Cross-entropy, Literary and Linguistic Computing, 20, 59-67.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
	## Not run: 
    data(dutchSpeakersDist)
    dutchSpeakersDist.d = as.dist(dutchSpeakersDist)
    dutchSpeakersDist.mds = cmdscale(dutchSpeakersDist.d, k = 3)

    data(dutchSpeakersDistMeta)
    dat = data.frame(dutchSpeakersDist.mds, 
       Sex = dutchSpeakersDistMeta$Sex, 
       Year = dutchSpeakersDistMeta$AgeYear, 
       EduLevel = dutchSpeakersDistMeta$EduLevel)
    dat = dat[!is.na(dat$Year),]

    par(mfrow=c(1,2))
    plot(dat$Year, dat$X1, xlab="year of birth", 
       ylab = "dimension 1", type = "p")
    lines(lowess(dat$Year, dat$X1))
    boxplot(dat$X3 ~ dat$Sex, ylab = "dimension 3")
    par(mfrow=c(1,1))

    cor.test(dat$X1, dat$Year, method="sp")
    t.test(dat$X3~dat$Sex)
	
## End(Not run)

languageR documentation built on May 30, 2017, 3:09 a.m.