The authorship data set is a data.frame with 69 numeric columns and 2 factor columns. Each row gives the number of times each of 69 words appears in a piece of text. The last two columns give the ID of the book the text comes from and the book's author.
A data frame with 840 observations on the following 71 variables.
str.a
a numeric vector giving the number of times the string 'a' appears
str.all
a numeric vector giving the number of times the string 'all' appears
str.also
a numeric vector giving the number of times the string 'also' appears
str.an
a numeric vector giving the number of times the string 'an' appears
str.and
...
str.any
...
str.are
...
str.as
...
str.at
...
str.be
...
str.been
...
str.but
...
str.by
...
str.can
...
str.do
...
str.down
...
str.even
...
str.every
...
str.for
...
str.from
...
str.had
...
str.has
...
str.have
...
str.her
...
str.his
...
str.if
...
str.in
...
str.into
...
str.is
...
str.it
...
str.its
...
str.may
...
str.more
...
str.must
...
str.my
...
str.no
...
str.not
...
str.now
...
str.of
...
str.on
...
str.one
...
str.only
...
str.or
...
str.our
...
str.should
...
str.so
...
str.some
...
str.such
...
str.than
...
str.that
...
str.the
...
str.their
...
str.then
...
str.there
...
str.things
...
str.this
...
str.to
...
str.up
...
str.upon
...
str.was
...
str.were
...
str.what
...
str.when
...
str.which
...
str.who
...
str.will
...
str.with
...
str.would
...
str.your
...
BookID
a factor with levels b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, identifying the book the row comes from
Author
a factor with levels Austen, London, Milton, Shakespeare, giving the author of the book
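To make the layout of the str.* columns concrete, here is a minimal sketch of how one row of such counts could be produced from raw text. The helper count_function_words is hypothetical (it is not part of the package), and the sketch assumes each count is a whole-word match after lower-casing; the documentation only says "the string", so it does not confirm how the original counts were tokenized.

```r
# Hypothetical helper (NOT part of the package): count how often each
# word in `words` occurs as a whole word in `text`.
# Assumption: counts are whole-word, case-insensitive matches.
count_function_words <- function(text, words) {
  # Lower-case the text and split on anything that is not a letter or an
  # apostrophe, giving a vector of word tokens.
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  # Drop empty tokens (e.g. produced by leading punctuation).
  tokens <- tokens[nzchar(tokens)]
  # Count exact token matches, so "an" does not match inside "and".
  counts <- vapply(words, function(w) sum(tokens == w), integer(1))
  # Name the counts like the columns of the data set.
  names(counts) <- paste0("str.", words)
  counts
}

words <- c("a", "all", "also", "an", "and", "the", "upon")
sample_text <- "Upon the hill there stood a house, and all the house was dark."
res <- count_function_words(sample_text, words)
res
```

Under this whole-word assumption, "an" is not counted inside "and"; a plain substring count would give different numbers for short words such as 'a' and 'an'.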
This dataset is used to illustrate text classification.
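As an illustration of what text classification on word-frequency profiles can look like, the sketch below uses a simple nearest-centroid rule in base R: average the profiles of each author, then assign a new profile to the author with the closest average. This is one possible approach chosen for illustration, not the method used by the package; the toy numbers are made up and are not taken from authorship, and the function name classify is hypothetical.

```r
# Toy word-proportion profiles (made-up numbers, NOT from authorship).
train <- data.frame(
  str.the  = c(0.60, 0.55, 0.30, 0.35),
  str.and  = c(0.30, 0.35, 0.50, 0.45),
  str.upon = c(0.10, 0.10, 0.20, 0.20),
  Author   = factor(c("A", "A", "B", "B"))
)

# Mean profile (centroid) of each author over the numeric columns.
features <- setdiff(names(train), "Author")
centroids <- aggregate(train[features], by = list(Author = train$Author),
                       FUN = mean)

# Hypothetical classifier: pick the author whose centroid is closest in
# Euclidean distance. `x` must list the features in the same order.
classify <- function(x, centroids, features) {
  d <- apply(centroids[features], 1, function(m) sqrt(sum((x - m)^2)))
  as.character(centroids$Author[which.min(d)])
}

new_profile <- c(str.the = 0.58, str.and = 0.32, str.upon = 0.10)
predicted <- classify(new_profile, centroids, features)
predicted
```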
Jeffrey S. Simonoff, Analyzing Categorical Data.
authorship2 <- authorship
# turn the word counts (columns 1 to 69) into proportions of each row's total
authorship2[, 1:69] <- authorship2[, 1:69] / rowSums(authorship2[, 1:69])
# initialise the self-organizing map on a 20 x 20 hexagonal grid
authorship.som.init <- som(formula = ~ .,
                           data = authorship2,
                           neighborhood = "gaussian",
                           grid = grid(xdim = 20, ydim = 20, type = "hexagonal"))
# train the network
authorship.som <- learn(authorship.som.init, number.iter = 1000,
                        max.alpha = 0.5, min.alpha = .001,
                        max.rayon = 5, step.eval.si = 100)
summary(authorship.som)
plot(authorship.som, "energy")
#--- see how the observations are distributed over the map
plot(authorship.som, "effectif", cex.label = 0)
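The normalization line in the example divides each row of counts by its total, so each row becomes a profile of proportions summing to 1. The sketch below reproduces that step on a small made-up data frame with the same layout (count columns followed by BookID and Author). As a design choice it assumes the count columns can be selected by their "str." prefix instead of by position, which keeps the factor columns out of the division even if the column order changes.

```r
# Toy data frame mimicking the layout of authorship (values are made up):
# word-count columns followed by two factor columns.
toy <- data.frame(
  str.a   = c(2, 0, 5),
  str.the = c(6, 3, 5),
  str.and = c(2, 1, 0),
  BookID  = factor(c("b1", "b1", "b2")),
  Author  = factor(c("Austen", "Austen", "London"))
)

# Select the count columns by name rather than by position.
count_cols <- startsWith(names(toy), "str.")

# Divide every count by its row total; dividing a data frame by a vector
# of length nrow() divides each row by the matching element.
toy_prop <- toy
toy_prop[count_cols] <- toy[count_cols] / rowSums(toy[count_cols])

toy_prop
rowSums(toy_prop[count_cols])
```

Note that a row whose counts are all zero would give 0/0 = NaN after this division, so such rows need to be dropped or handled separately before training.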