spam7 | R Documentation |
The data consist of 4601 email items, of which 1813 were identified as spam. This is a subset of the full dataset, with only six of the 57 explanatory variables in the complete dataset.
spam7
Columns included are:
crl.tot: total length of uninterrupted sequences of capitals
dollar: occurrences of ‘$’, as percent of total number of characters
bang: occurrences of ‘!’, as percent of total number of characters
money: occurrences of ‘money’, as percent of total number of words
n000: occurrences of the string ‘000’, as percent of total number of words
make: occurrences of ‘make’, as percent of total number of words
yesno: outcome variable, a factor with levels n (not spam) and y (spam)
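The description and column list can be checked against the data itself. The lines below are a minimal sketch, assuming the DAAG package (which supplies spam7) is installed; if the data match the description, the row count should be 4601 and the count for level y should be 1813.

```r
# Assumption: spam7 is loaded from an installed copy of the DAAG package.
data(spam7, package = "DAAG")

dim(spam7)            # number of rows (emails) and columns
names(spam7)          # column names, in the order used by the example formula
table(spam7$yesno)    # counts of non-spam (n) and spam (y)
sapply(spam7, class)  # class of each column; yesno should be a factor
```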
George Forman, Hewlett-Packard Laboratories
The complete dataset and its documentation are available from the Spam database.
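library(rpart)
# spam7 is supplied by the DAAG package
data(spam7, package = "DAAG")
spam.rpart <- rpart(formula = yesno ~ crl.tot + dollar + bang +
                    money + n000 + make, data = spam7)
plot(spam.rpart)   # draw the tree skeleton
text(spam.rpart)   # add split and leaf labels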
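Beyond plotting, the fitted tree can be inspected numerically. The following is a sketch rather than part of the original example: it refits the same model (again assuming DAAG is installed), prints the complexity-parameter table, and cross-tabulates predicted against observed classes. The names pred, conf.mat and acc are introduced here for illustration only.

```r
library(rpart)
data(spam7, package = "DAAG")  # assumption: DAAG is installed

# Same formula as the example; method = "class" requests a classification tree
spam.rpart <- rpart(yesno ~ crl.tot + dollar + bang + money + n000 + make,
                    data = spam7, method = "class")

printcp(spam.rpart)  # complexity-parameter table with cross-validated error

# Predicted class for each email in the training data
pred <- predict(spam.rpart, type = "class")

# Confusion matrix: rows are observed outcomes, columns are predictions
conf.mat <- table(observed = spam7$yesno, predicted = pred)
conf.mat

# Training-set accuracy; likely optimistic relative to performance on new data
acc <- sum(diag(conf.mat)) / sum(conf.mat)
acc
```

Because the accuracy is computed on the same data used to fit the tree, the cross-validated error reported by printcp is usually the more realistic guide when choosing how far to prune.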