spam7: Spam E-mail Data


Description

The data consist of 4601 email items, of which 1813 were identified as spam. This is a subset of the full dataset, with only six of the 57 explanatory variables in the complete dataset.
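The counts quoted above can be checked directly from the data frame. A minimal sketch, assuming the DAAG package (which supplies spam7) is installed:

```r
library(DAAG)  # assumed installed; supplies the spam7 data frame

n.items <- nrow(spam7)              # total number of email items
n.spam  <- sum(spam7$yesno == "y")  # items labelled as spam
counts  <- table(spam7$yesno)       # counts for levels n and y
props   <- prop.table(counts)       # proportion of items in each class

print(counts)
print(round(props, 3))
```

Since 1813 of 4601 items are spam (about 39%), a rule that always predicts "not spam" is right about 61% of the time; that is a natural baseline against which to judge a fitted classifier.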

Usage

spam7

Format

Columns included are:

crl.tot

total length of words in capitals

dollar

number of occurrences of the $ symbol

bang

number of occurrences of the ! symbol

money

number of occurrences of the word ‘money’

n000

number of occurrences of the string ‘000’

make

number of occurrences of the word ‘make’

yesno

outcome variable, a factor with levels n (not spam) and y (spam)
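A quick way to see how the six explanatory columns differ between the two outcome classes is to compare their means within each level of yesno. A sketch, assuming the DAAG package is installed; the variable names explanatory and class.means are illustrative only:

```r
library(DAAG)  # assumed installed; supplies spam7

explanatory <- c("crl.tot", "dollar", "bang", "money", "n000", "make")

# Split the explanatory columns by outcome class, then take column means.
# Result: rows are variables; columns are n (not spam) and y (spam).
class.means <- sapply(split(spam7[, explanatory], spam7$yesno), colMeans)
print(round(class.means, 3))

# Confirm the outcome's factor levels
print(levels(spam7$yesno))
```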

Source

George Forman, Hewlett-Packard Laboratories

The complete dataset and its documentation are available from the Spam database.

Examples

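library(DAAG)   # provides the spam7 data frame
library(rpart)  # recursive partitioning (classification trees)
# Fit a classification tree for yesno using all six explanatory variables
spam.rpart <- rpart(formula = yesno ~ crl.tot + dollar + bang +
                      money + n000 + make, data = spam7)
plot(spam.rpart)  # draw the tree skeleton
text(spam.rpart)  # label the splits and the predicted class at each leaf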
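The tree above is grown with rpart's default complexity parameter (cp = 0.01). A common refinement is to grow a larger tree and prune it back using the cross-validated error that rpart computes internally. The sketch below picks the cp value with the smallest cross-validated error; the seed, the starting value cp = 0.001, and the object names are arbitrary choices for illustration, not part of the original example. The one-standard-error rule is a common, more conservative alternative.

```r
library(DAAG)   # assumed installed; supplies spam7
library(rpart)

set.seed(29)  # rpart's cross-validation folds are random; fix them for reproducibility

# Grow a deliberately large tree by lowering the complexity parameter
spam.big <- rpart(yesno ~ crl.tot + dollar + bang + money + n000 + make,
                  data = spam7, method = "class", cp = 0.001)
printcp(spam.big)  # cross-validated error (xerror) for each candidate cp

# Choose the cp with the smallest cross-validated error ...
cptab   <- spam.big$cptable
best.cp <- cptab[which.min(cptab[, "xerror"]), "CP"]

# ... and prune the large tree back to that size
spam.pruned <- prune(spam.big, cp = best.cp)
plot(spam.pruned)
text(spam.pruned)
```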

jhmaindonald/DAAG documentation built on May 3, 2019, 3:13 p.m.