Emails_train: Email Train
In mdsr-book/mdsr: Complement to 'Modern Data Science with R'

Description

Emails_train is a training dataset of email subject lines, used for classifying whether a message is spam (unsolicited commercial content) or not. Many subject lines include subject matter inappropriate for classroom use. Given the volume of subject lines containing such language (especially among messages labeled as spam), user discretion is advised. This dataset is a random sample of 80% of the emails data.

The testing dataset is a random sample of 20% of the emails data.
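As a rough illustration of how an 80%/20% random split like the one described above can be produced, here is a minimal base R sketch. The toy data frame `emails`, its `type` values ("spam", "ham"), and the seed are assumptions made for this example only; the procedure the package authors actually used is not documented here.

```r
# Hypothetical stand-in for the full emails data; the column names mirror
# the documented format, but the values are invented for this sketch.
emails <- data.frame(
  ids = 1:10,
  subjectline = paste("subject", 1:10),
  type = rep(c("spam", "ham"), times = 5),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Draw 80% of the row indices at random; the rest form the test set.
n <- nrow(emails)
train_idx <- sample(n, size = round(0.8 * n))
toy_train <- emails[train_idx, ]
toy_test <- emails[-train_idx, ]

# The documented row counts are consistent with an 80/20 split.
train_share <- round(5526 / (5526 + 1382), 2)
```

Here `toy_train` and `toy_test` are disjoint by construction, and `train_share` recomputes the training proportion from the row counts given in the Format section.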

Usage

Emails_train

Emails_test

Format

Emails_train: a data frame with 5,526 rows and 3 variables:

ids: an integer vector
subjectline: a character vector
type: a character vector

Emails_test: a data frame with 1,382 rows and 3 variables.
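To check that a data frame matches the documented column types, something like the following sketch may help. It uses a small hypothetical data frame, `toy_emails`, with invented values; with the package loaded, the same calls could be applied to Emails_train or Emails_test instead.

```r
# Hypothetical three-row example with the documented column names and types.
toy_emails <- data.frame(
  ids = c(1L, 2L, 3L),
  subjectline = c("Win a prize", "Project update", "Cheap loans"),
  type = c("spam", "ham", "spam"),  # label values are assumed for illustration
  stringsAsFactors = FALSE
)

# Class of each column, returned as a named character vector.
col_classes <- vapply(toy_emails, class, character(1))

# Number of messages per label.
type_counts <- table(toy_emails$type)
```

`col_classes` can be compared against the Format section above, and `type_counts` gives a quick view of the class balance.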

Source

Originally retrieved from https://www.stat.berkeley.edu/~nolan/data/spam/SpamAssassinMessages.zip

See Also

Data Science in R, Nolan and Temple Lang (ISBN 978-1482234817), Ch. 3

Examples

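library(mdsr)       # the datasets are provided by the mdsr package
nrow(Emails_train)  # 5,526 rows, per the Format section
nrow(Emails_test)   # 1,382 rows, per the Format section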
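Since the data are intended for spam classification from subject lines, a very simple baseline might flag a subject line by keyword and compare the flag with the label. The sketch below is only an illustration on a hypothetical data frame, `toy_train`; the keyword "free" and the label values "spam" and "ham" are assumptions, not properties of the actual data.

```r
# Hypothetical training rows; subject lines and labels are invented.
toy_train <- data.frame(
  ids = 1:4,
  subjectline = c("FREE prize inside", "Meeting agenda",
                  "Get free money now", "Lunch on Friday?"),
  type = c("spam", "ham", "spam", "ham"),
  stringsAsFactors = FALSE
)

# Keyword rule: predict spam when the subject line contains "free"
# (case-insensitive).
has_free <- grepl("free", toy_train$subjectline, ignore.case = TRUE)

# Cross-tabulate predictions against the labels and compute accuracy.
confusion <- table(predicted_spam = has_free, actual = toy_train$type)
accuracy <- mean(has_free == (toy_train$type == "spam"))
```

On real data, a rule like this would be fit on Emails_train and evaluated on Emails_test so that the accuracy estimate is not biased by the rows used to choose the keyword.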

mdsr-book/mdsr documentation built on Aug. 23, 2024, 3:52 a.m.