Emails_train | R Documentation
The training dataset contains email subject lines used to classify
whether a message is spam (unsolicited commercial content) or not.
Many subject lines include subject matter inappropriate for classroom use.
Given the volume of subject lines containing such language
(especially for spam == TRUE), user discretion is advised.
The training dataset (Emails_train) is a random sample of 80% of the emails data.
The testing dataset (Emails_test) is a random sample of 20% of the emails data.
Emails_train
Emails_test
Emails_train: a data frame with 5,526 rows and 3 variables:
  an integer vector
  a character vector
  a character vector
Emails_test: a data frame with 1,382 rows and 3 variables:
Originally retrieved from https://www.stat.berkeley.edu/~nolan/data/spam/SpamAssassinMessages.zip
Data Science in R, Nolan and Temple Lang (ISBN 978-1482234817), Ch. 3
nrow(Emails_train)
nrow(Emails_test)
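
To illustrate what an 80%/20% random split looks like, the sketch below splits a small made-up data frame in base R. The data frame toy_emails, its columns id and subject, and the seed are hypothetical stand-ins chosen for the example. This is not necessarily the procedure used to produce Emails_train and Emails_test.

```r
# Hypothetical stand-in for the full emails data; the real column
# names are not shown on this page, so these are placeholders.
set.seed(1)
toy_emails <- data.frame(
  id = 1:10,
  subject = paste("subject", 1:10),
  stringsAsFactors = FALSE
)

# Draw 80% of the row indices at random for the training set.
n_train <- floor(0.8 * nrow(toy_emails))
train_idx <- sample(nrow(toy_emails), size = n_train)

# Rows that were not drawn form the testing set.
toy_train <- toy_emails[train_idx, ]
toy_test <- toy_emails[-train_idx, ]

nrow(toy_train)
nrow(toy_test)

# Share of rows in the training set, using the row counts
# documented above (5,526 and 1,382).
5526 / (5526 + 1382)
```

In this sketch the two pieces do not overlap, because the test rows are selected by negating the training indices.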