spambase: Spambase Data Set

Description

The Spambase data set was created by Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt at Hewlett-Packard Labs. It includes 4601 observations corresponding to email messages, 1813 of which are spam. Each message is described by 58 attributes: 57 numeric features computed from the original email text, plus a spam indicator.
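As a quick arithmetic check on the counts quoted above, the class balance can be worked out directly from the two figures in the description. This is only a sketch; the variable names (n_total, n_spam, n_ham, spam_share) are illustrative and are not part of the package.

```r
# Counts taken from the description above
n_total <- 4601   # all email messages
n_spam  <- 1813   # messages labelled as spam

# Remaining messages are non-spam
n_ham <- n_total - n_spam

# Share of spam in the data set
spam_share <- n_spam / n_total

cat("non-spam:", n_ham, "\n")
cat("spam share:", round(spam_share, 3), "\n")
```

The non-spam count derived here (2788) agrees with the table(spambase$is_spam) output shown in the Examples, and roughly 39% of the messages are spam.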

Usage

data(spambase)

Format

A data frame with 4601 observations on the following 58 variables.

word_freq_make

a numeric vector

word_freq_address

a numeric vector

word_freq_all

a numeric vector

word_freq_3d

a numeric vector

word_freq_our

a numeric vector

word_freq_over

a numeric vector

word_freq_remove

a numeric vector

word_freq_internet

a numeric vector

word_freq_order

a numeric vector

word_freq_mail

a numeric vector

word_freq_receive

a numeric vector

word_freq_will

a numeric vector

word_freq_people

a numeric vector

word_freq_report

a numeric vector

word_freq_addresses

a numeric vector

word_freq_free

a numeric vector

word_freq_business

a numeric vector

word_freq_email

a numeric vector

word_freq_you

a numeric vector

word_freq_credit

a numeric vector

word_freq_your

a numeric vector

word_freq_font

a numeric vector

word_freq_000

a numeric vector

word_freq_money

a numeric vector

word_freq_hp

a numeric vector

word_freq_hpl

a numeric vector

word_freq_george

a numeric vector

word_freq_650

a numeric vector

word_freq_lab

a numeric vector

word_freq_labs

a numeric vector

word_freq_telnet

a numeric vector

word_freq_857

a numeric vector

word_freq_data

a numeric vector

word_freq_415

a numeric vector

word_freq_85

a numeric vector

word_freq_technology

a numeric vector

word_freq_1999

a numeric vector

word_freq_parts

a numeric vector

word_freq_pm

a numeric vector

word_freq_direct

a numeric vector

word_freq_cs

a numeric vector

word_freq_meeting

a numeric vector

word_freq_original

a numeric vector

word_freq_project

a numeric vector

word_freq_re

a numeric vector

word_freq_edu

a numeric vector

word_freq_table

a numeric vector

word_freq_conference

a numeric vector

char_freq_semicolon

a numeric vector

char_freq_left_paren

a numeric vector

char_freq_left_bracket

a numeric vector

char_freq_exclamation

a numeric vector

char_freq_dollar

a numeric vector

char_freq_pound

a numeric vector

capital_run_length_average

a numeric vector

capital_run_length_longest

a numeric vector

capital_run_length_total

a numeric vector

is_spam

a factor with levels 0 1

Details

This data is used as an example in the book "R in a Nutshell," from O'Reilly Media.

Source

This data set is from the UCI Machine Learning Repository. You can find more information about this data set, including the citation policy, at http://archive.ics.uci.edu/ml/datasets/Spambase

Examples

data(spambase)
table(spambase$is_spam)
# fit a linear discriminant analysis model to the data
library(MASS)
spam.lda <- lda(formula=is_spam~., data=spambase)

Example output

Loading required package: nutshell.bbdb
Loading required package: nutshell.audioscrobbler

   0    1 
2788 1813 
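The example above stops once a model has been fitted. As a hedged follow-up, the sketch below shows one way to inspect such a fit: it trains a linear discriminant model with MASS::lda(), predicts on the same rows, and cross-tabulates actual against predicted labels. The object names (spam.fit, spam.pred, confusion, accuracy) are illustrative assumptions, and the block only does its work if the nutshell package is installed.

```r
library(MASS)  # provides lda() and its predict() method

# Only run when the nutshell package (which ships spambase) is available
if (requireNamespace("nutshell", quietly = TRUE)) {
  data(spambase, package = "nutshell")

  # Fit linear discriminant analysis on all 57 predictors
  spam.fit <- lda(is_spam ~ ., data = spambase)

  # Predicted class labels for the rows the model was trained on
  spam.pred <- predict(spam.fit, newdata = spambase)$class

  # Confusion table: rows are true labels, columns are predictions
  confusion <- table(actual = spambase$is_spam, predicted = spam.pred)
  print(confusion)

  # Share of messages classified correctly (in-sample)
  accuracy <- sum(diag(confusion)) / sum(confusion)
  print(accuracy)
}
```

Because the predictions are made on the training rows, the accuracy printed here is optimistic; holding out a test subset would give a fairer estimate.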

nutshell documentation built on May 1, 2019, 10:08 p.m.
