marketing: Market Basket Analysis


Description

The dataset is an extract from a marketing survey of shopping mall customers in the San Francisco Bay area (see Source). It consists of 14 demographic attributes. The dataset is a good mixture of categorical and continuous variables with a lot of missing data, which is characteristic of data mining applications.

Usage

data(marketing)

Format

A data frame with 8993 observations on the following 14 variables.

Income

ANNUAL INCOME OF HOUSEHOLD (PERSONAL INCOME IF SINGLE) 1. Less than $10,000 2. $10,000 to $14,999 3. $15,000 to $19,999 4. $20,000 to $24,999 5. $25,000 to $29,999 6. $30,000 to $39,999 7. $40,000 to $49,999 8. $50,000 to $74,999 9. $75,000 or more

Sex

1. Male 2. Female

Marital

1. Married 2. Living together, not married 3. Divorced or separated 4. Widowed 5. Single, never married

Age

1. 14 thru 17 2. 18 thru 24 3. 25 thru 34 4. 35 thru 44 5. 45 thru 54 6. 55 thru 64 7. 65 and Over

Edu

1. Grade 8 or less 2. Grades 9 to 11 3. Graduated high school 4. 1 to 3 years of college 5. College graduate 6. Grad Study

Occupation

1. Professional/Managerial 2. Sales Worker 3. Factory Worker/Laborer/Driver 4. Clerical/Service Worker 5. Homemaker 6. Student, HS or College 7. Military 8. Retired 9. Unemployed

Lived

HOW LONG HAVE YOU LIVED IN THE SAN FRAN./OAKLAND/SAN JOSE AREA? 1. Less than one year 2. One to three years 3. Four to six years 4. Seven to ten years 5. More than ten years

Dual_Income

DUAL INCOMES (IF MARRIED) 1. Not Married 2. Yes 3. No

Household

PERSONS IN YOUR HOUSEHOLD 1. One 2. Two 3. Three 4. Four 5. Five 6. Six 7. Seven 8. Eight 9. Nine or more

Householdu18

PERSONS IN HOUSEHOLD UNDER 18 0. None 1. One 2. Two 3. Three 4. Four 5. Five 6. Six 7. Seven 8. Eight 9. Nine or more

Status

HOUSEHOLDER STATUS 1. Own 2. Rent 3. Live with Parents/Family

Home_Type

1. House 2. Condominium 3. Apartment 4. Mobile Home 5. Other

Ethnic

1. American Indian 2. Asian 3. Black 4. East Indian 5. Hispanic 6. Pacific Islander 7. White 8. Other

Language

WHAT LANGUAGE IS SPOKEN MOST OFTEN IN YOUR HOME? 1. English 2. Spanish 3. Other
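
The variables are stored as integer codes, so summaries such as means are not directly meaningful for the nominal ones. The following is a minimal sketch of attaching the labels listed above with base R's factor(). The data frame `toy` and the `*_labels` vectors are hypothetical names introduced for illustration; the same calls would presumably apply to the corresponding columns of `marketing`.

```r
# Hypothetical toy rows using the same integer coding as the Format section.
toy <- data.frame(
  Sex     = c(2L, 1L, 2L),
  Marital = c(1L, 5L, NA),
  Status  = c(1L, 3L, 2L)
)

# Label tables copied from the variable descriptions above.
sex_labels     <- c("Male", "Female")
marital_labels <- c("Married", "Living together, not married",
                    "Divorced or separated", "Widowed",
                    "Single, never married")
status_labels  <- c("Own", "Rent", "Live with Parents/Family")

# factor() maps code k to the k-th label; missing codes stay NA.
toy$Sex     <- factor(toy$Sex,     levels = 1:2, labels = sex_labels)
toy$Marital <- factor(toy$Marital, levels = 1:5, labels = marital_labels)
toy$Status  <- factor(toy$Status,  levels = 1:3, labels = status_labels)

str(toy)
```

Ordered variables such as Income, Age and Edu could be handled the same way with `ordered = TRUE`, if the ordering should be preserved.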

Details

The goal is to predict the Annual Income of Household from the other 13 demographic attributes.

Number of instances: 8993.

These were obtained from the original dataset of 9409 instances by removing the observations in which the response (Annual Income) is missing.
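
The filtering step described above removed 9409 - 8993 = 416 observations. A minimal sketch of that kind of filter in base R follows. The data frame `raw` is a hypothetical stand-in, not the original survey data, and this is not necessarily how the package authors prepared the dataset.

```r
# Hypothetical stand-in for the unfiltered survey extract; not the real data.
raw <- data.frame(
  Income = c(9L, NA, 1L, 6L, NA),
  Sex    = c(2L, 1L, 2L, 1L, 1L),
  Age    = c(5L, 3L, NA, 3L, 7L)
)

# Keep only rows where the response is observed; predictors may still be NA.
kept <- raw[!is.na(raw$Income), ]

nrow(raw) - nrow(kept)   # number of rows dropped
colSums(is.na(kept))     # remaining missing values per column
```

Note that only the response is required to be present; missing values in the predictors are retained, which is why the summary of the dataset still reports NA counts for several variables.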

Source

Source: Impact Resources, Inc., Columbus, OH (1987). A total of N=9409 questionnaires containing 502 questions were filled out by shopping mall customers in the San Francisco Bay area.

Examples

str(marketing)
summary(marketing)

Example output

'data.frame':	8993 obs. of  14 variables:
 $ Income      : int  9 9 9 1 1 8 1 6 2 4 ...
 $ Sex         : int  2 1 2 2 2 1 1 1 1 1 ...
 $ Marital     : int  1 1 1 5 5 1 5 3 1 1 ...
 $ Age         : int  5 5 3 1 1 6 2 3 6 7 ...
 $ Edu         : int  4 5 5 2 2 4 3 4 3 4 ...
 $ Occupation  : int  5 5 1 6 6 8 9 3 8 8 ...
 $ Lived       : int  5 5 5 5 3 5 4 5 5 4 ...
 $ Dual_Income : int  3 3 2 1 1 3 1 1 3 3 ...
 $ Household   : int  3 5 3 4 4 2 3 1 3 2 ...
 $ Householdu18: int  0 2 1 2 2 0 1 0 0 0 ...
 $ Status      : int  1 1 2 3 3 1 2 2 2 2 ...
 $ Home_Type   : int  1 1 3 1 1 1 3 3 3 3 ...
 $ Ethnic      : int  7 7 7 7 7 7 7 7 7 7 ...
 $ Language    : int  NA 1 1 1 1 1 1 1 1 1 ...
     Income           Sex           Marital           Age       
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:2.000  
 Median :5.000   Median :2.000   Median :3.000   Median :3.000  
 Mean   :4.895   Mean   :1.547   Mean   :3.031   Mean   :3.415  
 3rd Qu.:7.000   3rd Qu.:2.000   3rd Qu.:5.000   3rd Qu.:4.000  
 Max.   :9.000   Max.   :2.000   Max.   :5.000   Max.   :7.000  
                                 NA's   :160                    
      Edu          Occupation        Lived        Dual_Income   
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:1.000   1st Qu.:4.000   1st Qu.:1.000  
 Median :4.000   Median :4.000   Median :5.000   Median :1.000  
 Mean   :3.835   Mean   :3.788   Mean   :4.198   Mean   :1.545  
 3rd Qu.:5.000   3rd Qu.:6.000   3rd Qu.:5.000   3rd Qu.:2.000  
 Max.   :6.000   Max.   :9.000   Max.   :5.000   Max.   :3.000  
 NA's   :86      NA's   :136     NA's   :913                    
   Household      Householdu18        Status        Home_Type    
 Min.   :1.000   Min.   :0.0000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:1.000   1st Qu.:1.000  
 Median :3.000   Median :0.0000   Median :2.000   Median :1.000  
 Mean   :2.852   Mean   :0.6669   Mean   :1.837   Mean   :1.856  
 3rd Qu.:4.000   3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:3.000  
 Max.   :9.000   Max.   :9.0000   Max.   :3.000   Max.   :5.000  
 NA's   :375                      NA's   :240     NA's   :357    
     Ethnic         Language    
 Min.   :1.000   Min.   :1.000  
 1st Qu.:5.000   1st Qu.:1.000  
 Median :7.000   Median :1.000  
 Mean   :5.956   Mean   :1.127  
 3rd Qu.:7.000   3rd Qu.:1.000  
 Max.   :8.000   Max.   :3.000  
 NA's   :68      NA's   :359    

ElemStatLearn documentation built on May 30, 2017, 3:36 a.m.