ptitanic: Titanic data with passenger names and other details removed.

ptitanic R Documentation

Description

Titanic data with passenger names and other details removed.

Format

A data frame with 1309 observations on 6 variables. Age is recorded for 1046 of the observations and is NA for the rest.

pclass      passenger class, unordered factor: 1st 2nd 3rd
survived    factor: died or survived
sex         unordered factor: male female
age         age in years, min 0.167 max 80.0, with some NAs
sibsp       number of siblings or spouses aboard, integer: 0...8
parch       number of parents or children aboard, integer: 0...6

Source

The dataset was compiled by Frank Harrell and Robert Dawson:
https://hbiostat.org/data/repo/titanic.html.

See also:
https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3info.txt.

For this version of the Titanic data, passenger details were deleted, survived was cast as a factor, and the name changed to ptitanic to minimize confusion with other versions.

In this data the crew are conspicuous by their absence.

Contents of ptitanic:

         pclass survived    sex    age sibsp parch
    1       1st survived female 29.000     0     0
    2       1st survived   male  0.917     1     2
    3       1st     died female  2.000     1     2
    4       1st     died   male 30.000     1     2
    5       1st     died female 25.000     1     2
    ...
    1309    3rd     died   male 29.000     0     0
    

How ptitanic was built:

    load("titanic3.sav") # from Dr. Harrell's web site
    # keep pclass, survived, sex, age, sibsp, parch
    # discard name, ticket, fare, cabin, embarked, boat, body, home.dest
    ptitanic <- titanic3[,c(1,2,4,5,6,7)]
    # change survived from integer to factor
    ptitanic$survived <- factor(ptitanic$survived, labels = c("died", "survived"))
    save(ptitanic, file = "ptitanic.rda")
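
The integer-to-factor step above depends on factor() taking the sorted distinct values as levels and applying the labels in that order, so 0 becomes "died" and 1 becomes "survived". A minimal sketch on a made-up vector (the names toy and toy.factor are hypothetical and not part of the data):

```r
# Hypothetical 0/1 vector standing in for titanic3$survived
toy <- c(1L, 1L, 0L, 0L, 1L)

# Levels are the sorted unique values (0, 1); labels are applied in that order
toy.factor <- factor(toy, labels = c("died", "survived"))

print(toy.factor)
print(table(toy.factor))
```

If the labels were given in the opposite order the meaning of the factor would silently flip, so it is worth checking the result with table() after a conversion like this.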

This version of the data differs from etitanic in the earth package in that here survived is a factor (not an integer) and age has some NAs.
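
Those two differences can be undone by dropping incomplete rows and converting the factor back to 0/1. The sketch below shows the idea on a small made-up data frame (toy is hypothetical). It is an illustration only and is not claimed to reproduce etitanic exactly.

```r
# Hypothetical data frame with the same kind of columns as ptitanic
toy <- data.frame(
    survived = factor(c("survived", "died", "died", "survived"),
                      levels = c("died", "survived")),
    age      = c(29, NA, 2, NA)
)

# Keep only the rows that have no missing values
toy.complete <- toy[complete.cases(toy), ]

# Factor codes are 1 (died) and 2 (survived); subtract 1 to get 0/1
toy.complete$survived <- as.integer(toy.complete$survived) - 1L

print(toy.complete)
```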

Examples

library(rpart.plot) # also attaches rpart
data(ptitanic)
summary(ptitanic)

# survival rate was greater for females
rpart.rules(rpart(survived ~ sex, data = ptitanic))

# survival rate was greater for higher classes
rpart.rules(rpart(survived ~ pclass, data = ptitanic))

# survival rate was greater for children
rpart.rules(rpart(survived ~ age, data = ptitanic))

# main indicator of missing data is 3rd class esp. with many children
obs.with.nas <- rowSums(is.na(ptitanic)) > 0
rpart.rules(rpart(obs.with.nas ~ ., data = ptitanic, method = "class"))
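
The obs.with.nas indicator in the last example flags every row that has at least one NA. A minimal sketch of how that expression behaves, using a made-up data frame (toy is hypothetical, and its pattern of NAs is not meant to match ptitanic):

```r
# Hypothetical data frame with NAs in two different columns
toy <- data.frame(
    pclass = factor(c("1st", "3rd", "3rd")),
    age    = c(29, NA, 4),
    parch  = c(0L, 2L, NA)
)

# is.na() gives a logical matrix; rowSums() counts the NAs in each row
obs.with.nas <- rowSums(is.na(toy)) > 0

print(obs.with.nas)
```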


rpart.plot documentation built on May 29, 2024, 12:07 p.m.