Exercise: Titanic

file <- "data/titanic.rds"
stopifnot(file.exists(file))
data <- readRDS(file)
head(data)

In this exercise we will use the 'titanic' data set. Like the data set presented on the slides, this version describes the survival status of individual passengers on the Titanic. However, it provides additional information on the passengers, given by 10 covariates, but does not include information on the crew.

Simply download the file titanic.rds by clicking on it, or download it from OpenOLAT or from the corresponding homepage http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html, which also provides more detailed information on the data set.

We can import/read this file using data <- readRDS(...). The file contains the following information:

The Tasks

We would like to find out how the survival status of a passenger on the Titanic depends on the provided additional attributes. By employing a tree model we are looking for a separation into homogeneous subgroups based on the additional information.
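The idea of "homogeneous subgroups" can be illustrated with a minimal base R sketch. The data frame `toy` below is a hypothetical toy example (it is not taken from the titanic data) and only shows how a single split can separate observations into groups with clearly different survival proportions:

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  survived = factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes"),
                    levels = c("no", "yes")),
  gender   = c("female", "female", "female", "female",
               "male", "male", "male", "male")
)

# Share of survivors within each subgroup defined by the candidate split
rates <- tapply(toy$survived == "yes", toy$gender, mean)
print(rates)
```

The further the subgroup proportions are from each other, the more homogeneous each subgroup is with respect to the response. A tree algorithm such as CTree searches for such splits using statistical tests rather than by inspecting raw proportions.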

Our response in this case is the binary variable survived; as covariates we use the additional variables gender, age, class, embarked, fare, sibsp and parch.

Apply the CTree algorithm to build the tree models described in the following steps:

formula <- survived ~  gender + age + class + embarked + fare + sibsp + parch

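library("partykit")
# Tree with the default control settings
ct <- ctree(formula, data = data)
# Tree with a stricter significance level, larger terminal nodes and limited depth
ct <- ctree(formula, data = data, control = ctree_control(alpha = 0.01, minbucket = 20, maxdepth = 5))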

library("caret")
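# confusionMatrix() expects the predictions first and the observed classes second
caret::confusionMatrix(data = predict(ct, newdata = data), reference = data$survived)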
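
To make clear what a confusion matrix summarizes, here is a minimal base R sketch that does not rely on caret. The vectors `observed` and `predicted` are hypothetical toy labels, not results from the titanic data:

```r
# Hypothetical toy labels, invented for illustration only
observed  <- factor(c("yes", "no", "no", "yes", "no", "yes"), levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "yes", "no", "no"), levels = c("no", "yes"))

# Cross-tabulate: rows are predicted classes, columns are observed classes
cm <- table(Prediction = predicted, Reference = observed)
print(cm)

# Accuracy: share of observations on the main diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The off-diagonal cells count the two kinds of misclassification separately, which is why the order of predictions and observed classes matters when reading the table.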

newpassenger <- data.frame(gender = "female",
                           age = 30,
                           class = "2nd",
                           embarked = "S",
                           fare = 25,
                           sibsp = "1",
                           parch = "2")
predict(ct, newdata = newpassenger)

newpassenger2 <- newpassenger
newpassenger2$class <- "3rd"
predict(ct, newdata = newpassenger2)
plot(ct)

set.seed(4)
trainid <- sample(1:NROW(data), size = 1471, replace = FALSE)
train <- data[trainid,]
test <- data[-trainid,]
test <- na.omit(test)

ctrain <- ctree(formula, data = train)
predtest <- predict(ctrain, newdata = test)

library("caret")
# Predictions first, observed classes second
caret::confusionMatrix(data = predtest, reference = test$survived)


ctrain <- ctree(formula, data = train, control = ctree_control(alpha = 0.01))
plot(ctrain)
predtest <- predict(ctrain, newdata = test)
caret::confusionMatrix(data = predtest, reference = test$survived)



partykit documentation built on April 14, 2023, 5:09 p.m.