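```r
# Path to the data file (relative to the working directory)
file <- "data/titanic.rds"
stopifnot(file.exists(file))  # stop early if the file is missing
data <- readRDS(file)
head(data)                    # first six rows
```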
In this exercise we will use the 'titanic' data set. Like the data set presented on the slides, this version describes the survival status of individual passengers on the Titanic. However, it provides additional information on the passengers, given by 10 covariates, and it does not include information on the crew.
Simply download the file `r xfun::embed_file(file, text = "titanic.rds")` by clicking on it, or download it from OpenOLAT or from the corresponding homepage
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html
which also provides more detailed information on the data set.
We can import (read) this file using `data <- readRDS(...)`. The file contains the following information:
- `name`: passenger's name (character).
- `gender`: 'male' or 'female' (factor).
- `age`: age in years (numeric).
- `class`: passenger class or the type of service aboard for crew members (factor).
- `embarked`: place of embarkation (factor; C = Cherbourg, Q = Queenstown, S = Southampton).
- `country`: home country (factor).
- `ticketno`: ticket number (integer; NA for crew members).
- `fare`: ticket price (numeric; NA for crew members, musicians and employees of the shipyard company).
- `sibsp`: number of siblings/spouses aboard (ordered factor).
- `parch`: number of parents/children aboard (ordered factor).
- `survived`: Did the passenger survive? (factor).

We would like to find out how the survival status of a passenger on the Titanic depends on the additional attributes. Using a tree model, we look for a partition of the passengers into homogeneous subgroups based on this additional information.
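To make "homogeneous subgroups" concrete, the following sketch uses R's built-in `Titanic` table (package datasets). This is an assumption for illustration only: it is not the exercise data, it aggregates passengers and crew, and it records only class, sex, age group and survival. It shows how splitting on a single attribute can yield subgroups whose survival proportions differ strongly from the overall proportion.

```r
# Built-in 4-way contingency table 'Titanic': Class x Sex x Age x Survived
# Collapse over Class and Age to get a Sex x Survived table of counts
tab <- apply(Titanic, c("Sex", "Survived"), sum)
print(tab)

# Proportion of survivors overall ...
overall <- sum(tab[, "Yes"]) / sum(tab)
# ... and within each gender subgroup
by_sex <- tab[, "Yes"] / rowSums(tab)

print(round(overall, 3))
print(round(by_sex, 3))
```

Both gender subgroups are more homogeneous with respect to survival than the full population, which is exactly what a first split of a tree tries to achieve.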
Our response in this case is the binary variable `survived`; as covariates we use the additional variables `gender`, `age`, `class`, `embarked`, `fare`, `sibsp` and `parch`.
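In R, this choice is encoded in a model formula: the response stands to the left of `~`, the covariates to the right. The base-R sketch below extracts both parts from the formula used later in the exercise; `name`, `country` and `ticketno` are not among the covariates.

```r
# Response on the left of '~', covariates on the right
formula <- survived ~ gender + age + class + embarked + fare + sibsp + parch

response   <- all.vars(formula)[1]                   # left-hand side variable
covariates <- attr(terms(formula), "term.labels")    # right-hand side terms

print(response)
print(covariates)
```

Once `data` is loaded, `stopifnot(all(all.vars(formula) %in% names(data)))` checks that every variable in the formula is present in the data.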
Apply the CTree algorithm to build the tree models described in the following steps:
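```r
# Model formula: survival status explained by seven covariates
formula <- survived ~ gender + age + class + embarked + fare + sibsp + parch

library("partykit")
# Tree with the default settings of ctree_control()
ct <- ctree(formula, data = data)
# Tree with a stricter significance level, at least 20 observations
# per terminal node and a maximum tree depth of 5
ct <- ctree(formula, data = data,
            control = ctree_control(alpha = 0.01, minbucket = 20, maxdepth = 5))

library("caret")
# Confusion matrix on the training data:
# predicted classes first (data), observed classes second (reference)
caret::confusionMatrix(predict(ct, newdata = data), data$survived)

# Predicted survival status of a new passenger
newpassenger <- data.frame(gender = "female", age = 30, class = "2nd",
                           embarked = "S", fare = 25, sibsp = "1", parch = "2")
predict(ct, newdata = newpassenger)

# The same passenger travelling in 3rd class
newpassenger2 <- newpassenger
newpassenger2$class <- "3rd"
predict(ct, newdata = newpassenger2)

plot(ct)
```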
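At every node, CTree tests the null hypothesis of independence between the response and each covariate, selects the covariate with the smallest multiplicity-adjusted p-value (Bonferroni adjustment by default) and stops splitting if this p-value is not below `alpha`. The p-values of the selected variables appear in the inner nodes of `plot(ct)`, and `alpha = 0.01` makes splitting harder than the default of 0.05. The sketch below mimics only the variable-selection step at the root node, under two simplifying assumptions: it replaces the permutation tests used by partykit with Pearson's chi-squared test from `chisq.test()`, and it uses the built-in `Titanic` table instead of `titanic.rds`. It does not search for a split point.

```r
# Expand the aggregated 'Titanic' table into one row per person
df   <- as.data.frame(Titanic)   # columns Class, Sex, Age, Survived, Freq
rows <- df[rep(seq_len(nrow(df)), df$Freq), c("Class", "Sex", "Age", "Survived")]

# Simplified variable selection at the root node:
# test independence of each candidate covariate and the response ...
candidates <- c("Class", "Sex", "Age")
pvals <- sapply(candidates, function(v)
  chisq.test(table(rows[[v]], rows$Survived))$p.value)
# ... and adjust the p-values for testing several covariates
padj <- p.adjust(pvals, method = "bonferroni")

alpha <- 0.05   # plays the role of ctree_control(alpha = ...)
if (min(padj) < alpha) {
  cat("split on", names(which.min(padj)), "\n")
} else {
  cat("no significant association: stop splitting\n")
}
```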
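```r
# Random split: 1471 observations for training, the rest for testing
set.seed(4)
trainid <- sample(1:NROW(data), size = 1471, replace = FALSE)
train <- data[trainid, ]
test  <- data[-trainid, ]
test  <- na.omit(test)   # drop test observations with a missing value in any column

# Tree with default settings, evaluated on the test data
ctrain   <- ctree(formula, data = train)
predtest <- predict(ctrain, newdata = test)
library("caret")
caret::confusionMatrix(predtest, test$survived)   # predictions first, then observed

# Tree with a stricter significance level
ctrain <- ctree(formula, data = train, control = ctree_control(alpha = 0.01))
plot(ctrain)
predtest <- predict(ctrain, newdata = test)
caret::confusionMatrix(predtest, test$survived)
```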
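The accuracy reported by `caret::confusionMatrix()` is the share of test observations on the diagonal of the confusion matrix. The base-R sketch below computes it by hand for hypothetical predicted and observed labels (the eight values are invented for illustration), using the same layout as caret: predictions in the rows, observed reference classes in the columns. Here "yes" is treated as the positive class; caret uses the first factor level unless its `positive` argument is set.

```r
# Hypothetical observed and predicted survival status of 8 test passengers
observed  <- factor(c("yes", "no", "no", "yes", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "no", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))

# Rows = prediction, columns = reference (same layout as caret)
cm <- table(Prediction = predicted, Reference = observed)
print(cm)

accuracy    <- sum(diag(cm)) / sum(cm)               # correctly classified share
sensitivity <- cm["yes", "yes"] / sum(cm[, "yes"])   # survivors predicted as survivors
specificity <- cm["no", "no"] / sum(cm[, "no"])      # non-survivors predicted as such

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

Swapping the two arguments of `confusionMatrix()` transposes the table, which exchanges the roles of predicted and observed classes in measures such as sensitivity.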