data_titanic: Titanic data

Description

This dataset contains information on 1309 passengers of the RMS Titanic. The goal is to predict survival based on 11 characteristics such as the travel class, age and sex of the passengers.

The original data source is https://www.kaggle.com/c/titanic/data

The data are split into a training set of 891 observations and a test set of 418 observations. The response in the test set was obtained by combining information from other data files, and was verified by submitting it as a ‘prediction’ to Kaggle, where it received a perfect score.

Usage

data("data_titanic")

Format

A data frame with 1309 observations on the following variables.

PassengerId

a unique identifier for each passenger.

Pclass

travel class of the passenger.

Name

name of the passenger.

Sex

sex of the passenger.

Age

age of the passenger.

SibSp

number of siblings and spouses traveling with the passenger.

Parch

number of parents and children traveling with the passenger.

Ticket

ticket number of the passenger.

Fare

fare paid for the ticket.

Cabin

cabin number of the passenger.

Embarked

port of embarkation. Takes the values C (Cherbourg), Q (Queenstown) and S (Southampton).

y

factor indicating casualty or survivor.

dataType

vector taking the values “train” or “test” indicating whether the observation belongs to the training or the test data.
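
As a minimal sketch of how the dataType and Embarked variables might be used, the snippet below works on a small hypothetical data frame (`toy`, with made-up rows, not the real data) that mimics two columns of data_titanic. The name `Port` is an illustrative addition and is not a column of the dataset.

```r
# Hypothetical toy rows mimicking two columns of data_titanic;
# the values are made up for illustration only.
toy <- data.frame(
  Embarked = c("S", "C", "Q", "S"),
  dataType = c("train", "train", "test", "test"),
  stringsAsFactors = FALSE
)

# Split into a named list with one data frame per value of dataType.
parts <- split(toy, toy$dataType)
sapply(parts, nrow)

# Recode the port codes into readable labels (Port is an illustrative name).
toy$Port <- factor(toy$Embarked,
                   levels = c("C", "Q", "S"),
                   labels = c("Cherbourg", "Queenstown", "Southampton"))
table(toy$Port, toy$dataType)
```

Applied to the real data, the same `split()` call would be expected to give the 891 training and 418 test observations described above.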

Source

https://www.kaggle.com/c/titanic/data

Examples

data("data_titanic")
# Split by dataType and drop that column (column 13) from both sets.
traindata <- data_titanic[which(data_titanic$dataType == "train"), -13]
testdata <- data_titanic[which(data_titanic$dataType == "test"), -13]
str(traindata)
table(traindata$y)

# The data are used in:
## Not run: 
vignette("Rpart_examples")

## End(Not run)

classmap documentation built on Jan. 10, 2022, 1:06 a.m.