vcr.rpart.train: Prepare for visualization of an rpart classification on...

vcr.rpart.train    R Documentation

Prepare for visualization of an rpart classification on training data.

Description

Produces output for the purpose of constructing graphical displays such as the classmap. The user first needs to train a classification tree on the data using rpart::rpart. The fitted tree is then passed to vcr.rpart.train as the trainfit argument.

Usage

vcr.rpart.train(X, y, trainfit, type = list(),
                k = 5, stand = TRUE)

Arguments

X

A rectangular matrix or data frame, where the columns (variables) may be of mixed type and may contain NA's.

y

factor with the given class labels. It is crucial that X and y are exactly the same as in the call to rpart::rpart. y is allowed to contain NA's.

k

the number of nearest neighbors used in the farness computation.

trainfit

the fitted tree returned by rpart::rpart on the training data.

type

list for specifying some (or all) of the types of the variables (columns) in X, used for computing the dissimilarity matrix, as in cluster::daisy. The list may contain the following components: "ordratio" (ratio scaled variables to be treated as ordinal variables), "logratio" (ratio scaled variables that must be logarithmically transformed), "asymm" (asymmetric binary) and "symm" (symmetric binary variables). Each component's value is a vector, containing the names or the numbers of the corresponding columns of X. Variables not mentioned in the type list are interpreted as usual (see argument X).

stand

whether or not to standardize numerical (interval scaled) variables by their range as in the original cluster::daisy code for the farness computation. Defaults to TRUE.
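As an illustration of the type argument, the sketch below builds a type list for a small made-up data frame. The data and column names are hypothetical and serve only to show the shape of the list: each component is a vector of column names or column numbers of X, and columns that are not mentioned are interpreted as usual.

```r
# Hypothetical data frame; the column names are illustrative only.
X <- data.frame(
  income = c(1200, 5400, 800, 3100),  # ratio scaled, to be log-transformed
  rating = c(1, 3, 2, 3),             # ratio scaled, to be treated as ordinal
  smoker = c(0, 1, 0, 0),             # asymmetric binary
  member = c(1, 0, 1, 1),             # symmetric binary
  age    = c(34, 51, 27, 45)          # not listed: interpreted as usual
)

# Each component is a vector of column names (or column numbers) of X.
mytype <- list(
  logratio = "income",
  ordratio = "rating",
  asymm    = "smoker",
  symm     = 4                        # column number of 'member'
)

# Sanity check: every referenced column exists in X.
cols <- unlist(lapply(mytype, function(v) if (is.numeric(v)) names(X)[v] else v))
stopifnot(all(cols %in% names(X)))
```

A list built this way would be passed as the type argument, together with the same X and y that were used to fit the tree.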

Value

A list with components:

X

The input data X.

yint

number of the given class of each case. Can contain NA's.

y

given class label of each case. Can contain NA's.

levels

levels of y

predint

predicted class number of each case. For each case this is the class with the highest posterior probability. Always exists.

pred

predicted label of each case.

altint

number of the alternative class. Among the classes different from the given class, it is the one with the highest posterior probability. Is NA for cases whose y is missing.

altlab

label of the alternative class. Is NA for cases whose y is missing.

PAC

probability of the alternative class. Is NA for cases whose y is missing.

figparams

parameters for computing fig, can be used for new data.

fig

distance of each case i from each class g. Always exists.

farness

farness of each case from its given class. Is NA for cases whose y is missing.

ofarness

for each case i, its lowest fig[i,g] to any class g. Always exists.

trainfit

the trainfit used to build the VCR object.
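To make the relation between some of these components concrete, the following sketch recomputes predint, altint, farness and ofarness from toy inputs, following only the descriptions above. The posterior matrix post and the matrix fig are made-up values, and whether the package computes these quantities in exactly this way is an assumption.

```r
# Toy inputs (hypothetical values, not produced by vcr.rpart.train).
lev  <- c("casualty", "survived")
yint <- c(1, 2, 2, NA)                  # given class numbers; the last is missing
post <- matrix(c(0.8, 0.2,
                 0.6, 0.4,
                 0.1, 0.9,
                 0.3, 0.7), ncol = 2, byrow = TRUE)   # posterior probabilities
fig  <- matrix(c(0.10, 0.90,
                 0.30, 0.95,
                 0.99, 0.20,
                 0.70, 0.40), ncol = 2, byrow = TRUE) # distance to each class

n  <- nrow(post)
ok <- !is.na(yint)                      # cases with a given label

# predint: class with the highest posterior probability (always exists).
predint <- max.col(post, ties.method = "first")

# altint: best class among those different from the given class; NA if y is missing.
altint <- rep(NA_integer_, n)
for (i in which(ok)) {
  p <- post[i, ]
  p[yint[i]] <- -Inf                    # exclude the given class
  altint[i] <- which.max(p)
}

# farness: fig of each case to its given class; NA if y is missing.
farness <- rep(NA_real_, n)
farness[ok] <- fig[cbind(which(ok), yint[ok])]

# ofarness: lowest fig[i, g] over all classes g (always exists).
ofarness <- apply(fig, 1, min)

# Labels follow from the class numbers.
pred   <- lev[predint]
altlab <- lev[altint]
```

In this toy example the fourth case has no given label, so its altint, altlab and farness are NA, while predint, pred and ofarness are still defined.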

Author(s)

Raymaekers J., Rousseeuw P.J.

References

Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers.

See Also

vcr.rpart.newdata, classmap, silplot, stackedplot

Examples

library(rpart)
data("data_titanic")
traindata <- data_titanic[which(data_titanic$dataType == "train"), -13]
str(traindata); table(traindata$y)
set.seed(123) # rpart is not deterministic
rpart.out <- rpart(y ~ Pclass + Sex + SibSp +
                    Parch + Fare + Embarked,
                  data = traindata, method = 'class', model = TRUE)
y_train <- traindata[, 12]
x_train <- traindata[, -12]
mytype <- list(nominal = c("Name", "Sex", "Ticket", "Cabin", "Embarked"),
               ordratio = c("Pclass"))
# These are 5 nominal columns, and one ordinal.
# The variables not listed are by default interval-scaled.
vcrtrain <- vcr.rpart.train(x_train, y_train, rpart.out, mytype)
confmat.vcr(vcrtrain)
silplot(vcrtrain, classCols = c(2, 4))
classmap(vcrtrain, "casualty", classCols = c(2, 4))
classmap(vcrtrain, "survived", classCols = c(2, 4))

# For more examples, we refer to the vignette:
## Not run: 
vignette("Rpart_examples")

## End(Not run)

classmap documentation built on April 23, 2023, 5:09 p.m.