qresplot: Draw a quasi residual plot of PAC versus a data feature


Draw a quasi residual plot to visualize classification results. The vertical axis of the quasi residual plot shows each case's probability of alternative class (PAC). The horizontal axis shows the feature given as the second argument in the function call.


qresplot(PAC, feat, xlab = NULL, xlim = NULL,
         main = NULL, identify = FALSE, gray = TRUE,
         opacity = 1, squareplot = FALSE, plotLoess = FALSE,
         plotErrorBars = FALSE, plotQuantiles = FALSE,
         grid = NULL, probs = c(0.5, 0.75),
         cols = NULL, fac = 1, cex = 1,
         cex.main = 1.2, cex.lab = 1,
         cex.axis = 1, pch = 19)



PAC: vector with the PAC values of a classification, typically the $PAC element returned by a call to one of the vcr.*.* functions.


feat: the PAC will be plotted versus this data feature. Note that feat does not have to be one of the explanatory variables of the model. It can be another variable, a combination of variables (like a sum or a principal component score), the row number of the cases if they were recorded successively, etc.


xlab: label for the horizontal axis, i.e. the name of the variable feat.


xlim: limits for the horizontal axis. If NULL, the range of feat is used.


main: title for the plot.


identify: if TRUE, left-click on a point to get its number, then press ESC to exit.


gray: logical. If TRUE (the default), the plot region where PAC < 0.5 gets a light gray background. Points in this region were classified into their given class, and the points above this region were misclassified.
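The 0.5 threshold described above can be illustrated in base R. This is a minimal sketch on made-up PAC values, not output of any vcr.*.* function; the variable name misclassified is chosen here for illustration only.

```r
# Toy PAC values, invented for illustration
PAC <- c(0.05, 0.40, 0.55, 0.90, 0.20)

# Points above the gray region (PAC > 0.5) were misclassified
misclassified <- PAC > 0.5

which(misclassified)  # indices of the misclassified cases
mean(misclassified)   # fraction of misclassified cases
```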


opacity: determines the opacity of the plotted dots. A value between 0 and 1, where 0 is transparent and 1 is opaque.


squareplot: if TRUE, the horizontal and vertical axes will get the same length.


plotLoess: if TRUE, a standard loess curve is fitted and superimposed on the plot. May not work well if feat is discrete with few values. At most one of the options plotLoess, plotErrorBars, or plotQuantiles can be selected.
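To give an idea of what a superimposed loess curve looks like, here is a rough base R sketch on simulated data. It assumes default stats::loess settings and a red curve; the settings that qresplot uses internally may differ, and the simulated feat and PAC are illustrative only.

```r
# Simulated data, for illustration only
set.seed(1)
feat <- runif(200, min = 0, max = 80)
PAC <- pmin(pmax(0.2 + 0.005 * feat + rnorm(200, sd = 0.1), 0), 1)

# Fit a loess curve of PAC on feat with the default settings
fit <- loess(PAC ~ feat)

# Evaluate the fit at the sorted observed feature values
xs <- sort(feat)
pred <- predict(fit, newdata = data.frame(feat = xs))

# Scatter plot with the fitted curve on top
plot(feat, PAC, pch = 19, ylim = c(0, 1))
lines(xs, pred, col = "red", lwd = 2)
```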


plotErrorBars: if TRUE, the average PAC and its standard error are computed on the intervals of a grid (see option grid). Then a red curve connecting the averages is plotted, as well as two blue curves corresponding to the average plus or minus one standard error. At most one of the options plotLoess, plotErrorBars, or plotQuantiles can be selected.
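The computation described above can be approximated in base R as follows. This is a sketch under assumptions: the data are simulated, the grid is user-chosen, and the curves are drawn at the interval midpoints, which may not match where qresplot places them.

```r
# Simulated data, for illustration only
set.seed(2)
feat <- runif(300, min = 0, max = 10)
PAC <- runif(300)

# A user-chosen grid of increasing feat values (5 intervals)
grid <- seq(0, 10, by = 2)
bin <- cut(feat, breaks = grid, include.lowest = TRUE)

# Average PAC and its standard error within each interval
avg <- tapply(PAC, bin, mean)
se <- tapply(PAC, bin, function(p) sd(p) / sqrt(length(p)))

# Assumption: curves are drawn at the interval midpoints
mids <- (head(grid, -1) + tail(grid, -1)) / 2
plot(feat, PAC, pch = 19, col = "gray60", ylim = c(0, 1))
lines(mids, avg, col = "red")
lines(mids, avg + se, col = "blue")
lines(mids, avg - se, col = "blue")
```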


plotQuantiles: if TRUE, one or more quantiles of the PAC are computed on the intervals of a grid (see option grid). The quantiles correspond to the probabilities in option probs. Then the curves connecting the quantiles are plotted. At most one of the options plotLoess, plotErrorBars, or plotQuantiles can be selected.
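A comparable base R sketch for the quantile curves is given below. It uses simulated data, a user-chosen grid, and stats::quantile with its default type; drawing the curves at the interval midpoints is an assumption, not a documented property of qresplot.

```r
# Simulated data, for illustration only
set.seed(3)
feat <- runif(300, min = 0, max = 10)
PAC <- runif(300)

grid <- seq(0, 10, by = 2)   # user-chosen grid (5 intervals)
probs <- c(0.5, 0.75)        # same values as the default of probs
bin <- cut(feat, breaks = grid, include.lowest = TRUE)

# One row per probability, one column per grid interval
q <- sapply(split(PAC, bin), quantile, probs = probs)

# Assumption: curves are drawn at the interval midpoints
mids <- (head(grid, -1) + tail(grid, -1)) / 2
plot(feat, PAC, pch = 19, col = "gray60", ylim = c(0, 1))
for (k in seq_along(probs)) {
  lines(mids, q[k, ], col = k + 1)  # colors 2, 3, ...
}
```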


grid: only used when plotErrorBars or plotQuantiles is selected. This is a vector with increasing feat values, forming the grid. If NULL, the grid consists of the minimum and the maximum of feat, with 9 equispaced points between them.
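The default grid described above has 11 points in total, which define 10 intervals. A small sketch of that construction, using a made-up feat vector:

```r
# Toy feature values, invented for illustration
feat <- c(3.2, 7.5, 1.0, 9.0, 4.4)

# Minimum, maximum, and 9 equispaced points between them
grid <- seq(min(feat), max(feat), length.out = 11)
grid
```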


probs: only used when plotQuantiles is selected. This is a vector with probabilities determining the quantiles. If NULL, defaults to c(0.5, 0.75).


cols: only used when plotQuantiles is selected. A vector with the colors of the quantile curves. If NULL, the colors are taken as 2, 3, ...


fac: only used when plotLoess, plotErrorBars or plotQuantiles is selected. A real number by which the resulting curves are multiplied. A value fac > 1 can be useful to better visualize the curves when they would be too close to zero. By default (fac = 1) this is not done.


cex: passed on to plot.


cex.main: same, for title.


cex.lab: same, for labels on horizontal and vertical axes.


cex.axis: same, for axes.


pch: plot character for the points, defaults to 19.



a matrix with 2 columns containing the coordinates of the plotted points. This makes it easier to add text next to interesting points. If identify = TRUE, the attribute ids of coordinates contains the row numbers of the identified points in the matrix coordinates.


Raymaekers J., Rousseeuw P.J.


Raymaekers J., Rousseeuw P.J. (2021). Silhouettes and quasi residual plots for neural nets and tree-based classifiers.


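library(classmap) # provides qresplot, vcr.rpart.train and data_titanic
library(rpart)
data("data_titanic")
traindata <- data_titanic[which(data_titanic$dataType == "train"), -13]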
set.seed(123) # rpart is not deterministic
rpart.out <- rpart(y ~ Pclass + Sex + SibSp +
                    Parch + Fare + Embarked,
                  data = traindata, method = 'class', model = TRUE)
mytype <- list(nominal = c("Name", "Sex", "Ticket", "Cabin", "Embarked"), ordratio = c("Pclass"))
x_train <- traindata[, -12]
y_train <- traindata[,  12]
vcrtrain <- vcr.rpart.train(x_train, y_train, rpart.out, mytype)
# Quasi residual plot versus age, for males only:
PAC <- vcrtrain$PAC[which(x_train$Sex == "male")]
feat <- x_train$Age[which(x_train$Sex == "male")]
qresplot(PAC, feat, xlab = "Age (years)", opacity = 0.5,
         main = "quasi residual plot for male passengers",
         plotLoess = TRUE)
text(x = 14, y = 0.60, "loess curve", col = "red", cex = 1)

