Performance Analyses

Calculation of performance metrics on test sets or by resampling, as discussed previously, is one method of assessing model performance. Others available include measures of predictor variable importance, calibration curves comparing observed and predicted response values, partial dependence plots, and receiver operating characteristic analysis.

Variable Importance

The importance of variables in a model fit is estimated with the varimp function and plotted with plot. Variable importance is a relative measure of the contributions of model predictors and has a default range of 0 to 100, where 0 denotes the least important variables and 100 the most. Classes of models can differ with respect to how variable importance is defined. In the case of a GBMModel, importance of each predictor is based on the sum of squared empirical improvements over all internal tree nodes created by splitting on that variable [@greenwell:2019:GBM].

## Predictor variable importance
(vi <- varimp(surv_fit))


Alternatively, importance is based on negative log-transformed p-values for statistical models, like CoxModel, that produce them. For other classes of models, variable importance is generally defined and calculated by their underlying source packages.

Calibration Curves

Agreement between model-predicted and observed values can be visualized with calibration curves. In the construction of these curves, cases are partitioned into equally spaced bins according to their (resampled) predicted responses. Mean observed responses are then calculated within each of the bins and plotted on the vertical axis against the bin midpoints on the horizontal axis.

## Binned calibration curves
cal <- calibration(res_probs, breaks = 10)
plot(cal, se = TRUE)

As an alternative to discrete bins, curves can be smoothed over the individual predicted values by setting breaks = NULL.

## Smoothed calibration curves
cal <- calibration(res_probs, breaks = NULL)

Calibration curves that are close to the 45$^\circ$ line indicate close agreement between observed and predicted responses and a model that is said to be well calibrated.

Confusion Matrices

Confusion matrices of cross-classified observed and predicted categorical responses are available with the confusion function. They can be constructed with predicted class membership or with predicted class probabilities. In the latter case, predicted class membership is derived from predicted probabilities according to a probability cutoff value for binary factors (default: cutoff = 0.5) and according to the class with highest probability for factors with more than two levels.

## Confusion matrices
(conf <- confusion(res_probs, cutoff = 0.5))

Confusion matrices are the data structure upon which many of the performance metrics described earlier for factor predictor variables are based. Metrics commonly reported for confusion matrices are generated by the summary function.

## Summary performance metrics

Summaries can also be obtained with performance function but for select metrics specified by the user.

## Confusion matrix-specific metrics
metricinfo(conf) %>% names

## User-specified metrics
performance(conf, metrics = c("Accuracy" = accuracy,
                              "Sensitivity" = sensitivity,
                              "Specificity" = specificity))

Partial Dependence Plots

Partial dependence plots display the marginal effects of predictors on a response variable. Dependence for a select set of one or more predictor variables $X_S$ is computed as $$ \bar{f}S(X_S) = \frac{1}{N}\sum{i=1}^N f(X_S, x_{iS'}), $$ where $f$ is a fitted prediction function and $x_{iS'}$ are values of the remaining predictors in a dataset of $N$ cases. The response scale displayed in dependence plots will depend on the response variable type: probability for predicted factors and survival probabilities, original scale for numerics, and survival time for predicted survival means. By default, dependence is computed for each select predictor individually over a grid of 10 approximately evenly spaced values and averaged over the dataset on which the prediction function was fit.

## Partial dependence plots
pd <- dependence(surv_fit, select = c(thickness, age))

Averaging may be performed over different datasets to estimate marginal effects in other populations of cases, over different numbers of predictor values, and over quantile spacing of the values.

pd <- dependence(surv_fit, data = surv_test, select = thickness, n = 20,
                 intervals = "quantile")

In addition, dependence may be computed for combinations of multiple predictors to examine interaction effects and for summary statistics other than the mean.

Performance Curves

Tradeoffs between correct and incorrect classifications of binary outcomes, across the range of possible cutoff probabilities, can be studied with performance curves.


Receiver operating characteristic (ROC) curves are one example in which true positive rates (sensitivity) are plotted against false positive rates (1 - specificity) [@fawcett:2006:IRA]. Higher ROC curves are indicative of better predictive performance.

## ROC curves
roc <- performance_curve(res_probs)
plot(roc, diagonal = TRUE)

ROC curves show the relation between the two rates being plotted but not their relationships with specific cutoff values. The latter may be helpful for the selection of a cutoff value to apply in practice. Accordingly, separate plots of each rate versus the range of possible cutoff values are available with the type = "cutoffs" option.

plot(roc, type = "cutoffs")

Area under resulting ROC curves can be computed as an overall measure of model predictive performance and interpreted as the probability that a randomly selected event case will have a higher predicted value than a randomly selected non-event case.


Precision Recall

In general, any two binary response metrics may be specified for the construction of a performance curve. Precision recall curves are another example [@davis:2006:RPR].

## Precision recall curves
pr <- performance_curve(res_probs, metrics = c(precision, recall))


Lift curves depict the rate at which observed binary responses are identifiable from (resampled) predicted response probabilities. In particular, they plot the true positive findings (sensitivity) against the positive test rates for all possible classification probability cutoffs. Accordingly, a lift curve can be interpreted as the rate at which positive responses are found as a function of the positive test rate among cases.

## Lift curves
lf <- lift(res_probs)
plot(lf, find = 0.75)

brian-j-smith/MachineShop documentation built on Nov. 12, 2019, 8:33 p.m.