Model Performance Metrics

Performance metrics quantify associations between observed and predicted responses and provide a means of evaluating the predictive performances of models.

Performance Function

Metrics can be computed with the r rdoc_url("performance()") function applied to observed responses and responses predicted with the r rdoc_url("predict()") function. In the case of observed versus predicted survival probabilities or events, metrics will be calculated at user-specified survival times and returned along with their time-integrated mean.

## Survival performance metrics

## Observed responses
obs <- response(surv_fit, newdata = surv_test)

## Predicted survival means
pred_means <- predict(surv_fit, newdata = surv_test)
performance(obs, pred_means)

## Predicted survival probabilities
pred_probs <- predict(surv_fit, newdata = surv_test, times = surv_times, type = "prob")
performance(obs, pred_probs)

## Predicted survival events
pred_events <- predict(surv_fit, newdata = surv_test, times = surv_times)
performance(obs, pred_events)

Function r rdoc_url("performance()") computes a default set of metrics according to the observed and predicted response types, as indicated and in the order given in the table below.

Table 3. Default performance metrics by response types.

Response | Default Metrics :----------------------- | :------------------------------------------------------ Factor | Brier Score, Accuracy, Cohen's Kappa Binary Factor | Brier Score, Accuracy, Cohen's Kappa, Area Under ROC Curve, Sensitivity, Specificity Numeric Vector or Matrix | Root Mean Squared Error, R^2^, Mean Absolute Error Survival Means | Concordance Index Survival Probabilities | Brier Score, Area Under ROC Curve, Accuracy Survival Events | Accuracy

These defaults may be changed by specifying one or more package-supplied metric functions to the metrics argument of r rdoc_url("performance()"). Specification of the metrics argument can be in terms of a single metric function, function name, or list of metric functions. List names, if specified, will be displayed as metric labels in graphical and tabular summaries; otherwise, the function names will be used as labels for unnamed lists.

## Single metric function
performance(obs, pred_means, metrics = cindex)

## Single metric function name
performance(obs, pred_means, metrics = "cindex")

## List of metric functions
performance(obs, pred_means, metrics = c(cindex, rmse, rmsle))

## Named list of metric functions
performance(obs, pred_means,
            metrics = c("CIndex" = cindex, "RMSE" = rmse, "RMSLE" = rmsle))

Metrics based on classification of two-level class probabilities, like sensitivity and specificity, optionally allow for specification of the classification cutoff probability (default: cutoff = 0.5).

## User-specified survival probability metrics
performance(obs, pred_probs, metrics = c(sensitivity, specificity), cutoff = 0.7)

Metric Functions

Whereas multiple package-supplied metrics can be calculated simultaneously with the r rdoc_url("performance()") function, each exists as a stand-alone function that can be called individually.

## Metric functions for survival means
cindex(obs, pred_means)

rmse(obs, pred_means)

rmsle(obs, pred_means)

## Metric functions for survival probabilities
sensitivity(obs, pred_probs)

specificity(obs, pred_probs)

Metric Information

A named list of available metrics can be obtained interactively with the r rdoc_url("metricinfo()") function, and includes the following components for each one.

label : character descriptor for the metric.

maximize : logical indicating whether maximization of the metric leads to better predictive performance.

arguments : closure with the argument names and corresponding default values of the metric function.

response_types : data frame of the observed and predicted response variable types supported by the metric.

Function r rdoc_url("metricinfo()") may be called without arguments, with one or more metric functions, an observed response variable, an observed and predicted response variable pair, response variable types, or resampled output; and will return information on all matching metrics.

## All available metrics
metricinfo() %>% names

Information is displayed below for the r rdoc_url("cindex()", "metrics") function corresponding to a concordance index, which is applicable to observed survival and predicted means.

## Metric-specific information
metricinfo(cindex)

Submitting the metric function at the console will result in similar information being displayed as formatted text.

cindex

Type-Specific Metrics

When data objects are supplied as arguments to r rdoc_url("metricinfo()"), information is returned on all metrics applicable to response variables of the same data types. Observed response variable type is inferred from the first data argument and predicted type from the second, if given. For survival responses, predicted types may be numeric for survival means, r rdoc_url("SurvEvents", "SurvMatrix") for 0-1 survival events at specified follow-up times, or r rdoc_url("SurvProbs", "SurvMatrix") for follow-up time survival probabilities. If model functions are additionally supplied as arguments, information on the subset matching the data types is returned.

## Metrics for observed and predicted response variable types
metricinfo(Surv(0)) %>% names

metricinfo(Surv(0), numeric(0)) %>% names

metricinfo(Surv(0), SurvEvents(0)) %>% names

metricinfo(Surv(0), SurvProbs(0)) %>% names

## Identify survival-specific metrics
metricinfo(Surv(0), auc, cross_entropy, cindex) %>% names

Response Variable-Specific Metrics

Existing response variables observed and those obtained from the r rdoc_url("predict()") function may be given as arguments to identify metrics that are applicable to them.

## Metrics for observed and predicted responses from model fits
metricinfo(obs, pred_means) %>% names

metricinfo(obs, pred_probs) %>% names

Factors

Metrics applicable to multi-level factor response variables are summarized below.

r rdoc_url("accuracy()", "metrics") : proportion of correctly classified responses.

r rdoc_url("brier()", "metrics") : Brier score.

r rdoc_url("cross_entropy()", "metrics") : cross entropy loss averaged over the number of cases.

r rdoc_url("kappa2()", "metrics") : Cohen's kappa statistic measuring relative agreement between observed and predicted classifications.

r rdoc_url("weighted_kappa2()", "metrics") : weighted Cohen's kappa for ordered factor responses only.

Brier score and cross entropy loss are computed directly on predicted class probabilities. The other metrics are computed on predicted class membership, defined as the factor level with the highest predicted probability.

Binary Factors

Metrics for binary factors include those given for multi-level factors as well as the following.

r rdoc_url("auc()", "metrics") : area under a performance curve.

r rdoc_url("cindex()", "metrics") : concordance index computed as rank order agreement between predicted probabilities for paired event and non-event cases. This metric can be interpreted as the probability that a randomly selected event case will have a higher predicted value than a randomly selected non-event case, and is the same as area under the ROC curve.

r rdoc_url("f_score()", "metrics") : F score, $F_\beta = (1 + \beta^2) \frac{\text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}$. F1 score $(\beta = 1)$ is the package default.

r rdoc_url("fnr()", "metrics") : false negative rate, $FNR = \frac{FN}{TP + FN} = 1 - TPR$.

conf <- matrix(c("True Negative (TN)", "False Positive (FP)",
                 "False Negative (FN)", "True Positive (TP)"),
               2, 2,
               dimnames = list("Predicted Response" = c("Negative", "Positive"),
                               "Observed Response" = c("Negative", "Positive")))
kable(conf,
      caption = "Table 4. Confusion matrix of observed and predicted response classifications.",
      align = c("c", "c")) %>%
  kable_styling(full_width = FALSE, position = "center") %>%
  add_header_above(c("Predicted Response" = 1, "Observed Response" = 2))

r rdoc_url("fpr()", "metrics") : false positive rate, $FPR = \frac{FP}{TN + FP} = 1 - TNR$.

r rdoc_url("npv()", "metrics") : negative predictive value, $NPV = \frac{TN}{TN + FN}$.

r rdoc_url("ppr()", "metrics") : positive prediction rate, $PPR = \frac{TP + FP}{TP + FP + TN + FN}$.

r rdoc_url("ppv()", "metrics"), r rdoc_url("precision()", "metrics") : positive predictive value, $PPV = \frac{TP}{TP + FP}$.

r rdoc_url("pr_auc()", "metrics"), r rdoc_url("auc()", "metrics") : area under a precision recall curve.

r rdoc_url("roc_auc()", "metrics"), r rdoc_url("auc()", "metrics") : area under an ROC curve.

r rdoc_url("roc_index()", "metrics") : a tradeoff function of sensitivity and specificity as defined by the fun argument in this function (default: (sensitivity + specificity) / 2). The function allows for specification of tradeoffs [@perkins:2006:IOC] other than the default of Youden's J statistic [@youden:1950:IRD].

r rdoc_url("sensitivity()", "metrics"), r rdoc_url("recall()", "metrics"), r rdoc_url("tpr()", "metrics") : true positive rate, $TPR =\frac{TP}{TP + FN} = 1 - FNR$.

r rdoc_url("specificity()", "metrics"), r rdoc_url("tnr()", "metrics") : true negative rate, $TNR = \frac{TN}{TN + FP} = 1 - FPR$.

Area under the ROC and precision-recall curves as well as the concordance index are computed directly on predicted class probabilities. The other metrics are computed on predicted class membership. Memberships are defined to be in the second factor level if predicted probabilities are greater than the function default or user-specified cutoff value.

Numerics

Performance metrics are defined below for numeric vector responses. If applied to a numeric matrix response, the metrics are computed separately for each column and then averaged to produce a single value.

r rdoc_url("gini()", "metrics") : Gini coefficient.

r rdoc_url("mae()", "metrics") : mean absolute error, $MAE = \frac{1}{N}\sum_{i=1}^N|y_i - \hat{y}_i|$, where $y_i$ and $\hat{y}_i$ are the $N$ observed and predicted responses.

r rdoc_url("mse()", "metrics") : mean squared error, $MSE = \frac{1}{N}\sum_{i=1}^N(y_i - \hat{y}_i)^2$.

r rdoc_url("msle()", "metrics") : mean squared log error, $MSLE = \frac{1}{N}\sum_{i=1}^N(log(1 + y_i) - log(1 + \hat{y}_i))^2$.

r rdoc_url("r2()", "metrics") : one minus residual divided by total sums of squares, $R^2 = 1 - \sum_{i=1}^N(y_i - \hat{y}i)^2 / \sum{i=1}^N(y_i - \bar{y})^2$.

r rdoc_url("rmse()", "metrics") : square root of mean squared error.

r rdoc_url("rmsle()", "metrics") : square root of mean squared log error.

Survival Objects

All previously described metrics for binary factor responses---plus accuracy, Brier score and Cohen's kappa---are applicable to survival probabilities predicted at specified follow-up times. Metrics are evaluated separately at each follow-up time and reported along with a time-integrated mean. The survival concordance index is computed with the method of Harrell [-@harrell:1982:EYM] and Brier score according to Graf et al. [-@graf:1999:ACP]; whereas, the others are computed according to the confusion matrix probabilities below, in which term $\hat{S}(t)$ is the predicted survival probability at follow-up time $t$ and $T$ is the survival time [@heagerty:2004:TDR].

conf <- matrix(c("$TN = \\Pr(\\hat{S}(t) \\ge \\text{cutoff} \\cap T >    t)$",
                 "$FP = \\Pr(\\hat{S}(t) <    \\text{cutoff} \\cap T >    t)$",
                 "$FN = \\Pr(\\hat{S}(t) \\ge \\text{cutoff} \\cap T \\le t)$",
                 "$TP = \\Pr(\\hat{S}(t) <    \\text{cutoff} \\cap T \\le t)$"),
               2, 2,
               dimnames = list("Predicted Response" = c("Non-Event", "Event"),
                               "Observed Response" = c("Non-Event", "Event")))
kable(conf,
      caption = "Table 5. Confusion matrix of observed and predicted survival response classifications.",
      align = c("c", "c"),
      escape = FALSE) %>%
  kable_styling(full_width = FALSE, position = "center") %>%
  add_header_above(c("Predicted Response" = 1, "Observed Response" = 2))

In addition, all of the metrics described for numeric vector responses are applicable to predicted survival means and are computed using only those cases with observed (non-censored) events.



brian-j-smith/MachineShop documentation built on Sept. 22, 2023, 10:01 p.m.