Performance metrics quantify associations between observed and predicted responses and provide a means of evaluating the predictive performances of models.
Metrics can be computed with the r rdoc_url("performance()")
function applied to observed responses and responses predicted with the r rdoc_url("predict()")
function. In the case of observed versus predicted survival probabilities or events, metrics will be calculated at user-specified survival times and returned along with their time-integrated mean.
## Survival performance metrics ## Observed responses obs <- response(surv_fit, newdata = surv_test) ## Predicted survival means pred_means <- predict(surv_fit, newdata = surv_test) performance(obs, pred_means) ## Predicted survival probabilities pred_probs <- predict(surv_fit, newdata = surv_test, times = surv_times, type = "prob") performance(obs, pred_probs) ## Predicted survival events pred_events <- predict(surv_fit, newdata = surv_test, times = surv_times) performance(obs, pred_events)
Function r rdoc_url("performance()")
computes a default set of metrics according to the observed and predicted response types, as indicated and in the order given in the table below.
Table 3. Default performance metrics by response types.
Response | Default Metrics :----------------------- | :------------------------------------------------------ Factor | Brier Score, Accuracy, Cohen's Kappa Binary Factor | Brier Score, Accuracy, Cohen's Kappa, Area Under ROC Curve, Sensitivity, Specificity Numeric Vector or Matrix | Root Mean Squared Error, R^2^, Mean Absolute Error Survival Means | Concordance Index Survival Probabilities | Brier Score, Area Under ROC Curve, Accuracy Survival Events | Accuracy
These defaults may be changed by specifying one or more package-supplied metric functions to the metrics
argument of r rdoc_url("performance()")
. Specification of the metrics
argument can be in terms of a single metric function, function name, or list of metric functions. List names, if specified, will be displayed as metric labels in graphical and tabular summaries; otherwise, the function names will be used as labels for unnamed lists.
## Single metric function performance(obs, pred_means, metrics = cindex) ## Single metric function name performance(obs, pred_means, metrics = "cindex") ## List of metric functions performance(obs, pred_means, metrics = c(cindex, rmse, rmsle)) ## Named list of metric functions performance(obs, pred_means, metrics = c("CIndex" = cindex, "RMSE" = rmse, "RMSLE" = rmsle))
Metrics based on classification of two-level class probabilities, like sensitivity and specificity, optionally allow for specification of the classification cutoff probability (default: cutoff = 0.5
).
## User-specified survival probability metrics performance(obs, pred_probs, metrics = c(sensitivity, specificity), cutoff = 0.7)
Whereas multiple package-supplied metrics can be calculated simultaneously with the r rdoc_url("performance()")
function, each exists as a stand-alone function that can be called individually.
## Metric functions for survival means cindex(obs, pred_means) rmse(obs, pred_means) rmsle(obs, pred_means) ## Metric functions for survival probabilities sensitivity(obs, pred_probs) specificity(obs, pred_probs)
A named list of available metrics can be obtained interactively with the r rdoc_url("metricinfo()")
function, and includes the following components for each one.
label
: character descriptor for the metric.
maximize
: logical indicating whether maximization of the metric leads to better predictive performance.
arguments
: closure with the argument names and corresponding default values of the metric function.
response_types
: data frame of the observed and predicted response variable types supported by the metric.
Function r rdoc_url("metricinfo()")
may be called without arguments, with one or more metric functions, an observed response variable, an observed and predicted response variable pair, response variable types, or resampled output; and will return information on all matching metrics.
## All available metrics metricinfo() %>% names
Information is displayed below for the r rdoc_url("cindex()", "metrics")
function corresponding to a concordance index, which is applicable to observed survival and predicted means.
## Metric-specific information metricinfo(cindex)
Submitting the metric function at the console will result in similar information being displayed as formatted text.
cindex
When data objects are supplied as arguments to r rdoc_url("metricinfo()")
, information is returned on all metrics applicable to response variables of the same data types. Observed response variable type is inferred from the first data argument and predicted type from the second, if given. For survival responses, predicted types may be numeric
for survival means, r rdoc_url("SurvEvents", "SurvMatrix")
for 0-1 survival events at specified follow-up times, or r rdoc_url("SurvProbs", "SurvMatrix")
for follow-up time survival probabilities. If model functions are additionally supplied as arguments, information on the subset matching the data types is returned.
## Metrics for observed and predicted response variable types metricinfo(Surv(0)) %>% names metricinfo(Surv(0), numeric(0)) %>% names metricinfo(Surv(0), SurvEvents(0)) %>% names metricinfo(Surv(0), SurvProbs(0)) %>% names ## Identify survival-specific metrics metricinfo(Surv(0), auc, cross_entropy, cindex) %>% names
Existing response variables observed and those obtained from the r rdoc_url("predict()")
function may be given as arguments to identify metrics that are applicable to them.
## Metrics for observed and predicted responses from model fits metricinfo(obs, pred_means) %>% names metricinfo(obs, pred_probs) %>% names
Metrics applicable to multi-level factor response variables are summarized below.
r rdoc_url("accuracy()", "metrics")
: proportion of correctly classified responses.
r rdoc_url("brier()", "metrics")
: Brier score.
r rdoc_url("cross_entropy()", "metrics")
: cross entropy loss averaged over the number of cases.
r rdoc_url("kappa2()", "metrics")
: Cohen's kappa statistic measuring relative agreement between observed and predicted classifications.
r rdoc_url("weighted_kappa2()", "metrics")
: weighted Cohen's kappa for ordered factor responses only.
Brier score and cross entropy loss are computed directly on predicted class probabilities. The other metrics are computed on predicted class membership, defined as the factor level with the highest predicted probability.
Metrics for binary factors include those given for multi-level factors as well as the following.
r rdoc_url("auc()", "metrics")
: area under a performance curve.
r rdoc_url("cindex()", "metrics")
: concordance index computed as rank order agreement between predicted probabilities for paired event and non-event cases. This metric can be interpreted as the probability that a randomly selected event case will have a higher predicted value than a randomly selected non-event case, and is the same as area under the ROC curve.
r rdoc_url("f_score()", "metrics")
: F score, $F_\beta = (1 + \beta^2) \frac{\text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}$. F1 score $(\beta = 1)$ is the package default.
r rdoc_url("fnr()", "metrics")
: false negative rate, $FNR = \frac{FN}{TP + FN} = 1 - TPR$.
conf <- matrix(c("True Negative (TN)", "False Positive (FP)", "False Negative (FN)", "True Positive (TP)"), 2, 2, dimnames = list("Predicted Response" = c("Negative", "Positive"), "Observed Response" = c("Negative", "Positive"))) kable(conf, caption = "Table 4. Confusion matrix of observed and predicted response classifications.", align = c("c", "c")) %>% kable_styling(full_width = FALSE, position = "center") %>% add_header_above(c("Predicted Response" = 1, "Observed Response" = 2))
r rdoc_url("fpr()", "metrics")
: false positive rate, $FPR = \frac{FP}{TN + FP} = 1 - TNR$.
r rdoc_url("npv()", "metrics")
: negative predictive value, $NPV = \frac{TN}{TN + FN}$.
r rdoc_url("ppr()", "metrics")
: positive prediction rate, $PPR = \frac{TP + FP}{TP + FP + TN + FN}$.
r rdoc_url("ppv()", "metrics")
, r rdoc_url("precision()", "metrics")
: positive predictive value, $PPV = \frac{TP}{TP + FP}$.
r rdoc_url("pr_auc()", "metrics")
, r rdoc_url("auc()", "metrics")
: area under a precision recall curve.
r rdoc_url("roc_auc()", "metrics")
, r rdoc_url("auc()", "metrics")
: area under an ROC curve.
r rdoc_url("roc_index()", "metrics")
: a tradeoff function of sensitivity and specificity as defined by the fun
argument in this function (default: (sensitivity + specificity) / 2). The function allows for specification of tradeoffs [@perkins:2006:IOC] other than the default of Youden's J statistic [@youden:1950:IRD].
r rdoc_url("sensitivity()", "metrics")
, r rdoc_url("recall()", "metrics")
, r rdoc_url("tpr()", "metrics")
: true positive rate, $TPR =\frac{TP}{TP + FN} = 1 - FNR$.
r rdoc_url("specificity()", "metrics")
, r rdoc_url("tnr()", "metrics")
: true negative rate, $TNR = \frac{TN}{TN + FP} = 1 - FPR$.
Area under the ROC and precision-recall curves as well as the concordance index are computed directly on predicted class probabilities. The other metrics are computed on predicted class membership. Memberships are defined to be in the second factor level if predicted probabilities are greater than the function default or user-specified cutoff value.
Performance metrics are defined below for numeric vector responses. If applied to a numeric matrix response, the metrics are computed separately for each column and then averaged to produce a single value.
r rdoc_url("gini()", "metrics")
: Gini coefficient.
r rdoc_url("mae()", "metrics")
: mean absolute error, $MAE = \frac{1}{N}\sum_{i=1}^N|y_i - \hat{y}_i|$, where $y_i$ and $\hat{y}_i$ are the $N$ observed and predicted responses.
r rdoc_url("mse()", "metrics")
: mean squared error, $MSE = \frac{1}{N}\sum_{i=1}^N(y_i - \hat{y}_i)^2$.
r rdoc_url("msle()", "metrics")
: mean squared log error, $MSLE = \frac{1}{N}\sum_{i=1}^N(log(1 + y_i) - log(1 + \hat{y}_i))^2$.
r rdoc_url("r2()", "metrics")
: one minus residual divided by total sums of squares, $R^2 = 1 - \sum_{i=1}^N(y_i - \hat{y}i)^2 / \sum{i=1}^N(y_i - \bar{y})^2$.
r rdoc_url("rmse()", "metrics")
: square root of mean squared error.
r rdoc_url("rmsle()", "metrics")
: square root of mean squared log error.
All previously described metrics for binary factor responses---plus accuracy, Brier score and Cohen's kappa---are applicable to survival probabilities predicted at specified follow-up times. Metrics are evaluated separately at each follow-up time and reported along with a time-integrated mean. The survival concordance index is computed with the method of Harrell [-@harrell:1982:EYM] and Brier score according to Graf et al. [-@graf:1999:ACP]; whereas, the others are computed according to the confusion matrix probabilities below, in which term $\hat{S}(t)$ is the predicted survival probability at follow-up time $t$ and $T$ is the survival time [@heagerty:2004:TDR].
conf <- matrix(c("$TN = \\Pr(\\hat{S}(t) \\ge \\text{cutoff} \\cap T > t)$", "$FP = \\Pr(\\hat{S}(t) < \\text{cutoff} \\cap T > t)$", "$FN = \\Pr(\\hat{S}(t) \\ge \\text{cutoff} \\cap T \\le t)$", "$TP = \\Pr(\\hat{S}(t) < \\text{cutoff} \\cap T \\le t)$"), 2, 2, dimnames = list("Predicted Response" = c("Non-Event", "Event"), "Observed Response" = c("Non-Event", "Event"))) kable(conf, caption = "Table 5. Confusion matrix of observed and predicted survival response classifications.", align = c("c", "c"), escape = FALSE) %>% kable_styling(full_width = FALSE, position = "center") %>% add_header_above(c("Predicted Response" = 1, "Observed Response" = 2))
In addition, all of the metrics described for numeric vector responses are applicable to predicted survival means and are computed using only those cases with observed (non-censored) events.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.