View source: R/calc_one_v_rest_auc.R
calc_one_v_rest_auc | R Documentation |
Calculate the area under the Precision-Recall curve (PRC) and the Receiver Operating Characteristic (ROC) curve for all one-vs-rest comparisons in a fitted model
calc_one_v_rest_auc(
fit = NULL,
Xnew = NULL,
Ynew = NULL,
normalize_rows = NULL,
measure = c("PRC", "ROC"),
fitted_prob = NULL,
include_baseline = TRUE,
...
)
fit |
fitted hidden genome classifier object. Experimental: can be NULL, in which case the predicted probabilities must be supplied directly through fitted_prob (together with the corresponding true class labels in Ynew).
|
Xnew, Ynew |
New predictor design matrix and corresponding cancer site labels. If provided, the trained hidden genome model (supplied through fit) is used to predict on Xnew, and the measures are computed against the labels in Ynew.
|
normalize_rows |
vector of the same length as … |
measure |
Type of curve to use. Options include "PRC" (Precision-Recall Curve) and "ROC" (Receiver Operating Characteristic curve). Can be a vector. |
fitted_prob |
an n_tumor x n_cancer matrix of predicted classification probabilities (corresponding to the "true" class labels provided in Ynew). |
include_baseline |
logical. Should the null baseline value(s) of the measure(s) be returned along with the computed observed value(s)? Here "null baseline" refers to the expected value of the corresponding measure for a "baseline" classifier that assigns class labels to the sample units uniformly at random. |
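For a one-vs-rest comparison, these null baselines have simple closed forms: the baseline ROC AUC is always 0.5 regardless of class balance, while the baseline PRC AUC equals the prevalence of the positive class. A minimal illustration (independent of this package; the class labels are made up):

```r
# Illustrative only -- not part of the package.
# For a one-vs-rest split, a uniformly random classifier attains:
#   ROC AUC = 0.5 (independent of class balance)
#   PRC AUC = prevalence of the positive class
y <- c(rep("Lung", 30), rep("Other", 70))
prevalence <- mean(y == "Lung")  # 0.3
baseline_prc <- prevalence       # expected PRC AUC under random assignment
baseline_roc <- 0.5              # expected ROC AUC under random assignment
```

This is why the baseline PRC values reported by the function differ across cancer types (rarer types have lower baselines), whereas the baseline ROC values do not.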
Under the hood, the function uses several functions from the R package precrec to compute the performance metrics. The argument fitted_prob, when supplied, should ideally contain predictive probabilities for training set tumors evaluated under a cross-validation framework. If it is not supplied, pre-validated prediction probabilities are extracted from mlogit models, while for other models over-optimistic prediction probabilities (obtained by simply applying the fitted model to the training data) are used.
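One way to obtain such cross-validated probabilities is to refit the model on each fold's complement and predict on the held-out fold. A hypothetical sketch, assuming objects X, Y, and a fold assignment folds (values 1 to 5) already exist; the predict() call is an illustrative assumption, not a documented interface, so check your model's actual prediction method:

```r
# Hypothetical sketch: build cross-validated probabilities for fitted_prob.
# Assumes X (design matrix), Y (labels), and folds (fold ids 1..5) exist,
# and that predict() on the fitted object returns an
# n_tumor x n_cancer matrix of class probabilities.
classes <- sort(unique(Y))
cv_prob <- matrix(
  NA_real_, nrow = nrow(X), ncol = length(classes),
  dimnames = list(rownames(X), classes)
)
for (k in 1:5) {
  hold <- folds == k
  fit_k <- fit_mlogit(X = X[!hold, ], Y = Y[!hold])
  cv_prob[hold, ] <- predict(fit_k, Xnew = X[hold, ])
}
# Supply the cross-validated probabilities directly:
calc_one_v_rest_auc(fit = NULL, Ynew = Y, fitted_prob = cv_prob)
```

Probabilities assembled this way avoid the over-optimism of evaluating the fitted model on its own training data.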
Returns a data.table with length(measure) + 1 columns ("Class" plus one column per measure; 2 * length(measure) + 1 columns if include_baseline = TRUE) and n_class + 1 rows, where n_class denotes the number of cancer types present in the fitted model; the final row provides the macro (average) metrics.
The function uses package precrec under the hood to compute the AUCs. Please install precrec before using calc_one_v_rest_auc.
data("impact")
top_v <- variant_screen_mi(
maf = impact,
variant_col = "Variant",
cancer_col = "CANCER_SITE",
sample_id_col = "patient_id",
mi_rank_thresh = 50,
return_prob_mi = FALSE
)
var_design <- extract_design(
maf = impact,
variant_col = "Variant",
sample_id_col = "patient_id",
variant_subset = top_v
)
canc_resp <- extract_cancer_response(
maf = impact,
cancer_col = "CANCER_SITE",
sample_id_col = "patient_id"
)
pid <- names(canc_resp)
# create five stratified random folds
# based on the response cancer categories
set.seed(42)
folds <- data.table::data.table(
resp = canc_resp
)[,
foldid := sample(rep(1:5, length.out = .N)),
by = resp
]$foldid
# 80%-20% stratified separation of training and
# test set tumors
idx_train <- pid[folds != 5]
idx_test <- pid[folds == 5]
# train a classifier on the training set
# using only variants (will have low accuracy
# -- no meta-feature information used)
fit0 <- fit_mlogit(
X = var_design[idx_train, ],
Y = canc_resp[idx_train]
)
calc_one_v_rest_auc(fit0)
calc_one_v_rest_auc(fit0, measure = "PRC")
calc_one_v_rest_auc(fit0, measure = "ROC")
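The fitted classifier can also be evaluated on the 20% held-out tumors by passing Xnew and Ynew; a sketch using only the arguments documented above and the objects created earlier in this example:

```r
# Evaluate the trained classifier on the held-out test set.
# Predictions are made on Xnew and scored against the labels in Ynew.
calc_one_v_rest_auc(
  fit0,
  Xnew = var_design[idx_test, ],
  Ynew = canc_resp[idx_test],
  measure = c("PRC", "ROC")
)
```

Test-set AUCs obtained this way are free of the training-data over-optimism discussed in the Details section.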