eval_feature_selection_curve_funs | R Documentation |
Evaluate the ROC or PR curves corresponding to the selected features, given the true feature support and the estimated feature importances. eval_feature_selection_curve() evaluates the ROC or PR curve for each experimental replicate separately. summarize_feature_selection_curve() summarizes the ROC or PR curve across experimental replicates.
eval_feature_selection_curve(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  imp_col,
  group_cols = NULL,
  curve = c("ROC", "PR"),
  na_rm = FALSE
)

summarize_feature_selection_curve(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  imp_col,
  group_cols = NULL,
  curve = c("ROC", "PR"),
  na_rm = FALSE,
  x_grid = seq(0, 1, by = 0.01),
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = ifelse(curve == "PR", "precision", "TPR")
)
fit_results |
A tibble, as returned by |
vary_params |
A vector of |
nested_cols |
(Optional) A character string or vector specifying the
name of the column(s) in |
truth_col |
A character string identifying the column in
|
imp_col |
A character string identifying the column in
|
group_cols |
(Optional) A character string or vector specifying the column(s) to group rows by before evaluating metrics. This is useful for assessing within-group metrics. |
curve |
Either "ROC" or "PR" indicating whether to evaluate the ROC or Precision-Recall curve. |
na_rm |
A |
x_grid |
Vector of values between 0 and 1 at which to evaluate the ROC
or PR curve. If |
summary_funs |
Character vector specifying how to summarize evaluation metrics. Each element must be one of the built-in summary functions: "mean", "median", "min", "max", "sd", or "raw". |
custom_summary_funs |
Named list of custom functions to summarize results. Names in the list should correspond to the name of the summary function. Each value in the list should be a function that takes one argument: the values of the evaluated metrics. |
eval_id |
Character string. ID to be used as a suffix when naming result
columns. Default |
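As an illustration of the shape described for custom_summary_funs, the sketch below builds a named list of one-argument functions and applies each to a vector of metric values. The function names (q25, range_width) and the metric values are hypothetical and chosen only for this example; how the package applies these functions internally is not shown here.

```r
# Hypothetical custom summaries: each element is a function of one argument
# (the evaluated metric values); the list names label the summaries.
custom_summary_funs <- list(
  q25 = function(x) unname(quantile(x, probs = 0.25)),
  range_width = function(x) diff(range(x))
)

# Made-up metric values across four replicates, for illustration only
metric_values <- c(0.6, 0.7, 0.8, 0.9)

# Apply every custom summary to the same vector of values
out <- sapply(custom_summary_funs, function(f) f(metric_values))
print(out)
```

A list of this form could then be passed as the custom_summary_funs argument of summarize_feature_selection_curve().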
The output of eval_feature_selection_curve() is a tibble with the following columns:
Replicate ID.
Name of DGP.
Name of Method.
A list of tibbles with x and y coordinate values for the ROC/PR curve for the given experimental replicate. If curve = "ROC", the tibble has the columns .threshold, FPR, and TPR for the threshold, false positive rate, and true positive rate, respectively. If curve = "PR", the tibble has the columns .threshold, recall, and precision.
The tibble also contains any columns specified by group_cols and vary_params.
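To make the .threshold, FPR, TPR, recall, and precision columns concrete, here is a minimal base R sketch that computes curve points by thresholding importance scores at each observed value. The helper curve_points() is hypothetical and is not part of the package; the package's own computation may differ, for example in how curve endpoints are handled.

```r
# Hypothetical helper (not the package's implementation): a feature counts as
# "selected" at threshold t when its importance score is >= t.
curve_points <- function(truth, importance) {
  thresholds <- sort(unique(importance), decreasing = TRUE)
  rows <- lapply(thresholds, function(t) {
    selected <- importance >= t
    tp <- sum(selected & truth)   # true features that were selected
    fp <- sum(selected & !truth)  # null features that were selected
    data.frame(
      .threshold = t,
      FPR = fp / sum(!truth),
      TPR = tp / sum(truth),      # TPR is the same quantity as recall
      precision = tp / (tp + fp)
    )
  })
  do.call(rbind, rows)
}

# Made-up inputs: three features, two of them in the true support
truth <- c(TRUE, FALSE, TRUE)
importance <- c(10, 1.5, 0.2)
pts <- curve_points(truth, importance)
print(pts)
```

Plotting TPR against FPR gives the ROC curve; plotting precision against recall (TPR) gives the PR curve.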
The output of summarize_feature_selection_curve() is a grouped tibble containing both identifying information and the feature selection curve results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, and any columns specified by group_cols and vary_params. In addition, there are results columns corresponding to the requested statistics in summary_funs and custom_summary_funs. If curve = "ROC", these results columns include FPR and others that end in the suffix "_TPR". If curve = "PR", the results columns include recall and others that end in the suffix "_precision".
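The aggregation over replicates can be pictured as evaluating each replicate's curve on a shared grid (the role of x_grid) and then summarizing pointwise. The sketch below does this in base R under the assumption of linear interpolation; the per-replicate curves are made up, and the package may interpolate or summarize differently.

```r
# Two hypothetical per-replicate ROC curves, made up for illustration
rep_curves <- list(
  data.frame(FPR = c(0, 0.5, 1), TPR = c(0, 0.8, 1)),
  data.frame(FPR = c(0, 0.5, 1), TPR = c(0, 0.6, 1))
)

# Shared grid of x values (coarser than the default, to keep output short)
x_grid <- seq(0, 1, by = 0.25)

# Assumption: linear interpolation of TPR at each grid value of FPR.
# The result is a matrix with one row per grid point, one column per replicate.
tpr_on_grid <- sapply(rep_curves, function(cv) {
  approx(x = cv$FPR, y = cv$TPR, xout = x_grid)$y
})

# Pointwise summaries across replicates, named with the "_TPR" suffix
summary_tbl <- data.frame(
  FPR = x_grid,
  mean_TPR = rowMeans(tpr_on_grid),
  sd_TPR = apply(tpr_on_grid, 1, sd)
)
print(summary_tbl)
```

Summarizing on a common grid is what allows curves from different replicates, which generally have different thresholds, to be averaged at all.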
Other feature_selection_funs: eval_feature_importance_funs, eval_feature_selection_err_funs, plot_feature_importance(), plot_feature_selection_curve(), plot_feature_selection_err()
# generate example fit_results data for a feature selection problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  feature_info = lapply(
    1:4,
    FUN = function(i) {
      tibble::tibble(
        # feature names
        feature = c("featureA", "featureB", "featureC"),
        # true feature support
        true_support = c(TRUE, FALSE, TRUE),
        # estimated feature importance scores
        est_importance = c(10, runif(2, min = -2, max = 2))
      )
    }
  )
)

# evaluate feature selection ROC/PR curves for each replicate
roc_results <- eval_feature_selection_curve(
  fit_results,
  curve = "ROC",
  nested_cols = "feature_info",
  truth_col = "true_support",
  imp_col = "est_importance"
)
pr_results <- eval_feature_selection_curve(
  fit_results,
  curve = "PR",
  nested_cols = "feature_info",
  truth_col = "true_support",
  imp_col = "est_importance"
)

# summarize feature selection ROC/PR curves across replicates
roc_summary <- summarize_feature_selection_curve(
  fit_results,
  curve = "ROC",
  nested_cols = "feature_info",
  truth_col = "true_support",
  imp_col = "est_importance"
)
pr_summary <- summarize_feature_selection_curve(
  fit_results,
  curve = "PR",
  nested_cols = "feature_info",
  truth_col = "true_support",
  imp_col = "est_importance"
)