mlboot: Calculate bootstrap confidence intervals for a performance...

View source: R/bootstrap.R

Description

Calculate performance scores for one or two predictive models (and their difference) on a given test set using an arbitrary performance metric, and estimate bootstrap confidence intervals around these scores. The bootstrap can be customized, for example as a basic nonparametric bootstrap or a cluster nonparametric bootstrap.

Usage

mlboot(
  .data,
  trusted,
  predicted,
  metric,
  cluster,
  pairwise = TRUE,
  n_boot = 2000,
  interval = 0.95,
  null = 0,
  ...
)
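
The following is a minimal sketch of a call, not an example taken from the package. The toy data, the column names, and the `mae` function are hypothetical, and passing variable names as character strings is an assumption based on the argument descriptions. The call only runs if the mlboot package is installed.

```r
# Toy test set: one row per observation (all values are made up for illustration).
toy <- data.frame(
  truth  = c(3.0, 2.5, 4.0, 5.0, 1.5, 3.5),
  model1 = c(2.8, 2.9, 3.6, 4.7, 1.9, 3.4),
  model2 = c(3.5, 1.5, 4.9, 4.0, 2.5, 2.6)
)

# Hypothetical custom metric (not part of mlboot): mean absolute error.
mae <- function(trusted, predicted) mean(abs(trusted - predicted))

# Assumption: variable names are passed as character strings.
if (requireNamespace("mlboot", quietly = TRUE)) {
  results <- mlboot::mlboot(
    .data = toy,
    trusted = "truth",
    predicted = c("model1", "model2"),
    metric = mae,
    n_boot = 2000,
    interval = 0.95
  )
  print(results$score_obs)  # observed scores and, if applicable, their difference
  print(results$score_cil)  # lower confidence bounds
  print(results$score_ciu)  # upper confidence bounds
}
```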

Arguments

.data

Required. A data frame containing trusted labels and predicted labels, where each row is a single object/observation and each column is a variable describing that object/observation.

trusted

Required. The name of a single variable in .data that contains trusted labels.

predicted

Required. A vector of names of one or more variables in .data that contain predicted labels.

metric

Required. A function that takes in at least two arguments (for trusted labels and predicted labels, plus any additional customization arguments) and returns a single number indicating performance. A number of scoring/metric functions are built into the package and custom functions can be developed as well.
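
As an illustration of the contract described above, a custom metric could look like the sketch below. The name `accuracy_score` and its `na.rm` argument are hypothetical and not part of the package; the only requirements taken from the description are that the first two arguments receive the trusted and predicted labels and that a single number is returned.

```r
# Hypothetical custom metric: proportion of predictions matching the trusted labels.
# `na.rm` stands in for an extra customization argument of the kind that could
# be forwarded through `...`.
accuracy_score <- function(trusted, predicted, na.rm = TRUE) {
  stopifnot(length(trusted) == length(predicted))
  mean(trusted == predicted, na.rm = na.rm)
}

trusted   <- c(1, 0, 1, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 0, 1)
score <- accuracy_score(trusted, predicted)
print(score)
```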

cluster

Optional. The name of a single variable in .data that contains the cluster membership of each object/observation.
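
To show what cluster membership changes, here is a base R sketch of a single cluster bootstrap resample. It is not the package's internal code; `cluster_resample` and the toy data are hypothetical. The idea is that whole clusters are drawn with replacement and all rows of each drawn cluster are kept together.

```r
# One cluster bootstrap resample: draw cluster ids with replacement, then keep
# every row belonging to each drawn cluster (a cluster drawn twice appears twice).
cluster_resample <- function(data, cluster) {
  ids <- unique(data[[cluster]])
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  rows <- unlist(lapply(drawn, function(id) which(data[[cluster]] == id)))
  data[rows, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(
  subject = rep(c("a", "b", "c"), times = c(2, 3, 1)),
  truth   = c(1, 0, 1, 1, 0, 1),
  model1  = c(1, 0, 0, 1, 0, 1)
)
resampled <- cluster_resample(toy, "subject")
print(table(resampled$subject))
```

Because clusters differ in size, the number of rows in a resample can differ from the number of rows in the original data.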

pairwise

Optional. A logical indicating whether to estimate the difference between all pairs of predicted labels (default = TRUE).
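
A pairwise difference can be bootstrapped by scoring both sets of predictions on the same resampled rows and subtracting. The base R sketch below uses simulated data and a hypothetical `accuracy` function; the sign convention (first model minus second) is an assumption, not something stated in this documentation.

```r
set.seed(2)
n <- 200
truth  <- rbinom(n, 1, 0.5)
# Two simulated models that agree with the truth about 85% and 70% of the time.
model1 <- ifelse(runif(n) < 0.85, truth, 1 - truth)
model2 <- ifelse(runif(n) < 0.70, truth, 1 - truth)

accuracy <- function(trusted, predicted) mean(trusted == predicted)

n_boot <- 500
resamples <- t(replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)  # the same rows are used for both models
  s1 <- accuracy(truth[idx], model1[idx])
  s2 <- accuracy(truth[idx], model2[idx])
  c(model1 = s1, model2 = s2, difference = s1 - s2)
}))
print(head(resamples, 3))
```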

n_boot

Optional. A positive integer indicating how many bootstrap resamples the confidence intervals should be estimated from (default = 2000).

interval

Optional. A number between 0 and 1 indicating the confidence level of the confidence intervals to be estimated, such that 0.95 yields 95% confidence intervals (default = 0.95).
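
The relationship between the confidence level and the interval bounds can be sketched with a percentile interval in base R. This documentation does not say which interval method the package uses, so the percentile method here is an assumption chosen for simplicity; `percentile_ci` and the simulated data are hypothetical.

```r
set.seed(3)
truth <- rbinom(150, 1, 0.5)
pred  <- ifelse(runif(150) < 0.8, truth, 1 - truth)

# Basic nonparametric bootstrap of accuracy: resample rows with replacement.
boot_scores <- replicate(2000, {
  idx <- sample.int(length(truth), replace = TRUE)
  mean(truth[idx] == pred[idx])
})

# Percentile interval: cut (1 - interval) / 2 from each tail of the resamples.
percentile_ci <- function(scores, interval = 0.95) {
  alpha <- 1 - interval
  quantile(scores, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

ci95 <- percentile_ci(boot_scores, 0.95)
ci80 <- percentile_ci(boot_scores, 0.80)
print(rbind(ci95, ci80))
```

A lower confidence level gives a narrower interval nested inside the wider one.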

null

Optional. A single number to compare the bootstrap estimate to when calculating p-values (default = 0).
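
One common convention for a two-sided bootstrap p-value is to double the smaller of the two tail proportions of resampled scores on either side of the null value. This documentation does not state how the package computes its p-values, so the sketch below is an assumption; `boot_p_value` and the stand-in scores are hypothetical.

```r
# Two-sided p-value from the share of resampled scores at or beyond the null
# value on each side, doubled and capped at 1.
boot_p_value <- function(scores, null = 0) {
  p_low  <- mean(scores <= null)
  p_high <- mean(scores >= null)
  min(1, 2 * min(p_low, p_high))
}

# Deterministic stand-in for resampled differences (not real bootstrap output).
diffs <- (-10:90) / 1000
p_zero <- boot_p_value(diffs, null = 0)
print(p_zero)
```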

...

Optional. Additional arguments to pass along to the metric function.

Value

A list containing the results and a description of the analysis.

type

A string indicating whether a single predictive model was examined or two models were compared

metric

A string indicating the name of the performance metric function used

ntotal

An integer indicating the total number of examples in the test set

ncluster

An integer indicating the number of clusters present in the test set

nboot

An integer indicating the number of bootstrap resamples used to estimate confidence intervals

interval

The confidence level of the confidence intervals

score_obs

A vector containing the observed performance score for the first model and, if applicable, the second model and their difference

score_cil

A vector containing the lower bounds of the confidence intervals corresponding to the observed performance scores

score_ciu

A vector containing the upper bounds of the confidence intervals corresponding to the observed performance scores

score_pval

A vector containing p-values for the performance scores, calculated relative to the value given in null

resamples

A matrix containing the performance scores and, if applicable, their difference in each bootstrap resample


jmgirard/mlboot documentation built on Sept. 12, 2021, 12:59 p.m.