performance: Assess Performance of a Classifier

performance R Documentation

Assess Performance of a Classifier

Description

Assess the performance, in terms of AUC and Brier score, of one or several binary classifiers. Currently limited to logistic regressions and random forests.
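For readers unfamiliar with the two metrics, here is a minimal base-R sketch of how they are commonly defined. The helpers `brier_score` and `auc_classical` are hypothetical names introduced for illustration; they are not exported by BuyseTest and are not claimed to match the package's internal computation.

```r
## Illustrative helpers (assumed names, not part of BuyseTest)

## Brier score: mean squared difference between the predicted
## probability and the observed 0/1 outcome (lower is better)
brier_score <- function(y, p) {
  mean((p - y)^2)
}

## Usual AUC formula: proportion of (case, control) pairs in which
## the case receives the higher predicted probability; ties count 1/2
auc_classical <- function(y, p) {
  cases <- p[y == 1]
  controls <- p[y == 0]
  delta <- outer(cases, controls, "-")
  mean((delta > 0) + 0.5 * (delta == 0))
}

y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)
brier_score(y, p)   # 0.158125
auc_classical(y, p) # 0.75
```

An AUC of 0.5 corresponds to a classifier that ranks cases and controls at random, which is consistent with the default `null = c(brier = NA, AUC = 0.5)`.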

Usage

performance(
  object,
  data = NULL,
  newdata = NA,
  individual.fit = FALSE,
  impute = "none",
  name.response = NULL,
  fold.size = 1/10,
  fold.repetition = 0,
  fold.balance = FALSE,
  null = c(brier = NA, AUC = 0.5),
  conf.level = 0.95,
  se = TRUE,
  transformation = TRUE,
  auc.type = "classical",
  simplify = TRUE,
  trace = TRUE,
  seed = NULL
)

Arguments

object

a glm or ranger object, or a list of such objects.

data

[data.frame] the training data.

newdata

[data.frame] an external dataset used to assess the performance.

individual.fit

[logical] if TRUE, the predictive model is refit for each individual using only the predictors with non-missing values.

impute

[character] in the presence of missing values in the regressors of the training dataset, should a complete case analysis be performed ("none") or should the median/mean value be imputed ("median"/"mean")? For categorical variables, the most frequent value is imputed.

name.response

[character] the name of the response variable (i.e. the one containing the categories).

fold.size

[double, >0] either the size of the test dataset (when >1) or the fraction of the dataset (when <1) to be used for testing when using cross-validation.

fold.repetition

[integer] when strictly positive, the number of folds used in the cross-validation. If 0, no cross-validation is performed.

fold.balance

[logical] should the outcome distribution in the folds of the cross-validation be similar to the one of the original dataset?

null

[numeric vector of length 2] the right-hand side of the null hypothesis relative to each metric.

conf.level

[numeric] confidence level for the confidence intervals.

se

[logical] should the uncertainty about the AUC/Brier score be computed? When TRUE, the method of LeDell et al. (2015) is adapted to repeated cross-validation for the AUC and the Brier score.

transformation

[logical] should the CI be computed on the logit scale / log scale for the net benefit / win ratio and back-transformed? Otherwise they are computed without any transformation.

auc.type

[character] should the AUC be computed approximating the predicted probability by a Dirac ("classical", usual AUC formula) or approximating the predicted probability by a normal distribution?

simplify

[logical] should the number of folds and the size of the folds used for the cross-validation be removed from the output?

trace

[logical] should the execution of the function be traced?

seed

[integer, >0] seed used to ensure reproducibility.
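The behaviour described for the `impute` argument (median or mean for numeric regressors, most frequent value for categorical ones) can be sketched in base R as follows. `impute_column` is a hypothetical helper written for this illustration, under the assumption that imputation is applied column by column; the package's internal code may differ.

```r
## Hypothetical helper mirroring the description of `impute`;
## not the package's internal implementation
impute_column <- function(x, method = c("median", "mean")) {
  method <- match.arg(method)
  if (!anyNA(x)) {
    return(x) # nothing to impute
  }
  if (is.numeric(x)) {
    ## numeric regressor: median or mean of the observed values
    fill <- if (method == "median") {
      median(x, na.rm = TRUE)
    } else {
      mean(x, na.rm = TRUE)
    }
  } else {
    ## categorical regressor: most frequent observed value
    fill <- names(which.max(table(x)))
  }
  x[is.na(x)] <- fill
  x
}

df <- data.frame(X1 = c(1, 2, NA, 10),
                 G = factor(c("a", "b", "b", NA)))
df.median <- as.data.frame(lapply(df, impute_column, method = "median"))
df.median
```

With `impute = "none"`, rows containing missing values would instead be dropped (complete case analysis), which in base R corresponds to something like `df[complete.cases(df), ]`.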

References

LeDell E, Petersen M, van der Laan M. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron J Stat. 2015;9(1):1583-1607. doi:10.1214/15-EJS1035

Examples

## Simulate data
set.seed(10)
n <- 100
df.train <- data.frame(Y = rbinom(n, prob = 0.5, size = 1), X1 = rnorm(n), X2 = rnorm(n))
df.test <- data.frame(Y = rbinom(n, prob = 0.5, size = 1), X1 = rnorm(n), X2 = rnorm(n))

## fit logistic models
e.null <- glm(Y~1, data = df.train, family = binomial(link="logit"))
e.logit1 <- glm(Y~X1, data = df.train, family = binomial(link="logit"))
e.logit2 <- glm(Y~X1+X2, data = df.train, family = binomial(link="logit"))

## assess performance on the training set (biased)
## and external dataset
performance(e.logit1, newdata = df.test)
e.perf <- performance(list(null = e.null, p1 = e.logit1, p2 = e.logit2),
                      newdata = df.test)
e.perf
summary(e.perf, order.model = c("null","p2","p1"))

## assess performance using cross-validation
## Not run: 
set.seed(10)
performance(e.logit1, fold.repetition = 10, se = FALSE)
set.seed(10)
performance(list(null = e.null, prop = e.logit1), fold.repetition = 10)
performance(e.logit1, fold.repetition = c(50,20,10))

## End(Not run)
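To make the roles of `fold.size` and `fold.balance` in the cross-validation examples more concrete, the sketch below draws the row indices of a single test fold. `draw_test_fold` is a hypothetical helper; the rounding rule and the stratified sampling scheme are assumptions made for illustration and may not match what `performance` does internally.

```r
## Hypothetical helper: row indices of one test fold
draw_test_fold <- function(y, fold.size = 1/10, fold.balance = FALSE) {
  n <- length(y)
  ## a value below 1 is read as a fraction of the dataset,
  ## a larger value as a number of observations
  n.test <- if (fold.size < 1) round(n * fold.size) else fold.size
  if (!fold.balance) {
    return(sample.int(n, n.test))
  }
  ## balanced folds: sample within each outcome category,
  ## proportionally to its frequency in the full dataset
  index <- split(seq_len(n), y)
  unlist(lapply(index, function(i) {
    i[sample.int(length(i), round(n.test * length(i) / n))]
  }), use.names = FALSE)
}

set.seed(1)
y <- rep(c(0, 1), times = c(70, 30))
test.index <- draw_test_fold(y, fold.size = 0.1, fold.balance = TRUE)
table(y[test.index]) # 7 observations with Y=0, 3 with Y=1
```

Because each stratum size is rounded separately, the total number of sampled rows can differ slightly from the requested size when the proportions do not divide evenly.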

BuyseTest documentation built on March 31, 2023, 6:55 p.m.