The goal of this vignette is to illustrate the R package cases with some elementary code examples.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>", out.width = "100%" )
Load the package:
library(cases)
Often, binary predictions are not readily available but rather need to be derived from continuous (risk) scores. This can be done via the categorize function.
# real data example from publication here
set.seed(123)
M <- as.data.frame(mvtnorm::rmvnorm(10, mean = rep(0, 3), sigma = 2 * diag(3)))
M

## categorize at 0 by default
yhat <- categorize(M)
yhat

## define multiple cutpoints to define multiple decision rules per marker
C <- c(0, 1, 0, 1, 0, 1)
a <- c(1, 1, 2, 2, 3, 3)
categorize(M, C, a)

## this can even be used to do multi-class classification, like this:
C <- matrix(rep(c(-1, 0, 1, -2, 0, 2), 3), ncol = 3, byrow = TRUE)
C
categorize(M, C, a)
In supervised classification, it is assumed that a set of true labels is available. In medical testing, this is usually called the reference standard, provided by an established diagnostic or prognostic tool. Model predictions need to be compared against these labels in order to compute model accuracy.
## consider binary predictions from the 3 models of the previous chunk
names(yhat) <- paste0("rule", 1:ncol(yhat))
yhat

## assume true labels
y <- c(rep(1, 5), rep(0, 5))

## comparing predictions and labels then results in
compare(yhat, y)
The main function of the package is evaluate():
evaluate(compare(yhat, y))
More details on the evaluate function are provided in the last section.
cases includes a few functions for synthetic data generation
draw_data_lfc(n = 20)
draw_data_roc(n = 20)
Remark: Synthetic data comes at the 'compared' level, meaning the labels 1 and 0 indicate correct and false predictions, respectively. There is thus no need to apply compare() in addition.
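For example (a minimal sketch, assuming the default arguments of evaluate are appropriate here), simulated data can therefore be passed to evaluate directly:

## synthetic data is already at the 'compared' level, so no compare() step is needed
set.seed(42)
evaluate(draw_data_lfc(n = 20))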
The pipe operator '%>%' allows us to chain together subsequent operations in R. This is useful, as the evaluate function expects preprocessed data indicating correct (1) and false (0) predictions.
M %>% categorize() %>% compare(y) %>% evaluate()
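The same pipeline can equivalently be written with nested function calls (just a rephrasing of the chunk above, no new functionality):

## identical to the piped version above, read inside-out
evaluate(compare(categorize(M), y))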
The R command
?evaluate
gives an overview of the arguments of the evaluate function.
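Alternatively (a small base R illustration, not a feature of cases itself), the formal arguments and their default values can be printed in the console:

## print the argument names and defaults of evaluate()
args(evaluate)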
Together, arguments such as benchmark and alternative determine the hypothesis system under consideration, namely
$H_0: \forall j \, \exists g: \theta_j^g \leq \theta_0^g$
In the application of primary interest, diagnostic accuracy studies, this simplifies to $G=2$ with $\theta^1 = Se$ and $\theta^2 = Sp$ denoting the sensitivity and specificity of a medical test or classification rule. In this case we aim to reject the global null hypothesis
$H_0: \forall j: Se_j \leq Se_0 \,\vee\, Sp_j \leq Sp_0$
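Equivalently (our paraphrase of the decision logic, not a quotation from the package documentation), this global null is the intersection of elementary hypotheses, one per candidate rule $j$:

$$H_0^j: Se_j \leq Se_0 \,\vee\, Sp_j \leq Sp_0 \quad \text{vs.} \quad H_1^j: Se_j > Se_0 \,\wedge\, Sp_j > Sp_0.$$

A candidate is thus only declared successful if both sensitivity and specificity exceed their benchmarks, which is the co-primary structure illustrated below.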
In the following, we highlight the difference between the "co-primary" analysis (comparison regions) and a "full" analysis (confidence regions).
set.seed(1337)
data <- draw_data_roc(
  n = 120,
  prev = c(0.25, 0.75),
  m = 4,
  delta = 0.05,
  e = 10,
  auc = seq(0.90, 0.95, 0.025),
  rho = c(0.25, 0.25)
)
lapply(data, head)
## comparison regions
results_comp <- data %>%
  evaluate(
    alternative = "greater",
    alpha = 0.025,
    benchmark = c(0.7, 0.8),
    analysis = "co-primary",
    regu = TRUE,
    adj = "maxt"
  )
visualize(results_comp)
## confidence regions
results_conf <- data %>%
  evaluate(
    alternative = "greater",
    alpha = 0.025,
    benchmark = c(0.7, 0.8),
    analysis = "full",
    regu = TRUE,
    adj = "maxt"
  )
visualize(results_conf)
As we can see, the comparison regions are more liberal than the confidence regions.
A second vignette shows an application of the cases package to the Breast Cancer Wisconsin Diagnostic (wdbc) data set.
vignette("example_wdbc", "cases")