rocs: Receiver operating characteristics surface for a continuous...


ROCsurface R Documentation

Receiver operating characteristics surface for a continuous diagnostic test

Description

rocs.tcf is used to obtain bias-corrected estimates of the true class fractions (TCFs) for evaluating the accuracy of a continuous diagnostic test for a given cut point (c_1, c_2), with c_1 < c_2.

rocs provides bias-corrected estimates of the ROC surface of the continuous diagnostic test, obtained from the estimated TCFs.

Usage

rocs.tcf(
  method = "full",
  diag_test,
  dise_vec,
  veri_stat = NULL,
  rho_est = NULL,
  pi_est = NULL,
  cps
)

rocs(
  method = "full",
  diag_test,
  dise_vec,
  veri_stat,
  rho_est = NULL,
  pi_est = NULL,
  ncp = 100,
  plot = TRUE,
  ellipsoid = FALSE,
  cpst = NULL,
  ci_level = 0.95,
  surf_col = c("gray40", "green"),
  boot = FALSE,
  n_boot = 250,
  parallel = FALSE,
  ncpus = ifelse(parallel, detectCores()/2, NULL),
  ...
)

Arguments

method

an estimation method to be used for estimating the true class fractions in the presence of verification bias. See 'Details'.

diag_test

a numeric vector containing the diagnostic test values. NA values are not allowed.

dise_vec

an n * 3 binary matrix with three columns, corresponding to the three classes of the disease status. In row i, a 1 in column j indicates that the i-th subject belongs to class j, with j = 1, 2, 3. A row of NA values indicates a non-verified subject.

veri_stat

a binary vector containing the verification status (1 verified, 0 not verified).

rho_est

a result of a call to rho_mlogit or rho_knn to fit the disease model.

pi_est

a result of a call to psglm to fit the verification model.

cps

a cut point (c_1, c_2), with c_1 < c_2, which is used to estimate TCFs. If m estimates of TCFs are required, cps must be a matrix with m rows and 2 columns.

ncp

the dimension of cut point grid. It is used to determine the cut points (see 'Details'). Default 100.

plot

if TRUE (the default), a 3D plot of the ROC surface is produced.

ellipsoid

a logical value. If TRUE, adds an ellipsoidal confidence region for TCFs at a specified cut point to the current plot of the ROC surface.

cpst

a specified cut point, which is used to construct the ellipsoidal confidence region. If m ellipsoidal confidence regions are required, cpst must be a matrix with m rows and 2 columns. Default NULL.

ci_level

a confidence level to be used for constructing the ellipsoidal confidence region; default 0.95.

surf_col

colors to be used for plotting the ROC surface and the ellipsoid.

boot

a logical value. Default FALSE. If set to TRUE, bootstrap resampling is employed to estimate the asymptotic variance-covariance matrix of TCFs at the cut point cpst. See asy_cov_tcf for more details.

n_boot

the number of bootstrap replicates, which is used for the FULL estimator or when boot = TRUE. Usually this will be a single positive integer. Default 250.

parallel

a logical value. If TRUE, parallel computing is employed for the bootstrap resampling process.

ncpus

number of processes to be used in parallel computing. Default is half of the available cores.

...

optional arguments to be passed to plot3d, surface3d.

Details

In a three-class diagnostic problem, quantities used to evaluate the accuracy of a diagnostic test are the true class fractions (TCFs). For a given pair of cut points (c_1, c_2) such that c_1 < c_2, subjects are classified into class 1 (D_1) if T < c_1; class 2 (D_2) if c_1 \le T < c_2; class 3 (D_3) otherwise. The true class fractions of the test T at (c_1, c_2) are defined as

TCF_1(c_1) = P(T < c_1| D_1 = 1) = 1 - P(T \ge c_1| D_1 = 1),

TCF_2(c_1, c_2) = P(c_1 \le T < c_2| D_2 = 1) = P(T \ge c_1| D_2 = 1) - P(T \ge c_2| D_2 = 1),

TCF_3(c_2) = P(T \ge c_2| D_3 = 1).
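As an illustration of these definitions only, the following sketch computes empirical TCFs from fully verified data in base R. The simulated data and the helper tcf_empirical are hypothetical and are not part of the package; the classification rule is the one stated above (class 1 if T < c_1, class 2 if c_1 \le T < c_2, class 3 otherwise).

```r
# Simulated, fully verified data (hypothetical; not the EOC data set)
set.seed(1)
n <- 300
dise <- sample(1:3, n, replace = TRUE)   # true class of each subject
test <- rnorm(n, mean = dise)            # test values tend to increase with class

# Empirical true class fractions at a cut point (c1, c2), with c1 < c2
# (hypothetical helper, written only to mirror the definitions)
tcf_empirical <- function(test, dise, c1, c2) {
  stopifnot(c1 < c2)
  c(
    tcf1 = mean(test[dise == 1] < c1),
    tcf2 = mean(test[dise == 2] >= c1 & test[dise == 2] < c2),
    tcf3 = mean(test[dise == 3] >= c2)
  )
}

tcf_hat <- tcf_empirical(test, dise, c1 = 1.5, c2 = 2.5)
print(tcf_hat)
```

When some subjects are not verified, these simple proportions are no longer appropriate, which is the situation the bias-corrected estimators are designed for.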

The receiver operating characteristic (ROC) surface is the plot of TCF_1, TCF_2 and TCF_3 obtained by varying the cut point (c_1, c_2) in the domain of the diagnostic test. The cut points (c_1, c_2) are produced by designing a cut point grid of dimension ncp. In this grid, the points satisfying c_1 < c_2 are selected as the cut points. The number of cut points is ncp(ncp - 1)/2; for example, the default ncp = 100 gives 4950 cut points.
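The count ncp(ncp - 1)/2 can be checked with a small base R sketch. The equally spaced grid between the minimum and maximum test value is an assumption made for illustration; the grid actually built inside rocs may be constructed differently.

```r
# Hypothetical test values; the equally spaced grid is an assumption
set.seed(2)
test <- rnorm(50)
ncp <- 100

grid <- seq(min(test), max(test), length.out = ncp)
pairs <- expand.grid(c1 = grid, c2 = grid)   # all ncp^2 candidate pairs
cpoint <- pairs[pairs$c1 < pairs$c2, ]       # keep only pairs with c1 < c2

nrow(cpoint)           # number of cut points kept
ncp * (ncp - 1) / 2    # closed-form count
```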

These functions implement the bias-corrected estimators in To Duc et al. (2016, 2020) for estimating the TCFs of a three-class continuous diagnostic test in the presence of verification bias. The estimators work under the missing at random (MAR) assumption. Five methods are provided, namely:

  • Full imputation (FI): uses the fitted values of the disease model to replace the true disease status (both missing and non-missing values).

  • Mean score imputation (MSI): replaces only the missing values by the fitted values of the disease model.

  • Inverse probability weighted (IPW): weights each observation in the verification sample by the inverse of the sampling fraction (i.e. the probability that the subject was selected for verification).

  • Semiparametric efficient (SPE): replaces the true disease status by the doubly robust estimates.

  • K nearest-neighbor (KNN): uses K nearest-neighbor imputation to obtain the missing values of the true disease status.

The argument method must be one of "full", "fi", "msi", "ipw", "spe" and "knn". Here "full" selects the full data estimator (all subjects verified), and the remaining values select the bias-corrected methods listed above.
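To make the verbal descriptions of FI, MSI and IPW concrete, the sketch below computes the three corresponding estimates of TCF_1 at a single threshold on simulated data. It is an illustration under loud assumptions, not the package's code: the data are simulated, the true disease probabilities rho and verification probabilities pi_ver stand in for the fitted values that rho_mlogit and psglm would supply, and all object names are hypothetical. SPE and KNN are omitted.

```r
set.seed(3)
n <- 500
test <- rnorm(n)

# Hypothetical disease model: class probabilities depend on the test value
eta <- cbind(0, 1 + test, 2 * test)
rho <- exp(eta) / rowSums(exp(eta))                        # n x 3, P(D_k = 1 | T)
dise <- t(apply(rho, 1, function(p) rmultinom(1, 1, p)))   # n x 3 binary matrix

# Hypothetical verification model; verification depends on T only (MAR)
pi_ver <- plogis(0.5 + test)
veri <- rbinom(n, 1, pi_ver)
dise_obs <- dise
dise_obs[veri == 0, ] <- NA                                # unverified rows are NA

c1 <- -0.5
below <- as.numeric(test < c1)
d1 <- ifelse(veri == 1, dise_obs[, 1], 0)                  # D_1, set to 0 if unverified

# FI: disease status replaced by the model probability for every subject
tcf1_fi <- sum(below * rho[, 1]) / sum(rho[, 1])

# MSI: only the missing disease status is replaced
d1_msi <- veri * d1 + (1 - veri) * rho[, 1]
tcf1_msi <- sum(below * d1_msi) / sum(d1_msi)

# IPW: verified subjects weighted by the inverse verification probability
w <- veri * d1 / pi_ver
tcf1_ipw <- sum(below * w) / sum(w)

round(c(fi = tcf1_fi, msi = tcf1_msi, ipw = tcf1_ipw), 3)
```

In practice the probabilities would be estimated from the data, and the estimates would be obtained with rocs.tcf rather than by hand.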

The ellipsoidal confidence region of TCFs at a given cut point can be constructed by using a normal approximation and plotted in the ROC surface space. The default confidence level is 0.95.
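A region based on a normal approximation is usually the set of points whose squared Mahalanobis distance from the estimate does not exceed a chi-square quantile with 3 degrees of freedom. The sketch below checks membership in such a region. The estimate tcf_hat, the covariance matrix sigma and the helper in_region are hypothetical, and it is an assumption that the region drawn by rocs follows exactly this form.

```r
# Hypothetical estimate and covariance matrix (not output of the package)
tcf_hat <- c(0.70, 0.45, 0.80)
sigma <- matrix(c(4, 1, 0,
                  1, 5, 1,
                  0, 1, 3), nrow = 3) * 1e-4
ci_level <- 0.95

# TRUE if x lies inside the ellipsoid centred at `center`
in_region <- function(x, center, cov_mat, level) {
  d2 <- mahalanobis(x, center, cov_mat)       # squared Mahalanobis distance
  d2 <= qchisq(level, df = length(center))
}

in_region(tcf_hat, tcf_hat, sigma, ci_level)               # the centre itself
in_region(c(0.90, 0.45, 0.80), tcf_hat, sigma, ci_level)   # a point far from the centre
```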

Note that, before using the functions rocs and rocs.tcf, the use of pre_data might be needed to check the monotone ordering of the disease classes and to create the matrix format for the disease status.
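For orientation, the matrix format expected by dise_vec can be built by hand as in the sketch below. The class labels and object names are hypothetical, and the sketch does not perform the monotone ordering check, so it is not a substitute for pre_data.

```r
# Hypothetical disease status with one unverified subject (NA)
dise <- factor(c("benign", "early", "late", NA, "early"),
               levels = c("benign", "early", "late"))

# One indicator column per class; an NA status gives a row of NA values
dise_vec <- sapply(levels(dise), function(l) as.integer(dise == l))

# Verification status: 1 verified, 0 not verified
veri_stat <- as.integer(!is.na(dise))

dise_vec
veri_stat
```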

Value

rocs returns a list, with the following components:

vals

the estimates of TCFs at all cut points.

cpoint

the cut points used to construct the ROC surface.

ncp

dimension of the cut point grid.

cpst

the cut points used to construct the ellipsoidal confidence regions.

tcf

the estimates of TCFs at the cut points cpst.

message

an integer code or vector. 1 indicates the ellipsoidal confidence region is available.

rocs.tcf returns a vector of estimates of TCFs at a cut point when cps is a vector with two elements, or a list of estimates of TCFs at m cut points when cps is an m * 2 matrix. In addition, some attributes called theta, beta, cp and name are given. Here, theta is a probability vector with 3 elements, corresponding to the disease prevalence rates of the three classes. beta is also a probability vector, with 4 components, which are used to compute TCFs; see To Duc et al. (2016, 2020) for more details. cp is the specified cut point that is used to estimate TCFs. name indicates the method used to estimate TCFs. These attributes are required to compute the asymptotic variance-covariance matrix of TCFs at the given cut point.

References

To Duc, K., Chiogna, M. and Adimari, G. (2016) Bias-corrected methods for estimating the receiver operating characteristic surface of continuous diagnostic tests. Electronic Journal of Statistics, 10, 3063-3113.

To Duc, K., Chiogna, M. and Adimari, G. (2020) Nonparametric estimation of ROC surfaces in presence of verification bias. REVSTAT-Statistical Journal, 18(5), 697-720.

See Also

psglm, rho_mlogit, plot3d.

Examples

data(EOC)
head(EOC)

## Not run: 
# FULL data estimator
dise_full <- pre_data(EOC$D.full, EOC$CA125)
dise_vec_full <- dise_full$dise_vec
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("full", diag_test = EOC$CA125, dise_vec = dise_vec_full, ncp = 30,
        ellipsoid = TRUE, cpst = c(-0.56, 2.31))
}

## End(Not run)

## Not run: 
# Preparing the missing disease status
dise_na <- pre_data(EOC$D, EOC$CA125)
dise_vec_na <- dise_na$dise_vec
dise_fact_na <- dise_na$dise

# FI estimator
rho_out <- rho_mlogit(dise_fact_na ~ CA125 + CA153 + Age, data = EOC,
                      test = TRUE)
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("fi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
        veri_stat = EOC$V, rho_est = rho_out, ncp = 30)
}

# Plot ROC surface and add ellipsoid confidence region
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("fi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
        veri_stat = EOC$V, rho_est = rho_out, ncp = 30,
        ellipsoid = TRUE, cpst = c(-0.56, 2.31))
}

# MSI estimator
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("msi", diag_test = EOC$CA125, dise_vec = dise_vec_na,
        veri_stat = EOC$V, rho_est = rho_out, ncp = 30,
        ellipsoid = TRUE, cpst = c(-0.56, 2.31))
}

# IPW estimator
pi_out <- psglm(V ~ CA125 + CA153 + Age, data = EOC, test = TRUE)
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("ipw", diag_test = EOC$CA125, dise_vec = dise_vec_na,
        veri_stat = EOC$V, pi_est = pi_out, ncp = 30,
        ellipsoid = TRUE, cpst = c(-0.56, 2.31))
}

# SPE estimator
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("spe", diag_test = EOC$CA125, dise_vec = dise_vec_na,
        veri_stat = EOC$V, rho_est = rho_out, ncp = 30,
        pi_est = pi_out, ellipsoid = TRUE, cpst = c(-0.56, 2.31))
}

# KNN estimator
x_mat <- cbind(EOC$CA125, EOC$CA153, EOC$Age)
k_opt <- cv_knn(x_mat = x_mat, dise_vec = dise_vec_na, veri_stat = EOC$V,
                type = "mahala", plot = TRUE)
rho_k_opt <- rho_knn(x_mat = x_mat, dise_vec = dise_vec_na,
                     veri_stat = EOC$V, k = k_opt, type = "mahala")
if (requireNamespace("webshot2", quietly = TRUE)) {
   rocs("knn", diag_test = EOC$CA125, dise_vec = dise_vec_na,
        veri_stat = EOC$V, rho_est = rho_k_opt, ncp = 30,
        ellipsoid = TRUE, cpst = c(-0.56, 2.31))
}

## Compute TCFs at three cut points
cutps <- rbind(c(0, 0.5), c(0, 1), c(0.5, 1))
# rocs.tcf has no ncp argument (see 'Usage'), so only cps is supplied
rocs.tcf("spe", diag_test = EOC$CA125, dise_vec = dise_vec_na,
         veri_stat = EOC$V, rho_est = rho_out,
         pi_est = pi_out, cps = cutps)

## End(Not run)


bcROCsurface documentation built on Sept. 9, 2023, 9:07 a.m.