AROC.sp: Semiparametric frequentist inference for the...

View source: R/AROC.sp.R

AROC.sp R Documentation

Semiparametric frequentist inference for the covariate-adjusted ROC curve (AROC).

Description

This function estimates the covariate-adjusted ROC curve (AROC) using the semiparametric approach proposed by Janes and Pepe (2009).

Usage

AROC.sp(formula.h, group, tag.h, data, 
    est.cdf.h = c("normal", "empirical"), pauc = pauccontrol(),
    p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, 
    parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)

Arguments

formula.h

A formula object specifying the location regression model to be fitted in the healthy population (see Details).

group

A character string with the name of the variable that distinguishes healthy from diseased individuals.

tag.h

The value codifying healthy individuals in the variable group.

data

A data frame representing the data and containing all needed variables.

est.cdf.h

A character string. It indicates how the conditional distribution function of the diagnostic test in the healthy population is estimated. Options are "normal" and "empirical" (see Details). The default is "normal".

pauc

A list of control values to replace the default values returned by the function pauccontrol. It indicates whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and, if so, whether the focus is on restricted false positive fractions (FPFs) or on restricted true positive fractions (TPFs), together with the upper bound for the FPF (if focus is FPF) or the lower bound for the TPF (if focus is TPF).

p

Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. This set is also used to compute the area under the covariate-adjusted ROC curve (AAUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation.

B

An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000.

ci.level

A numeric value (between 0 and 1) specifying the confidence level. The default is 0.95.

parallel

A character string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow".

ncpus

An integer with the number of processes to be used in parallel operation. Defaults to 1.

cl

An object inheriting from class cluster (from the parallel package), specifying an optional parallel or snow cluster if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the call.
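The requirement that p have an odd number of points follows from composite Simpson's rule, which works on pairs of adjacent subintervals. The following is a minimal base R sketch of that rule on an equally spaced grid. It is not the package's internal code: the helper name simpson and the example curves are illustrative assumptions.

```r
# Composite Simpson's rule on an equally spaced grid.
# 'simpson' is a hypothetical helper, not a function exported by ROCnReg.
simpson <- function(x, y) {
  n <- length(x)
  # An even number of subintervals is needed, i.e. an odd number of points
  stopifnot(length(y) == n, n >= 3, n %% 2 == 1)
  h <- (x[n] - x[1]) / (n - 1)
  w <- ifelse(seq_len(n) %% 2 == 0, 4, 2)  # interior weights 4, 2, 4, ...
  w[c(1, n)] <- 1                          # end points get weight 1
  sum(w * y) * h / 3
}

p <- seq(0, 1, length.out = 101)           # same grid as the default for p
auc_useless <- simpson(p, p)               # chance line, ROC(p) = p
auc_example <- simpson(p, 1 - (1 - p)^2)   # an illustrative concave curve
```

Simpson's rule is exact for polynomials up to degree three, so these two areas equal 1/2 and 2/3 up to floating-point error. A grid of even length (for instance 100 points) is rejected by the stopifnot check.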

Details

Estimates the covariate-adjusted ROC curve (AROC) defined as

AROC\left(p\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq p\},

where

F_{\bar{D}}(y|\mathbf{x}) = Pr\{Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}\}.

The method implemented in this function estimates the outer probability empirically (see Janes and Pepe, 2009) and F_{\bar{D}}(\cdot|\mathbf{x}) is estimated assuming a semiparametric location regression model for Y_{\bar{D}}, i.e.,

Y_{\bar{D}} = \mathbf{X}_{\bar{D}}^{T}\mathbf{\beta}_{\bar{D}} + \sigma_{\bar{D}}\varepsilon_{\bar{D}},

where \varepsilon_{\bar{D}} has zero mean, variance one, and distribution function G_{\bar{D}}. As a consequence, we have

F_{\bar{D}}(y | \mathbf{x}) = G_{\bar{D}}\left(\frac{y-\mathbf{x}^{T}\mathbf{\beta}_{\bar{D}}}{\sigma_{\bar{D}}}\right).

In line with the assumptions made about the distribution of \varepsilon_{\bar{D}}, estimators will be referred to as: (a) "normal", where a standard Gaussian error is assumed, i.e., G_{\bar{D}}(y) = \Phi(y); and, (b) "empirical", where no assumption is made about the distribution (in this case, G_{\bar{D}} is empirically estimated on the basis of standardised residuals).
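The estimation steps described above can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the package's implementation: the simulated variables, the use of the residual standard error from lm as the estimate of \sigma_{\bar{D}}, and all object names are assumptions made for the example.

```r
set.seed(1)
# Simulated data (hypothetical): one covariate x, test outcome y
n_h <- 200; n_d <- 150
x_h <- runif(n_h, 40, 80); x_d <- runif(n_d, 40, 80)
y_h <- 0.5 + 0.02 * x_h + rnorm(n_h, sd = 0.5)   # healthy population
y_d <- 1.5 + 0.02 * x_d + rnorm(n_d, sd = 0.5)   # diseased population

# Location regression model fitted in the healthy population only
fit_h   <- lm(y ~ x, data = data.frame(y = y_h, x = x_h))
sigma_h <- summary(fit_h)$sigma                  # assumed estimate of sigma
res_h   <- residuals(fit_h) / sigma_h            # standardised residuals

# Standardise diseased outcomes with the healthy-population fit
z_d <- (y_d - predict(fit_h, newdata = data.frame(x = x_d))) / sigma_h

# Placement values 1 - F(y_D | x_D) under the two choices of G
pv_normal <- 1 - pnorm(z_d)          # "normal": G is the standard Gaussian cdf
pv_emp    <- 1 - ecdf(res_h)(z_d)    # "empirical": G is the ecdf of residuals

# Outer probability estimated empirically: AROC(p) = mean(pv <= p)
p <- seq(0, 1, length.out = 101)
aroc_normal <- ecdf(pv_normal)(p)
aroc_emp    <- ecdf(pv_emp)(p)
```

Both estimated curves are nondecreasing step functions that reach 1 at p = 1; they differ only in how the standardised diseased outcomes are mapped to placement values.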

The area under the AROC curve is

AAUC=\int_0^1 AROC(p)dp,

and there exists a closed-form estimator. With regard to the partial area under the AROC curve, when focus = "FPF" and assuming an upper bound u_1 for the FPF, what is computed is

pAAUC_{FPF}(u_1)=\int_0^{u_1} AROC(p)dp,

where again there exists a closed-form estimator. The returned value is the normalised pAAUC, pAAUC_{FPF}(u_1)/u_1, so that it ranges from u_1/2 (useless test) to 1 (perfect test). Conversely, when focus = "TPF", and assuming a lower bound u_2 for the TPF, the partial area corresponding to TPFs lying in the interval (u_2,1) is computed as

pAAUC_{TPF}(u_2)=\int_{AROC^{-1}(u_2)}^{1}AROC(p)dp-\{1-AROC^{-1}(u_2)\}\times u_2.

Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, pAAUC_{TPF}(u_2)/(1-u_2), so that it ranges from (1-u_2)/2 (useless test) to 1 (perfect test).
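To make the area formulas above concrete, the sketch below evaluates them in base R. When the AROC is estimated as the empirical distribution function of the placement values, integrating that step function gives 1 - mean(pv) for the AAUC and mean(max(u_1 - pv, 0)) for pAAUC_{FPF}(u_1). Whether the package uses exactly these expressions is an assumption, and the placement values, the smooth curve aroc_smooth, and all object names are hypothetical.

```r
set.seed(2)
# Hypothetical placement values 1 - F(y_D | x_D) for 500 diseased subjects
pv <- rbeta(500, 1, 4)
aroc_hat <- ecdf(pv)                     # empirical AROC(p) = mean(pv <= p)

# AAUC: integral of the step function over (0, 1)
aauc <- 1 - mean(pv)

# focus = "FPF": area over (0, u1), then normalised by u1
u1 <- 0.5
paauc_fpf      <- mean(pmax(u1 - pv, 0))
paauc_fpf_norm <- paauc_fpf / u1

# Numerical cross-check of the closed form with a midpoint rule
m   <- 1e5
mid <- (seq_len(m) - 0.5) * u1 / m
paauc_fpf_num <- mean(aroc_hat(mid)) * u1

# focus = "TPF": illustrated on a smooth curve so that AROC^{-1} is well defined
aroc_smooth <- function(p) 1 - (1 - p)^2
u2 <- 0.8
p_low <- uniroot(function(p) aroc_smooth(p) - u2, c(0, 1), tol = 1e-10)$root
paauc_tpf      <- integrate(aroc_smooth, p_low, 1)$value - (1 - p_low) * u2
paauc_tpf_norm <- paauc_tpf / (1 - u2)
```

For the chance line AROC(p) = p the same formulas give u_1/2 and (1-u_2)/2 after normalisation, which are the lower ends of the ranges stated above.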

Value

The function returns a list with the following components:

call

The matched call.

data

The original supplied data argument.

missing.ind

Logical values indicating, for each pair of observations (test outcome and covariates), whether missing values occur.

marker

The name of the diagnostic test variable in the data frame.

group

The value of the argument group used in the call.

tag.h

The value of the argument tag.h used in the call.

formula

The value of the argument formula.h used in the call.

est.cdf.h

The value of the argument est.cdf.h used in the call.

p

Set of false positive fractions (FPF) at which the covariate-adjusted ROC (AROC) curve has been estimated.

ci.level

The value of the argument ci.level used in the call.

ROC

Estimated covariate-adjusted ROC curve (AROC), and ci.level*100% pointwise confidence bands (if computed).

AUC

Estimated area under the covariate-adjusted ROC curve (AAUC), and ci.level*100% confidence intervals (if required).

pAUC

If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) and ci.level*100% confidence interval (if computed). Note that the returned values are normalised, so that the maximum value is one.

fit

Object of class lm with the fitted regression model in the healthy population.

coeff

Estimated regression coefficients (and ci.level*100% confidence interval if B is greater than zero) from the fit of the linear model in the healthy population, as specified in formula.h.

References

Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96(2), 371-382.

See Also

AROC.bnp, AROC.kernel, pooledROC.BB, pooledROC.emp, pooledROC.kernel, pooledROC.dpm, cROC.bnp, cROC.sp or cROC.kernel.

Examples

library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)

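# Fit the semiparametric AROC with age as covariate, Gaussian errors,
# and the normalised pAAUC over FPFs up to 0.5
m3 <- AROC.sp(formula.h = l_marker1 ~ age,
              group = "status",
              tag.h = 0,
              data = newpsa,
              est.cdf.h = "normal",
              pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5),
              p = seq(0, 1, l = 101),
              B = 500)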

summary(m3)

plot(m3)



ROCnReg documentation built on March 31, 2023, 5:42 p.m.