cROC.kernel: Nonparametric kernel-based estimation of the...

View source: R/cROC.kernel.R

cROC.kernel    R Documentation

Nonparametric kernel-based estimation of the covariate-specific ROC curve (cROC).

Description

This function estimates the covariate-specific ROC curve (cROC) using the nonparametric kernel-based method proposed by Rodriguez-Alvarez et al. (2011). In its current form, the method handles only a single continuous covariate.

Usage

cROC.kernel(marker, covariate, group, tag.h,
  bw = c("LS", "AIC"), regtype = c("LC", "LL"),
  data, newdata, pauc = pauccontrol(),
  p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
  parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)

Arguments

marker

A character string with the name of the diagnostic test variable.

covariate

A character string with the name of the continuous covariate.

group

A character string with the name of the variable that distinguishes healthy from diseased individuals.

tag.h

The value codifying healthy individuals in the variable group.

bw

A character string specifying the bandwidth selection method: "LS" (least-squares cross-validation, the default) or "AIC" (expected Kullback-Leibler cross-validation). For details, see the R-package np.

regtype

A character string specifying which type of kernel estimator to use for the regression function (see Details). LC specifies a local-constant estimator (Nadaraya-Watson) and LL specifies a local-linear estimator. Defaults to LC. For details see R-package np.

data

Data frame containing all the variables needed.

newdata

Optional data frame containing the covariate values at which the covariate-specific ROC curve (and the AUC and pAUC, if computed) is to be estimated. If not supplied, the function cROCData is used to build a default dataset.

pauc

A list of control values to replace the default values returned by the function pauccontrol. This argument indicates whether the partial area under the covariate-specific ROC curve should be computed and, if so, whether the focus is on restricted false positive fractions (FPFs) or on restricted true positive fractions (TPFs), together with the upper bound for the FPF (if focus is FPF) or the lower bound for the TPF (if focus is TPF).

p

Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. This set is also used to compute the area under the covariate-specific ROC curve using Simpson's rule. Thus, the length of the set should be an odd number, and the grid should be fine enough for accurate estimation.

B

An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000.

ci.level

A numeric value (between 0 and 1) specifying the confidence level. The default is 0.95.

parallel

A character string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow".

ncpus

An integer with the number of processes to be used in parallel operation. Defaults to 1.

cl

An object inheriting from class cluster (from the parallel package), specifying an optional parallel or snow cluster if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the call.
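To illustrate what bw = "LS" refers to, the base-R sketch below selects a single bandwidth for a Gaussian-kernel Nadaraya-Watson (local-constant) regression by minimising the leave-one-out least-squares cross-validation criterion. It is not the np routine that cROC.kernel relies on: the helper names (nw_fit, ls_cv, select_bw_ls), the Gaussian kernel, the search interval and the simulated data are all assumptions made for this example.

```r
# Hypothetical helper: Gaussian-kernel Nadaraya-Watson (local-constant)
# estimate of E(Y | X = x0) at each point in x0.
nw_fit <- function(x0, x, y, h) {
  w <- dnorm(outer(x0, x, "-") / h)   # kernel weights, one row per x0
  drop(w %*% y) / rowSums(w)
}

# Hypothetical helper: leave-one-out least-squares cross-validation CV(h).
ls_cv <- function(h, x, y) {
  w <- dnorm(outer(x, x, "-") / h)
  diag(w) <- 0                        # exclude observation i from its own fit
  y_loo <- drop(w %*% y) / rowSums(w)
  mean((y - y_loo)^2)
}

# Minimise CV(h) over a bounded interval (the bounds are an assumption).
select_bw_ls <- function(x, y, lower = 0.05 * sd(x), upper = 2 * sd(x)) {
  optimize(ls_cv, interval = c(lower, upper), x = x, y = y)$minimum
}

# Simulated (made-up) marker values that depend smoothly on age.
set.seed(1)
age    <- runif(200, 20, 80)
marker <- 0.03 * age + sin(age / 8) + rnorm(200, sd = 0.3)
h      <- select_bw_ls(age, marker)
fit_at <- nw_fit(c(30, 50, 70), age, marker, h)
```

The same criterion could be applied to the squared residuals to choose a bandwidth for the variance function; the "AIC" option replaces the least-squares criterion with an expected Kullback-Leibler one.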

Details

Estimates the covariate-specific ROC curve (cROC) defined as

ROC(p|x) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p|x)|x\},

where

F_{D}(y|x) = Pr(Y_{D} \leq y | X_{D} = x ),

F_{\bar{D}}(y|x) = Pr(Y_{\bar{D}} \leq y | X_{\bar{D}} = x).
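To make the definition concrete, the sketch below evaluates ROC(p|x) directly from the two conditional distributions, assuming (hypothetically) that Y given X = x is normal in each population with the mean and standard-deviation functions written in the code. Under that assumption F_{\bar{D}}^{-1} and F_{D} are available through qnorm and pnorm. The functions mu_h, sd_h, mu_d, sd_d and croc_true are illustrative names, not part of ROCnReg.

```r
# Hypothetical conditional model, for illustration only:
#   healthy:  Y | X = x ~ N(mu_h(x), sd_h(x)^2)
#   diseased: Y | X = x ~ N(mu_d(x), sd_d(x)^2)
mu_h <- function(x) 1 + 0.01 * x
sd_h <- function(x) rep(0.5, length(x))
mu_d <- function(x) 1.5 + 0.03 * x
sd_d <- function(x) 0.4 + 0.005 * x

# ROC(p | x) = 1 - F_D{ F_Dbar^{-1}(1 - p | x) | x }
croc_true <- function(p, x) {
  thr <- qnorm(1 - p, mean = mu_h(x), sd = sd_h(x))  # healthy (1 - p)-quantile
  1 - pnorm(thr, mean = mu_d(x), sd = sd_d(x))        # diseased exceedance probability
}

p      <- seq(0, 1, length.out = 101)
roc_50 <- croc_true(p, x = 50)
roc_70 <- croc_true(p, x = 70)
```

In this made-up model the separation between groups grows with the covariate, so the curve at x = 70 lies, on average, above the curve at x = 50.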

Note that, for the sake of clarity, we assume that the covariate of interest is the same in both healthy and diseased populations. In particular, the method implemented in this function estimates F_{D}(\cdot|x) and F_{\bar{D}}(\cdot|x) assuming a nonparametric location-scale regression model for Y in each population separately, i.e.,

Y_{D} = \mu_{D}(X_{D}) + \sigma_{D}(X_{D})\varepsilon_{D},

Y_{\bar{D}} = \mu_{\bar{D}}(X_{\bar{D}}) + \sigma_{\bar{D}}(X_{\bar{D}})\varepsilon_{\bar{D}},

where \mu_{D}(x) = E(Y_D | X_D = x), \mu_{\bar{D}}(x) = E(Y_{\bar{D}} | X_{\bar{D}} = x) (regression functions), \sigma^2_{D}(x) = Var(Y_D | X_D = x), \sigma^2_{\bar{D}}(x) = Var(Y_{\bar{D}} | X_{\bar{D}} = x) (variance functions), and \varepsilon_{D} and \varepsilon_{\bar{D}} have zero mean, variance one, and distribution functions G_{D} and G_{\bar{D}}, respectively. In this case, the covariate-specific ROC curve can be expressed as

ROC(p|x) = 1 - G_{D}\{a(x) + b(x)G_{\bar{D}}^{-1}(1-p)\},

where a(x) = \frac{\mu_{\bar{D}}(x) - \mu_{D}(x)}{\sigma_{D}(x)} and b(x) = \frac{\sigma_{\bar{D}}(x)}{\sigma_{D}(x)}. By default, for both the healthy and diseased populations, the regression and variance functions are estimated using the Nadaraya-Watson estimator (LC), and the bandwidths are selected using least-squares cross-validation (LS). The implementation relies on the R-package np. No assumptions are made about G_{D} and G_{\bar{D}}, which are estimated empirically from the standardised residuals.
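The induced location-scale estimator described above can be sketched in base R as follows. This is a simplified illustration, not the ROCnReg code: it uses a Gaussian-kernel Nadaraya-Watson smoother with a fixed rule-of-thumb bandwidth (stats::bw.nrd0, a density-estimation rule used here only as a stand-in for the cross-validated bandwidths), estimates the variance function by smoothing squared residuals (an assumption about how \sigma^2 is estimated), and uses the empirical distribution and quantile functions of the standardised residuals for G_{D} and G_{\bar{D}}^{-1}. All helper names and the simulated data are hypothetical.

```r
# Hypothetical helper: Gaussian-kernel Nadaraya-Watson smoother evaluated at x0.
nw <- function(x0, x, y, h) {
  w <- dnorm(outer(x0, x, "-") / h)
  drop(w %*% y) / rowSums(w)
}

# Hypothetical helper: fit Y = mu(X) + sigma(X) * eps in one population.
# Bandwidths default to bw.nrd0 (a placeholder, not cross-validation).
fit_location_scale <- function(x, y, h_mean = bw.nrd0(x), h_var = bw.nrd0(x)) {
  mu_i  <- nw(x, x, y, h_mean)                # mu(x_i)
  r2    <- (y - mu_i)^2                       # squared residuals
  var_i <- pmax(nw(x, x, r2, h_var), 1e-8)    # sigma^2(x_i), kept positive
  list(
    mu    = function(x0) nw(x0, x, y, h_mean),
    sigma = function(x0) sqrt(pmax(nw(x0, x, r2, h_var), 1e-8)),
    eps   = (y - mu_i) / sqrt(var_i)          # standardised residuals
  )
}

# ROC(p|x0) = 1 - G_D{ a(x0) + b(x0) * G_Dbar^{-1}(1 - p) } with empirical G's.
croc_induced <- function(p, x0, fit_h, fit_d) {
  a   <- (fit_h$mu(x0) - fit_d$mu(x0)) / fit_d$sigma(x0)
  b   <- fit_h$sigma(x0) / fit_d$sigma(x0)
  q_h <- quantile(fit_h$eps, probs = 1 - p, type = 1, names = FALSE)
  G_d <- ecdf(fit_d$eps)
  1 - G_d(a + b * q_h)
}

# Simulated (made-up) data: the marker depends on age in both groups.
set.seed(123)
n     <- 300
age_h <- runif(n, 40, 80)
y_h   <- 1 + 0.01 * age_h + 0.5 * rnorm(n)
age_d <- runif(n, 40, 80)
y_d   <- 1.5 + 0.03 * age_d + (0.4 + 0.005 * age_d) * rnorm(n)

fit_h  <- fit_location_scale(age_h, y_h)
fit_d  <- fit_location_scale(age_d, y_d)
p      <- seq(0, 1, length.out = 101)
roc_60 <- croc_induced(p, x0 = 60, fit_h, fit_d)
```

Because b(x) > 0 and both empirical functions are monotone, the resulting curve is non-decreasing in p by construction.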

The covariate-specific area under the curve is

AUC(x)=\int_{0}^{1}ROC(p|x)dp,

and is computed numerically (using Simpson's rule). For the partial area under the curve, when focus = "FPF" and assuming an upper bound u_1 for the FPF, the quantity computed is

pAUC_{FPF}(u_1|x)=\int_0^{u_1} ROC(p|x)dp,

where again the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{FPF}(u_1|x)/u_1, so that it ranges from u_1/2 (useless test) to 1 (perfect test). Conversely, when focus = "TPF" and assuming a lower bound u_2 for the TPF, the partial area corresponding to TPFs lying in the interval (u_2,1) is computed as

pAUC_{TPF}(u_2|x)=\int_{u_2}^{1}ROC_{TNF}(p|x)dp,

where ROC_{TNF}(p|x) is a 270^\circ rotation of the ROC curve, which can be expressed as

ROC_{TNF}(p|x) = F_{\bar{D}}\{F_{D}^{-1}(1-p|x)|x\} = G_{\bar{D}}\left\{\frac{\mu_{D}(x)-\mu_{\bar{D}}(x)}{\sigma_{\bar{D}}(x)} + G_{D}^{-1}(1-p)\frac{\sigma_{D}(x)}{\sigma_{\bar{D}}(x)}\right\}.

Again, the integral is computed via Simpson's rule. The returned value is the normalised pAUC, pAUC_{TPF}(u_2|x)/(1-u_2), so that it ranges from (1-u_2)/2 (useless test) to 1 (perfect test).
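The numerical integration steps can be sketched with a composite Simpson's rule on an equally spaced grid with an odd number of points, which is why p should have odd length. The helpers (simpson, auc_simpson, pauc_fpf, pauc_tpf) are hypothetical; each integral is evaluated on its own 101-point grid (an assumption, the package may instead work from the supplied p), and the normal location-scale parameters at a single covariate value are made up.

```r
# Hypothetical helper: composite Simpson's rule; x must be equally spaced
# with an odd number of points.
simpson <- function(x, fx) {
  n <- length(x)
  stopifnot(n >= 3, n %% 2 == 1)
  h <- (x[n] - x[1]) / (n - 1)
  w <- ifelse(seq_len(n) %% 2 == 0, 4, 2)  # interior weights 4, 2, 4, ..., 2, 4
  w[c(1, n)] <- 1                          # end-point weights
  h / 3 * sum(w * fx)
}

# AUC(x): integral of ROC(p|x) over (0, 1).
auc_simpson <- function(roc, n = 101) {
  p <- seq(0, 1, length.out = n)
  simpson(p, roc(p))
}

# Normalised pAUC_FPF(u1|x) / u1.
pauc_fpf <- function(roc, u1, n = 101) {
  p <- seq(0, u1, length.out = n)
  simpson(p, roc(p)) / u1
}

# Normalised pAUC_TPF(u2|x) / (1 - u2), integrating the rotated curve ROC_TNF.
pauc_tpf <- function(roc_tnf, u2, n = 101) {
  p <- seq(u2, 1, length.out = n)
  simpson(p, roc_tnf(p)) / (1 - u2)
}

# Made-up normal location-scale parameters at one covariate value x.
m_h <- 1.6; s_h <- 0.5   # healthy mean and standard deviation
m_d <- 3.3; s_d <- 0.7   # diseased mean and standard deviation
roc_x     <- function(p) 1 - pnorm((m_h - m_d) / s_d + (s_h / s_d) * qnorm(1 - p))
roc_tnf_x <- function(p) pnorm((m_d - m_h) / s_h + qnorm(1 - p) * s_d / s_h)

auc_x      <- auc_simpson(roc_x)
pauc_fpf_x <- pauc_fpf(roc_x, u1 = 0.5)
pauc_tpf_x <- pauc_tpf(roc_tnf_x, u2 = 0.5)
```

For a useless test (ROC(p|x) = p, so ROC_{TNF}(p|x) = 1 - p) these helpers return u_1/2 and (1-u_2)/2, matching the lower ends of the normalised ranges stated above.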

Value

The function returns a list with the following components:

call

The matched call.

newdata

A data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) was computed.

data

The original supplied data argument.

missing.ind

A logical vector indicating, for each observation, whether the test outcome or the covariate is missing.

marker

The name of the diagnostic test variable in the data frame.

group

The value of the argument group used in the call.

tag.h

The value of the argument tag.h used in the call.

covariate

The value of the argument covariate used in the call.

p

Set of false positive fractions (FPF) at which the covariate-specific ROC curve has been estimated.

ci.level

The value of the argument ci.level used in the call.

ROC

Estimated covariate-specific ROC curve (cROC), and ci.level*100% pointwise confidence band (if computed).

AUC

Estimated area under the covariate-specific ROC curve, and ci.level*100% confidence interval (if computed).

pAUC

If computed, estimated partial area under the covariate-specific ROC curve and ci.level*100% confidence interval (if computed). Note that the returned values are normalised, so that the maximum value is one.

fit

Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component contains: (1) bw.mean: an object of class npregbw with the selected bandwidth for the nonparametric regression function; (2) bw.var: an object of class npregbw with the selected bandwidth for the nonparametric variance function; (3) fit.mean: an object of class npreg with the nonparametric regression function estimate; (4) fit.var: an object of class npreg with the nonparametric variance function estimate. For further details on these classes, see the R-package np.

References

Hayfield, T., and Racine, J. S. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, 27(5). URL http://www.jstatsoft.org/v27/i05/.

Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.

See Also

AROC.bnp, AROC.sp, AROC.kernel, pooledROC.BB, pooledROC.emp, pooledROC.kernel, pooledROC.dpm, cROC.kernel or cROC.sp.

Examples

library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)

cROC_kernel <- cROC.kernel(marker = "l_marker1",
               covariate = "age",
               group = "status", 
               tag.h = 0,
               data = newpsa, 
               bw = "LS",
               regtype = "LC",
               p = seq(0, 1, len = 101),
               pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"),
               B = 500)

plot(cROC_kernel)

summary(cROC_kernel)

ROCnReg documentation built on March 31, 2023, 5:42 p.m.