optAUC: Optimal Combinations of Diagnostic Tests Based on AUC
In optAUC: Optimal Combinations of Diagnostic Tests Based on AUC

Description Usage Arguments Details Value Note Author(s) References Examples

Searches for optimal linear combination of multiple diagnostic tests (markers) that maximizes the area under the receiver operating characteristic curve (AUC); performs an approximated cross-validation for estimating the AUC associated with the estimated coefficients.

1	optAUC(X, Y, column.select = c(1:ncol(X)), lambda = 5, scale = TRUE)

`X`	m X p data matrix for m non-diseased subjects with p markers
`Y`	n X p data matrix for n diseased subjects with p markers
`column.select`	which of the p markers are used for the combination, default is all p columns
`lambda`	the smooth parameter for the Sigmoid function used for the AUC
`scale`	a logic indicator whether performs standardization to the dataset before the combination, default is true

When several diagnostic tests are available, one can combine them to achieve better diagnostic accuracy. This program considers the optimal linear combination that maximizes the area under the receiver operating characteristic curve (AUC); the estimates of the combination's coefficients is obtained via a nonparametric procedure. Further, for estimating the AUC associated with the estimated coefficients, this progam outputs two estimates: one is an apparent estimation by re-substitution (ACV), which is too optimistic; the other is an approximated cross-validation (GCV) estimation. Notice that, the GCV can be applied for variable selection to select important diagnostic tests\markers. See reference for more details.

`beta`	the estimated linear coefficients, under a unit-sphere constraint
`ACV`	apparent estimation of AUC of the composite score by re-substitution of the linear coefficients
`GCV`	the approximated cross-validation estimation of AUC of the composite score
`converge`	an indicator for the convergency status of the optimization algorithm, 1 means converge, 0 means converge criteria not meet

It is recommended to rescale or monotonic transfer of the data first if significant outliners exists, e.g. log transfer. The AUC is invariant to any monotonic transformation of the data; however, the sigmoid approximation of the AUC may be affected by outliners.
The estimated linear coefficients are based on the standardized (if the parameter scale=TRUE) input data. Thus, composite scores = beta%*%scale(rbind(X,Y)).

Xin Huang, Gengsheng Qin, Yixin Fang
Maintainer: Xin Huang <xhuang.fhcrc@gmail.com>

Huang X, Qin G, Fang Y. (2011) Optimal Combinations of Diagnostic Tests Based on AUC. Biometrics. Jun;67(2):568-76.
http://www.ncbi.nlm.nih.gov/pubmed/20560934

library(MASS)
rho<-0
m<-50
n<-50
y1.sd<-0.5
y2.sd<-0.5 
y1.mean<-2
y2.mean<-1
lambda <- 5

set.seed(88)
# generate non-diseased population F(X1, X2)
# the sample from 2-dimensinal multinormal distribution with mean 0 and std=1
X1X2<-mvrnorm(m, c(1,1), matrix(c(0.5,rho,rho,0.5),2,2))

# generate  diseased population G(Y1,Y2)
# the sample from 2-dimensinal multinormal distribution with mean
# (y1.mean,y2.mean) and std=(y1.sd,y2.sd) 
Y1Y2<-mvrnorm(n, c(y1.mean,y2.mean), matrix(c(y1.sd^2,rho*y1.sd*y2.sd, rho*y1.sd*y2.sd, y2.sd^2),2,2))

# only the first marker, the "true" model, should have the maximum AUC amount all models
optAUC(X1X2, Y1Y2, column.select=1)
# two markers in the model, the AUC from GCV is smaller than just first marker in the model, because the second marker is noise
# the AUC from ACV (apearent estimate by substituting the estimated beta into the model) is larger than previous model, because overfitting
optAUC(X1X2, Y1Y2, column.select=c(1:2))

Loading required package: MASS
$ACV
[1] 0.8450307

$GCV
[1] 0.8445957

$beta
     [,1]
[1,]    1

$converge
[1] 1

$ACV
          [,1]
[1,] 0.8484635

$GCV
          [,1]
[1,] 0.8396014

$beta
[1]  0.9895103 -0.1444594

$converge
[1] 1

There were 50 or more warnings (use warnings() to see the first 50)