This function allows the construction of a diagnostic or prognostic signature by using a logistic regression with lasso penalty. This function also performs estimations of the corresponding ROC curve according to different bootstrap-based approaches. Patients not included in the bootstrap sample are used to correct the overfitting.
boot.ROC(status, features, N.boot,
precision, fold.cv, lambda1)
status |
A numeric vector with the indicators of the disease (e.g. 0=disease-free, 1=disease). |
features |
A matrix with the observed features. The number of rows is the number of individuals (equal to the length of the vector |
N.boot |
Number of bootstrap iterations. |
precision |
The quantiles of the predictor used for computing each point of the ROC curve. |
lambda1 |
The fixed values of the tuning parameters for L1 (lasso). If |
fold.cv |
The fold for cross-validation which is only used if |
This function does not handle censored data. First, it returns the results of the penalized logistic regression. By default, all the corresponding parameters are estimated from the total sample, including the tuning parameter, which is obtained by cross-validation and determines the number of variables selected in the linear predictor. The user can also fix the value of the tuning parameter. Second, because the resulting scoring system may be overfitted, internal validation is needed. At each iteration, a logistic regression with lasso penalty is estimated on the bootstrap sample. By default, the tuning parameter is also determined by cross-validation within each bootstrap sample. However, if lambda1 is defined by the user, the same value is used for all iterations. The complete methodology is explained by Danger and Foucher (2012) in the context of incomplete data (right censoring). Its application here is straightforward: in boot.ROC, the false positive and false negative rates are simply the corresponding observed proportions.
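The resampling scheme described above can be illustrated with a minimal sketch in base R. It is not the code of boot.ROC: the data are simulated, an unpenalized glm() stands in for the lasso fit, the helper names auc() and fit_score() are hypothetical, and the 0.632 combination shown is Efron's usual weighting, which this page does not confirm as the exact rule used by the package.

```r
set.seed(42)
n <- 150; p <- 5
# Simulated features and binary status (assumption: illustrative data only)
features <- matrix(rnorm(n * p), nrow = n,
                   dimnames = list(NULL, paste0("x", 1:p)))
status <- rbinom(n, 1, plogis(features[, 1] - 0.5 * features[, 2]))
dat <- data.frame(status = status, features)

# Area under the ROC curve via the rank (Mann-Whitney) formula
auc <- function(score, status) {
  n1 <- sum(status == 1); n0 <- sum(status == 0)
  (sum(rank(score)[status == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Fit on 'train' and return the linear predictor on 'test'
# (plain logistic regression used here in place of the lasso)
fit_score <- function(train, test) {
  fit <- glm(status ~ ., data = train, family = binomial)
  predict(fit, newdata = test, type = "link")
}

# Apparent AUC: model fitted and evaluated on the total sample
auc_apparent <- auc(fit_score(dat, dat), dat$status)

N.boot <- 50
auc_train <- auc_valid <- numeric(N.boot)
for (b in seq_len(N.boot)) {
  in_bag  <- sample(n, replace = TRUE)        # bootstrap sample
  out_bag <- setdiff(seq_len(n), in_bag)      # individuals left out
  boot <- dat[in_bag, ]
  oob  <- dat[out_bag, ]
  auc_train[b] <- auc(fit_score(boot, boot), boot$status)
  auc_valid[b] <- auc(fit_score(boot, oob),  oob$status)
}

# Simple 0.632 combination of apparent and out-of-bag estimates
auc_632 <- 0.368 * auc_apparent + 0.632 * mean(auc_valid)
c(apparent = auc_apparent, train = mean(auc_train),
  valid = mean(auc_valid), s632 = auc_632)
```

The out-of-bag AUC is usually lower than the apparent one, and the 0.632 value lies between the two by construction.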
The function returns a list.
AUC is a data frame. The row(s) represent(s) the value(s) of the prognostic time. train is the mean of the areas obtained by using the individuals included in the bootstrap samples (training). valid is the mean of the areas obtained by using the individuals not included in the bootstrap samples (cross-validation). s632 is the mean of the areas obtained by using the simple 0.632 estimator. p632 is the mean of the areas obtained by using the 0.632+ estimator.
ROC.Apparent, ROC.CV, ROC.632 and ROC.632p are four data frames giving the false negative rates (FNR) and false positive rates (FPR) for the four estimators, respectively: apparent, bootstrap cross-validation, bootstrap 0.632 and bootstrap 0.632+. These rates correspond to the thresholds defined in cut.values.
Coef is a vector of the regression coefficients of the logistic model with lasso penalty, estimated using all subjects. The corresponding value of the tuning parameter is given in Lambda.
Model contains the fitted model. This object is obtained with the function penalized() from the R package penalized; see the corresponding help for more details about it.
Signature is the prognostic score of each subject, i.e. the sum of the regression coefficients multiplied by the corresponding feature values.
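As a rough illustration of how a signature and the observed error rates relate, the sketch below computes a linear score and the false positive and false negative proportions at quantile-based thresholds. It uses simulated data and hypothetical coefficients (beta); the direction of the cut-off rule (a score above the threshold counts as a positive test) is an assumption, not something stated in this page.

```r
set.seed(7)
n <- 100
# Simulated feature matrix: one row per individual (illustrative only)
features <- matrix(rnorm(n * 3), nrow = n)
# Hypothetical coefficients; the zero mimics a feature dropped by the lasso
beta <- c(0.8, 0, -0.5)
status <- rbinom(n, 1, as.vector(plogis(features %*% beta)))

# Signature: sum of coefficients multiplied by the feature values
signature <- as.vector(features %*% beta)

# Thresholds taken at quantiles of the signature, as with 'precision'
precision <- seq(0.05, 0.95, by = 0.30)
cut.values <- quantile(signature, probs = precision)

# Observed proportions (assumption: score > threshold means "positive")
roc <- data.frame(
  cut = as.vector(cut.values),
  FPR = sapply(cut.values, function(k) mean(signature[status == 0] > k)),
  FNR = sapply(cut.values, function(k) mean(signature[status == 1] <= k))
)
roc
```

Raising the threshold can only lower the false positive rate and raise the false negative rate, which is why each row of such a table gives one point of the ROC curve.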
Y. Foucher <Yohann.Foucher@univ-nantes.fr>
R. Danger and Y. Foucher. Time dependent ROC curves for the estimation of true prognostic capacity of microarray data. Statistical Applications in Genetics and Molecular Biology. 2012 Nov 22;11(6):Article 1.
# import and attach the data example
data(DLBCLpatients)
data(DLBCLgenes)
# In this example, we reduce the number of features,
# thresholds and iterations to save computation time
DLBCLgenes <- DLBCLgenes[,1:500] # 500 first features
N.iterations <- 2
# Here the tuning parameter is fixed a priori at 15.
res <- boot.ROC(status=DLBCLpatients$f,
features=DLBCLgenes, N.boot=N.iterations,
precision=seq(0.05, 0.95, by=0.30), lambda1=15)
# The distribution of the prognostic score
hist(res$Signature, nclass=30, main="",
xlab="Observed values of the multivariate signature")
# Illustrations of the ROC curve
plot(res$ROC.Apparent$FPR, 1-res$ROC.Apparent$FNR,
type="b", pch=1, lty=1, ylim=c(0,1), xlim=c(0,1),
ylab="True Positive Rates",
xlab="False Positive Rates")
lines(res$ROC.CV$FPR, 1-res$ROC.CV$FNR,
type="b", pch=2, lty=2)
lines(res$ROC.632$FPR, 1-res$ROC.632$FNR,
type="b", pch=3, lty=3)
lines(res$ROC.632p$FPR, 1-res$ROC.632p$FNR,
type="b", pch=4, lty=4)
legend("bottomright",
paste(c("Apparent", "CV", "0.632", "0.632+"),
"curve (AUC=", round(res$AUC,2), ")"), pch=1:4,
lty=1:4)
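The link between plotted ROC points and an area can be checked by hand with the trapezoidal rule. The sketch below is only an illustration: the data frame roc holds made-up rates with the same FPR and FNR column names as the example above, trapz_auc() is a hypothetical helper, and the way boot.ROC computes the values in AUC is not documented here, so the two need not agree, especially with few thresholds.

```r
# Made-up ROC points with the column names used in the example above
roc <- data.frame(FPR = c(0.90, 0.55, 0.25, 0.05),
                  FNR = c(0.02, 0.15, 0.40, 0.80))

# Trapezoidal area under the points (FPR, 1 - FNR),
# after adding the end points (0, 0) and (1, 1)
trapz_auc <- function(fpr, fnr) {
  x <- c(0, fpr, 1)
  y <- c(0, 1 - fnr, 1)
  o <- order(x, y)            # sort by increasing false positive rate
  x <- x[o]; y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

trapz_auc(roc$FPR, roc$FNR)
```

With an object returned by boot.ROC, the same helper could be applied to, for instance, res$ROC.CV$FPR and res$ROC.CV$FNR.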