View source: R/multiselect_Rpkg.R
Description
When several predictors are available, there is often interest in combining a subset of them to diagnose disease or predict the risk of a clinical outcome, D. When the outcome is ordered with K levels and the goal is to predict D = K, there are several ways to select a combination. The traditional approach dichotomizes the outcome, uses logistic regression to construct the combinations, and then selects the combination with the highest estimated AUC for D = K vs. D < K. The alternative approach implemented here constructs the combinations in the same way, but selects among them using both the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1, choosing the combination with the best combined performance on the two AUCs (measured by the sum of their ranks). This function provides (i) the best combination defined solely by the AUC for D = K vs. D < K and (ii) the best combination defined by both the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1. When D indicates no, mild, or severe disease (K = 3), this is equivalent to selecting a combination based on (i) its ability to discriminate between individuals with severe vs. no or mild disease, or (ii) that ability together with its ability to discriminate between individuals with mild vs. no disease.
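The two AUCs used for selection can be illustrated with a small base-R sketch. It is not the package's code: it assumes the empirical (Mann-Whitney) estimate of the AUC, and it assumes that the D = K-1 vs. D < K-1 comparison sets aside observations with D = K. The helper names emp_auc and two_aucs are hypothetical.

```r
# Hypothetical helper, not part of the multiselect package: the empirical
# (Mann-Whitney) AUC, i.e. the proportion of case-control pairs in which the
# case has the higher score, with tied pairs counted as 1/2.
emp_auc <- function(score, is_case) {
  diffs <- outer(score[is_case], score[!is_case], "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# The two AUCs for an ordinal outcome D taking values 1, ..., K.
two_aucs <- function(score, D) {
  K <- max(D)
  # (i) D = K vs. D < K, using all observations
  auc_K <- emp_auc(score, D == K)
  # (ii) D = K-1 vs. D < K-1; observations with D = K are set aside (assumption)
  keep <- D < K
  auc_Km1 <- emp_auc(score[keep], D[keep] == K - 1)
  c(AUC.K = auc_K, AUC.Km1 = auc_Km1)
}

# Toy example with K = 3 (no, mild, severe disease)
D <- c(1, 1, 2, 2, 3, 3)
score <- c(0.1, 0.4, 0.3, 0.6, 0.8, 0.5)
two_aucs(score, D)  # AUC.K = 0.875, AUC.Km1 = 0.75
```

A combination that ranks well on AUC.K but poorly on AUC.Km1 would be favored by the single-AUC approach but not necessarily by the multi-AUC approach.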
Usage
multiselect(data, size=2, Breps=40, nummod=10)
Arguments
data: The dataset to be used, an object of class ‘data.frame’ in which the first column is the outcome and the subsequent columns are the predictors. All columns must be numeric. The outcome must take values 1, ..., K, where K ≥ 3. Missing observations are not allowed. If the columns of […]
size: The size of the combinations. The function considers all possible subsets of the predictors of this size. Default 2.
Breps: The number of bootstrap replicates used to estimate the optimism due to resubstitution bias in the AUCs. The function first estimates the apparent AUCs for each fitted combination. These apparent AUCs are then corrected by subtracting the optimism due to resubstitution bias, which is estimated using a bootstrap procedure. Default 40.
nummod: The number of predictor combinations to return. Using the optimism-corrected estimate of the AUC for D = K vs. D < K, the function returns the top nummod combinations. Default 10.
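The bootstrap correction controlled by Breps can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes the standard optimism bootstrap (refit the logistic regression on each bootstrap sample; optimism is the AUC of that fit on the bootstrap sample minus its AUC on the original data), assumes the empirical AUC, and shows only the AUC for D = K vs. D < K. The names emp_auc and corrected_auc_K are hypothetical.

```r
# Hypothetical helper: empirical (Mann-Whitney) AUC, ties counted as 1/2.
emp_auc <- function(score, is_case) {
  diffs <- outer(score[is_case], score[!is_case], "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# Optimism-corrected AUC for D = K vs. D < K for one predictor combination X
# (a numeric matrix). Hypothetical sketch of a bootstrap optimism correction.
corrected_auc_K <- function(D, X, Breps = 40) {
  K <- max(D)
  dat <- data.frame(y = as.numeric(D == K), X)
  fit <- glm(y ~ ., data = dat, family = binomial)
  apparent <- emp_auc(predict(fit), dat$y == 1)   # resubstitution (apparent) AUC
  optimism <- rep(NA_real_, Breps)
  for (b in seq_len(Breps)) {
    idx <- sample(nrow(dat), replace = TRUE)
    if (!all(seq_len(K) %in% D[idx])) next        # missing an outcome level: leave NA
    boot <- dat[idx, ]
    fit_b <- glm(y ~ ., data = boot, family = binomial)
    auc_boot <- emp_auc(predict(fit_b), boot$y == 1)                 # on bootstrap sample
    auc_orig <- emp_auc(predict(fit_b, newdata = dat), dat$y == 1)  # on original data
    optimism[b] <- auc_boot - auc_orig
  }
  c(apparent = apparent,
    corrected = apparent - mean(optimism, na.rm = TRUE),
    numNA = sum(is.na(optimism)))
}

# Usage on simulated data with K = 3; predictors shift upward with D
set.seed(1)
n <- 300
D <- sample(1:3, n, replace = TRUE, prob = c(0.6, 0.3, 0.1))
X <- matrix(rnorm(n * 2), ncol = 2) + 0.7 * D
corrected_auc_K(D, X, Breps = 20)
```

Because the AUC depends only on the ordering of the scores, the sketch uses the linear predictor returned by predict() directly rather than fitted probabilities.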
Details
For each possible predictor combination of the size given by size, the function fits the combination using logistic regression comparing outcome D = K to D < K. The apparent AUCs for (a) D = K vs. D < K and (b) D = K-1 vs. D < K-1 are calculated. A bootstrap procedure is then used to estimate the optimism due to resubstitution bias in these apparent AUCs, and the AUCs are corrected by subtracting the estimated optimism. Two combinations are then selected: the combination with the highest AUC for D = K vs. D < K ("single AUC" approach) and the combination with the best sum of ranks for the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1 ("multi-AUC" approach). The two approaches may select the same combination. The top nummod combinations, in terms of the AUC for D = K vs. D < K (corrected for optimism due to resubstitution bias), are also provided.
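The selection step can be illustrated with made-up corrected AUCs for four combinations. This sketch assumes that rank 1 is given to the highest AUC, so the "best" sum of ranks is the smallest one; R's rank() with its default tie handling is used, which may differ from the package.

```r
# Hypothetical optimism-corrected AUCs for four candidate combinations
auc_K   <- c(0.80, 0.83, 0.79, 0.82)   # D = K vs. D < K
auc_Km1 <- c(0.73, 0.60, 0.70, 0.74)   # D = K-1 vs. D < K-1

# "Single AUC" approach: highest AUC for D = K vs. D < K
best_single <- which.max(auc_K)

# "Multi-AUC" approach (assumed): rank each AUC so that rank 1 is the
# highest value, add the two ranks, and take the smallest sum
rank_sum <- rank(-auc_K) + rank(-auc_Km1)
best_multi <- which.min(rank_sum)

# Top nummod combinations by AUC for D = K vs. D < K (here nummod = 2)
top2 <- order(auc_K, decreasing = TRUE)[1:2]

best_single  # 2
best_multi   # 4
top2         # 2 4
```

Here the rank sums are 5, 5, 7, and 3, so the two approaches pick different combinations: combination 2 has the best AUC for D = K vs. D < K but the worst AUC for D = K-1 vs. D < K-1.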
If more than one combination is "best" in terms of either the AUC for D = K vs. D < K or the sum of ranks for the two AUCs (i.e., in the event of ties), the first such combination is returned. The order of the combinations for p candidate predictors is given by combn(1:p, size). If ties occur for either (i) the AUC for D = K vs. D < K or (ii) the sum of ranks for the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1, a warning is given.
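The ordering and tie rule can be seen with combn() for p = 4 predictors and size = 2. The AUC values are hypothetical, ties are assumed to mean exact equality, and which.max() (which returns the first maximum) stands in for whatever tie-breaking the package uses internally.

```r
p <- 4
size <- 2
combos <- combn(1:p, size)  # columns: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)

# Hypothetical corrected AUCs, one per column of combos; columns 2 and 3 tie
auc_K <- c(0.71, 0.78, 0.78, 0.69, 0.75, 0.70)

n_best <- sum(auc_K == max(auc_K))
if (n_best > 1) warning("ties for the best AUC; returning the first combination")
best <- which.max(auc_K)    # which.max() returns the first maximum
combos[, best]              # predictors 1 and 3
```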
A given bootstrap sample may not contain observations from each of the K outcome levels; if this occurs, a warning is given and the estimated optimism for that bootstrap sample will be NA for both the AUC for D = K vs. D < K and the AUC for D = K-1 vs. D < K-1. NAs are removed when calculating the mean optimism (used to correct the AUC estimates for resubstitution bias), and the total number of NAs across the Breps bootstrap replicates (for either the AUC for D = K vs. D < K or the AUC for D = K-1 vs. D < K-1) is reported as "numNA" in the output.
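The NA handling can be sketched with hypothetical optimism estimates from Breps = 6 bootstrap samples. The helper has_all_levels and all numbers are made up for illustration; counting numNA as the number of bootstrap samples with an NA is an assumption based on the description above.

```r
# Hypothetical check of whether a bootstrap sample contains every level 1..K
has_all_levels <- function(Db, K) all(seq_len(K) %in% Db)
has_all_levels(c(1, 2, 2, 3), K = 3)  # TRUE
has_all_levels(c(1, 1, 2, 2), K = 3)  # FALSE: no D = 3, so optimism is NA

# Hypothetical optimism estimates; samples 2 and 5 lacked an outcome level,
# so both AUCs have NA in those positions
opt_K   <- c(0.031, NA, 0.027, 0.040, NA, 0.018)  # D = K vs. D < K
opt_Km1 <- c(0.022, NA, 0.035, 0.019, NA, 0.024)  # D = K-1 vs. D < K-1

mean_opt_K   <- mean(opt_K, na.rm = TRUE)     # 0.029
mean_opt_Km1 <- mean(opt_Km1, na.rm = TRUE)   # 0.025
numNA <- sum(is.na(opt_K))                    # 2 (same positions in opt_Km1)

apparent_K  <- 0.85                           # hypothetical apparent AUC
corrected_K <- apparent_K - mean_opt_K        # 0.821
```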
Value
A list with the following components:
Best.Single: The best predictor combination as chosen by the "single AUC" approach. The first […]
Best.Multi: The best predictor combination as chosen by the "multi-AUC" approach. The elements of […]
Ranked.Rslts: The results for the top nummod combinations, ranked by the optimism-corrected AUC for D = K vs. D < K.
References
Meisner, A., Parikh, C.R., and Kerr, K.F. (2017). Using multilevel outcomes to construct and select biomarker combinations for single-level prediction. UW Biostatistics Working Paper Series, Working Paper 423.
Examples
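library(MASS)         ## provides mvrnorm()
library(multiselect)  ## provides multiselect()
## example takes ~1 minute to run
set.seed(15)
p <- 16  ## number of predictors
matX <- matrix(rep(0.3, p*p), nrow=p, ncol=p)  ## covariance matrix for the predictors
diag(matX) <- rep(1, p)
## simulate outcome levels 1, 2, 3 for 400 observations with probabilities 0.6, 0.335, 0.065
simD <- apply(rmultinom(400, 1, c(0.6, 0.335, 0.065)), 2, which.max)
simDord <- simD[order(simD)]  ## sort observations by outcome level
numobs <- table(simDord)      ## number of observations at each level
## multivariate normal predictors with level-specific means
simX1 <- mvrnorm(numobs[1], rep(0, p), 2*matX)
simX2 <- mvrnorm(numobs[2], c(1.5, 1, rep(0.5, (p-2)/2), rep(0.1, (p-2)/2)), 2*matX)
simX3 <- mvrnorm(numobs[3], c(rep(2, 2), rep(0.8, (p-2)/2), rep(0.1, (p-2)/2)), 2*matX)
simX <- rbind(simX1, simX2, simX3)
exdata <- data.frame("D"=simDord, simX)  ## outcome in the first column, predictors after
multiselect(data=exdata, size=2, Breps=20, nummod=10)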