This function outputs variable importance scores based on the R-PCLR algorithm. It applies to settings with a binary response (case versus control) and can be used to analyze high-dimensional data arising from matched case-control studies.
GetVarImp(MyData, MyOut, MyStrat, mtry, numBS)
MyData    a numeric data matrix of n (number of subjects) rows and p (number of features) columns
MyOut     a response vector of length n of binary indicators of case/control status
MyStrat   a vector of length n of matched pair (stratum) indicators
mtry      number of covariates to be sampled randomly for inclusion in each model
numBS     number of bootstrap replicates
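As a rough illustration of the input shapes described above, the following sketch builds toy inputs and checks that they are mutually consistent. It assumes 1:1 matching and a 1 = case, 0 = control coding; neither is stated explicitly in this page. The object names and the checks are illustrative only and are not part of the package.

```r
# Hypothetical toy inputs: 4 matched pairs (n = 8 subjects), p = 3 features.
set.seed(1)
n_pairs <- 4
p <- 3

MyStrat <- rep(seq_len(n_pairs), each = 2)    # stratum (pair) label per subject
MyOut   <- rep(c(1, 0), times = n_pairs)      # assumed coding: 1 = case, 0 = control
MyData  <- matrix(rnorm(2 * n_pairs * p),
                  nrow = 2 * n_pairs, ncol = p,
                  dimnames = list(NULL, paste0("V", seq_len(p))))

# Consistency checks one might run before calling GetVarImp
stopifnot(
  is.numeric(MyData),
  nrow(MyData) == length(MyOut),
  nrow(MyData) == length(MyStrat),
  all(MyOut %in% c(0, 1)),
  all(tapply(MyOut, MyStrat, sum) == 1)       # exactly one case per stratum
)
```

With inputs of this shape, mtry would presumably need to be no larger than p, since it counts covariates sampled from the columns of MyData.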
The function implements the R-PCLR algorithm. Details are found in the paper referenced below (Balasubramanian, R. et al., 2012). The algorithm uses a model-based approach that incorporates a penalized conditional likelihood, which allows adjustment for the matched design. The penalized conditional logistic regression model incorporates a ridge penalty and is implemented using the ridge() function in the survival package. The penalty parameter is set to the default option in the ridge() function. See Gray, R. J. (1992) for details.
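To make the ingredients above concrete, here is a minimal sketch of a single replicate: resample matched pairs with replacement, draw mtry covariates at random, and fit a ridge-penalized conditional logistic model with the survival package. It is not the package's internal code, and it does not compute the importance score, which is defined in the referenced paper. Several choices are assumptions made for illustration: the simulated data, 1:1 matching, fitting through coxph() with a constant time (for one case per stratum the stratified Cox partial likelihood coincides with the conditional logistic likelihood), and a fixed theta = 1, whereas the text above says the package uses the ridge() default.

```r
library(survival)

# Hypothetical simulated data: 60 matched pairs, 6 features, V1 carries signal.
set.seed(42)
n_pairs <- 60
p <- 6
strat <- rep(seq_len(n_pairs), each = 2)
out   <- rep(c(1, 0), times = n_pairs)        # assumed coding: 1 = case, 0 = control
X <- matrix(rnorm(2 * n_pairs * p), ncol = p,
            dimnames = list(NULL, paste0("V", seq_len(p))))
X[, 1] <- X[, 1] + 0.8 * out

# One replicate: bootstrap whole pairs, then sample mtry covariates.
mtry <- 3
bs_pairs <- sample(seq_len(n_pairs), n_pairs, replace = TRUE)
rows <- unlist(lapply(bs_pairs, function(s) which(strat == s)))
vars <- sample(colnames(X), mtry)

bs <- data.frame(
  X[rows, vars, drop = FALSE],
  out   = out[rows],
  time  = 1,                                  # constant time for all subjects
  strat = rep(seq_len(n_pairs), each = 2)     # relabel so repeated pairs form separate strata
)

# Ridge-penalized fit; theta = 1 is an illustrative value, not the package setting.
fml <- as.formula(paste0(
  "Surv(time, out) ~ ridge(", paste(vars, collapse = ", "),
  ", theta = 1) + strata(strat)"
))
fit <- coxph(fml, data = bs, method = "breslow")

print(coef(fit))
```

Resampling whole pairs rather than individual subjects keeps each case with its matched control, which is what lets the conditional likelihood adjust for the matched design.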
A p x 1 vector of variable importance scores.
Raji Balasubramanian
Balasubramanian, R., Houseman, E. A., Coull, B. A., Lev, M. H., Schwamm, L. H., Betensky, R. A. (2012). Variable importance in matched case-control studies in settings of high dimensional data, Submitted to Biostatistics.
Gray, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association, 87, 942-51.
GenerateData, clogit, ridge
## Simulate Data of 100 matched pairs, 3 biomarkers, 5 noise features
set.seed(1234)
MyDat <- GenerateData(50, 3, 5, 0.5, 0.4)
Dat <- MyDat$Data
Out <- MyDat$Out
Strat <- MyDat$Strat
## Get Variable Importance
MyResults <- GetVarImp(Dat, Out, Strat, mtry=3, numBS=25)
## Print results
hist(MyResults, breaks=6, col="orange", xlab="Importance score", ylab="Number of features", main="Histogram of R-PCLR variable importance scores")
output <- cbind(as.character(colnames(Dat)), format(MyResults, digits=3))
print(output)
## Sort from most important (highest importance score) to least important feature (lowest importance score)
ind <- sort(MyResults, index.return=TRUE, decreasing=TRUE)$ix
output[ind,]