GetVarImp: To obtain variable importance scores using the R-PCLR...

Description

This function returns variable importance scores computed with the R-PCLR algorithm. It applies to settings with a binary response (case versus control) and can be used to analyze high-dimensional data arising from matched case-control studies.

Usage

GetVarImp(MyData, MyOut, MyStrat, mtry, numBS)

Arguments

MyData

a numeric data matrix of n (number of subjects) rows and p (number of features) columns

MyOut

a response vector of length n of binary indicators of case/control status

MyStrat

a vector of length n of matched pair (stratum) indicators

mtry

Number of covariates to be sampled randomly for inclusion in each model

numBS

Number of bootstrap replicates
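
As a rough illustration of the input shapes described above, the three data arguments might be assembled from a long-format data frame as sketched here. The data frame `study` and its column names (`pair_id`, `case`, `gene_a`, ...) are hypothetical and not part of the package; only the resulting shapes (an n x p numeric matrix and two length-n vectors) follow from the argument descriptions.

```r
# Hypothetical long-format data: one row per subject, two subjects per matched pair
set.seed(42)
n_pairs <- 10
study <- data.frame(
  pair_id = rep(seq_len(n_pairs), each = 2),   # matched pair (stratum) indicator
  case    = rep(c(1, 0), times = n_pairs),     # 1 = case, 0 = control
  gene_a  = rnorm(2 * n_pairs),
  gene_b  = rnorm(2 * n_pairs),
  gene_c  = rnorm(2 * n_pairs)
)

# Split into the three inputs described in the Arguments section
MyData  <- as.matrix(study[, c("gene_a", "gene_b", "gene_c")])  # n x p numeric matrix
MyOut   <- study$case                                           # length-n 0/1 vector
MyStrat <- study$pair_id                                        # length-n stratum vector

# Basic consistency checks before passing the inputs on
stopifnot(
  is.numeric(MyData),
  nrow(MyData) == length(MyOut),
  length(MyOut) == length(MyStrat),
  all(MyOut %in% c(0, 1)),
  all(tapply(MyOut, MyStrat, sum) == 1)  # exactly one case per matched pair
)
```

The final check assumes 1:1 matching (one case and one control per stratum), which is what the simulated data here contain.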

Details

The function implements the R-PCLR algorithm; details are given in the paper referenced below (Balasubramanian, R. et al., 2012). The algorithm uses a model-based approach built on a penalized conditional likelihood, which adjusts for the matched design. The penalized conditional logistic regression model uses a ridge penalty and is fitted with the ridge() function in the survival package. The penalty parameter is left at the default of the ridge() function. See Gray, R. J. (1992) for details.
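
To make the model described above more concrete, the following is a minimal sketch of a single ridge-penalized conditional logistic regression fit on simulated 1:1 matched data. It is not the package's internal code. It assumes the standard equivalence between conditional logistic regression and a stratified Cox model with a constant dummy time, and it fixes `theta = 1` purely for illustration, whereas the package is described as using the ridge() default. All variable names (`dat`, `X1`, ...) are hypothetical.

```r
library(survival)

# Simulated 1:1 matched data: one case and one control per stratum
set.seed(1)
n_pairs <- 40
dat <- data.frame(
  strat = rep(seq_len(n_pairs), each = 2),
  out   = rep(c(1, 0), times = n_pairs),
  X1    = rnorm(2 * n_pairs),
  X2    = rnorm(2 * n_pairs),
  X3    = rnorm(2 * n_pairs)
)
dat$X1 <- dat$X1 + dat$out   # shift X1 upward in cases so it carries signal

# Conditional logistic regression expressed as a stratified Cox model:
# every subject gets the same dummy time, and the case is the "event".
dat$time <- 1
fit <- coxph(
  Surv(time, out) ~ ridge(X1, X2, X3, theta = 1) + strata(strat),
  data   = dat,
  method = "breslow"   # with one case per stratum there are no tied events
)

# Ridge-penalized coefficient estimates, one per covariate
coef(fit)
```

In a sketch like this, the covariates inside ridge() would correspond to one random subset of `mtry` features, and the fit would be repeated over `numBS` bootstrap replicates; how the fits are combined into importance scores is described in the referenced paper.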

Value

A p x 1 vector of variable importance scores.

Author(s)

Raji Balasubramanian

References

Balasubramanian, R., Houseman, E. A., Coull, B. A., Lev, M. H., Schwamm, L. H., Betensky, R. A. (2012). Variable importance in matched case-control studies in settings of high dimensional data, Submitted to Biostatistics.

Gray, R. J. (1992). Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association, 87, 942-51.

See Also

GenerateData, clogit, ridge

Examples

## Simulate data: 50 matched pairs, 3 biomarkers, 5 noise features
set.seed(1234)
MyDat <- GenerateData(50, 3, 5, 0.5, 0.4)
Dat <- MyDat$Data
Out <- MyDat$Out
Strat <- MyDat$Strat

## Get Variable Importance
MyResults <- GetVarImp(Dat, Out, Strat, mtry=3, numBS=25)

## Print results
hist(MyResults, breaks=6, col="orange",
     xlab="Importance score", ylab="Number of features",
     main="Histogram of R-PCLR variable importance scores")
output <- cbind(as.character(colnames(Dat)), format(MyResults, digits=3))
print(output)

## Sort from most important (highest importance score) to least important feature (lowest importance score)
ind <- sort(MyResults, index.return=TRUE, decreasing=TRUE)$ix
output[ind,]

RPCLR documentation built on May 2, 2019, 11:26 a.m.