select.stable.aic: Select a set of stable features based on AIC after an initial...


View source: R/select.stable.aic.R

Description

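Performs feature selection with GRRF (guided regularized random forest), followed by stepwise model selection by AIC. The GRRF step is repeated multiple times, and stepwise AIC selection is then applied to the union of the features selected across runs to obtain a stable set of features.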

Usage

select.stable.aic(X.train, Y.train, coefReg, total=10)

Arguments

X.train

a data frame or matrix containing predictors for the training set.

Y.train

response for the training set. If a factor, classification is assumed and a logistic regression model is used in the stepwise step; otherwise regression is assumed and a linear model is used.

coefReg

regularization coefficient(s) passed to RRF; values range between 0 and 1.

total

the number of times to repeat the GRRF feature selection (default 10).
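As an illustration of how a per-feature coefReg vector can be built, the sketch below rescales importance scores into the interval between 0.5 and 1, following the same formula used in the Examples section. The importance values are made up, and gamma is a name introduced here for illustration only; it is not an argument of select.stable.aic.

```r
## Made-up importance scores for four hypothetical features
imp <- c(X1 = 12.4, X2 = 3.1, X3 = 0.0, X4 = 7.8)

## gamma (assumed name) controls how strongly importance guides regularization
gamma <- 0.5

## The most important feature gets coefficient 1 (least penalized);
## a feature with zero importance gets 1 - gamma
coefReg <- (1 - gamma) + gamma * imp / max(imp)
print(coefReg)
```

With gamma = 0.5 this reduces to 0.5 + 0.5 * imp / max(imp), so every coefficient stays within the range expected by RRF.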

Value

The features retained by stepwise AIC selection, starting from the union of the features selected by GRRF across the repeated runs.

Note

For customized hyperparameter settings, call the RRF function from the RRF package directly and repeatedly in a for loop.
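A minimal sketch of such a loop follows, assuming the RRF package is installed. The simulated data, the hyperparameter values (ntree, mtry, coefReg), and the names n_runs and min_share are illustrative choices, not settings recommended by the package authors. Unlike select.stable.aic, this sketch keeps features by selection frequency and does not run the stepwise AIC step.

```r
library(RRF)

set.seed(1)
## Simulated data: the class depends on the first two predictors only
X <- data.frame(matrix(rnorm(100 * 20), nrow = 100))
y <- as.factor(ifelse(X$X1 + X$X2 > 0, 1, 0))

n_runs <- 10       ## number of repetitions (hypothetical name)
min_share <- 0.8   ## keep features selected in at least 80% of runs (assumption)

runs <- vector("list", n_runs)
for (i in seq_len(n_runs)) {
  ## Custom hyperparameters are passed straight to RRF
  fit <- RRF(X, y, coefReg = 0.8, flagReg = 1, ntree = 200, mtry = 5)
  runs[[i]] <- fit$feaSet   ## column indices of the selected features
}

## Count how often each feature index was selected across runs
freq <- table(unlist(runs))
stable <- as.integer(names(freq)[freq >= min_share * n_runs])
colnames(X)[stable]
```

A frequency threshold is only one way to define stability; taking the union of all runs, as select.stable.aic does before its stepwise step, is the more permissive alternative.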

Author(s)

Li Liu, Xin Guan

References

Guan, X., & Liu, L. (2018). Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests.

Examples

##---- Example: classification  ----

library(randomForest)  ## provides randomForest(), used below for importance scores
set.seed(1)
X.train<-data.frame(matrix(rnorm(100*100), nrow=100))
b=seq(0.1, 2.2, 0.2) 
##y has a linear relationship with variables X6 through X10
y.train=b[7]*X.train$X6+b[8]*X.train$X7+b[9]*X.train$X8+b[10]*X.train$X9+b[11]*X.train$X10 
y.train=ifelse(y.train>0, 1, 0) ##classification

##use random forest importance scores to compute the regularization coefficients
imp<-randomForest(X.train, as.factor(y.train))$importance 
coefReg=0.5+0.5*imp/max(imp) 

##select a stable set of features: GRRF followed by stepAIC
select.stable.aic(X.train, as.factor(y.train), coefReg)

## The function is currently defined as
function (X.train, Y.train, coefReg, total=10) 
{
    selected <- c()
    for (i in 1:total) {
        temp=RRF(X.train, Y.train, coefReg=coefReg, flagReg=1, importance=T)$feaSet
        selected <- c(selected, temp)
    }
    selected <- unique(selected)
    df <- data.frame(Y.train, X.train[, selected])
    colnames(df) <- c("resp", selected)
    if (class(Y.train) == "factor") {
        model.full <- glm(resp ~ ., data = df, family = binomial(link = "logit"))
        model.step <- stepAIC(model.full, direction = "both", 
            trace = 0)
        selected <- rownames(summary(model.step)$coef)[-1]
    }
    else {
        model.full <- lm(resp ~ ., data = df)
        model.step <- stepAIC(model.full, direction = "both", 
            trace = 0)
        selected <- rownames(summary(model.step)$coef)[-1]
    }
    return(selected)
  }

guanxin1121/Know_GRRF documentation built on May 21, 2019, 11:10 a.m.