View source: R/select.stable.aic.R
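Performs feature selection with guided regularized random forest (GRRF), followed by stepwise model selection by AIC. The GRRF step is repeated multiple times and the features selected across all runs are pooled; stepwise AIC selection is then applied to the pooled set to obtain a stable set of features.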
select.stable.aic(X.train, Y.train, coefReg, total=10)
X.train
a data frame or matrix containing predictors for the training set.
Y.train
response for the training set. If a factor, classification is assumed, otherwise regression is assumed. The argument has no default and must be supplied.
coefReg
regularization coefficient chosen for RRF, ranging between 0 and 1.
total
the number of times to repeat the GRRF feature selection (default 10).
a character vector naming the stable set of features selected by GRRF and retained by stepwise AIC selection.
For customized hyperparameter settings, call the RRF function from the RRF package directly and repeatedly in a for loop.
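The loop described in this note might look like the following minimal sketch. It assumes the RRF package is installed; the simulated data, the constant `coefReg` of 0.8, and the `mtry` and `ntree` values are illustrative choices, not recommendations from the package authors. Features are pooled with `unique()`, mirroring the first step of `select.stable.aic`; the AIC step is omitted here.

```r
library(RRF)

set.seed(1)
## Simulated training data (illustrative only): 100 samples, 20 predictors
X.train <- data.frame(matrix(rnorm(100 * 20), nrow = 100))
y.train <- as.factor(ifelse(X.train$X1 + X.train$X2 > 0, 1, 0))

## Hypothetical regularization coefficients, one per predictor
coefReg <- rep(0.8, ncol(X.train))

## Repeat guided RRF with custom hyperparameters and pool the selected features
selected <- c()
for (i in seq_len(10)) {
  fit <- RRF(X.train, y.train, coefReg = coefReg, flagReg = 1,
             mtry = 5, ntree = 200)
  selected <- c(selected, fit$feaSet)  # feaSet holds column indices
}
selected <- sort(unique(selected))

## Map the pooled column indices back to predictor names
selected.names <- colnames(X.train)[selected]
print(selected.names)
```

Any other argument accepted by `RRF` can be passed inside the loop in the same way.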
Li Liu, Xin Guan
Guan, X., & Liu, L. (2018). Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests.
##---- Example: classification ----
library(randomForest) ##provides randomForest(), used below for importance scores
set.seed(1)
X.train <- data.frame(matrix(rnorm(100*100), nrow=100))
b <- seq(0.1, 2.2, 0.2)
##as written here, y is a linear combination of variables X6 to X10
y.train <- b[7]*X.train$X6+b[8]*X.train$X7+b[9]*X.train$X8+b[10]*X.train$X9+b[11]*X.train$X10
y.train <- ifelse(y.train>0, 1, 0) ##binarize the response for classification
##use random forest importance scores to compute the regularization coefficients
imp <- randomForest(X.train, as.factor(y.train))$importance
coefReg <- 0.5+0.5*imp/max(imp)
##select a stable set of features chosen by GRRF followed by stepAIC
select.stable.aic(X.train, as.factor(y.train), coefReg)
## The function is currently defined as
function (X.train, Y.train, coefReg, total=10)
{
    selected <- c()
    for (i in 1:total) {
        temp=RRF(X.train, Y.train, coefReg=coefReg, flagReg=1, importance=T)$feaSet
        selected <- c(selected, temp)
    }
    selected <- unique(selected)
    df <- data.frame(Y.train, X.train[, selected])
    colnames(df) <- c("resp", selected)
    if (class(Y.train) == "factor") {
        model.full <- glm(resp ~ ., data = df, family = binomial(link = "logit"))
        model.step <- stepAIC(model.full, direction = "both",
            trace = 0)
        selected <- rownames(summary(model.step)$coef)[-1]
    }
    else {
        model.full <- lm(resp ~ ., data = df)
        model.step <- stepAIC(model.full, direction = "both",
            trace = 0)
        selected <- rownames(summary(model.step)$coef)[-1]
    }
    return(selected)
}