View source: R/feature_selection.R
feature_selection
This function uses one of three methods (glmnet, xgboost, ranger) to select important features.
feature_selection(
  X,
  y,
  method = NULL,
  params_glmnet = NULL,
  params_xgboost = NULL,
  params_ranger = NULL,
  xgb_sort = NULL,
  CV_folds = 5,
  stratified_regr = FALSE,
  scale_coefs_glmnet = FALSE,
  cores_glmnet = NULL,
  verbose = FALSE
)
X | a sparse Matrix, a matrix or a data frame
y | a vector whose length equals the number of rows of X, representing the response variable
method | one of 'glmnet-lasso', 'xgboost', 'ranger'
params_glmnet | a list of parameters for the glmnet model
params_xgboost | a list of parameters for the xgboost model
params_ranger | a list of parameters for the ranger model
xgb_sort | sort the xgboost features by "Gain", "Cover" or "Frequency" (defaults to "Frequency")
CV_folds | a number specifying the number of folds for cross-validation
stratified_regr | a boolean determining whether the folds in regression should be stratified
scale_coefs_glmnet | if TRUE, the coefficients are scaled so that less important coefficients are smaller in magnitude than more important ones, which makes ranking and plotting by magnitude possible
cores_glmnet | an integer determining the number of cores to register for parallel glmnet
verbose | if TRUE, information is printed during the run
This function returns the important features using one of the glmnet, xgboost or ranger algorithms. The glmnet algorithm accepts a sparse matrix, a matrix or a data frame and returns a data frame with the non-zero coefficients. The xgboost algorithm accepts a sparse matrix, a matrix or a data frame and returns the feature importance as a data frame; the features can additionally be sorted by one of the "Gain", "Cover" or "Frequency" measures. The ranger algorithm accepts a matrix or a data frame and returns the important features using either the 'impurity' or the 'permutation' importance method.
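As an illustration, when scale_coefs_glmnet = TRUE the returned coefficients can be ranked by absolute magnitude (a minimal sketch; the column name 'coefficients' is an assumption and may differ in the actual returned object):

# 'res' is assumed to be the data frame returned by feature_selection() with
# scale_coefs_glmnet = TRUE; the column name 'coefficients' is an assumption
res_ranked = res[order(abs(res$coefficients), decreasing = TRUE), ]
head(res_ranked)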
a data frame with the most important features
Lampros Mouselimis
## Not run:
#...........
# regression
#...........
data(iris)
X = iris[, -5]
y = X[, 1]
X = X[, -1]
params_glmnet = list(alpha = 1,
                     family = 'gaussian',
                     nfolds = 3,
                     parallel = TRUE)

res = feature_selection(X,
                        y,
                        method = 'glmnet-lasso',
                        params_glmnet = params_glmnet,
                        CV_folds = 5,
                        cores_glmnet = 5)
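# variant (a sketch): the same call with scale_coefs_glmnet = TRUE, so that the
# returned coefficients can be compared and ranked by magnitude
res_scaled = feature_selection(X,
                               y,
                               method = 'glmnet-lasso',
                               params_glmnet = params_glmnet,
                               CV_folds = 5,
                               scale_coefs_glmnet = TRUE,
                               cores_glmnet = 5)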
#......................
# binary classification
#......................
y = iris[, 5]
y = as.character(y)
y[y == 'setosa'] = 'virginica'
X = iris[, -5]
params_ranger = list(write.forest = TRUE,
                     probability = TRUE,
                     num.threads = 6,
                     num.trees = 50,
                     verbose = FALSE,
                     classification = TRUE,
                     mtry = 2,
                     min.node.size = 5,
                     importance = 'impurity')

res = feature_selection(X,
                        y,
                        method = 'ranger',
                        params_ranger = params_ranger,
                        CV_folds = 5)
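# variant (a sketch): ranger also supports permutation importance (see Details);
# re-run the same data with importance = 'permutation'
params_ranger_perm = params_ranger
params_ranger_perm$importance = 'permutation'
res_perm = feature_selection(X,
                             y,
                             method = 'ranger',
                             params_ranger = params_ranger_perm,
                             CV_folds = 5)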
#..........................
# multiclass classification
#..........................
y = iris[, 5]
multiclass_xgboost = ifelse(y == 'setosa', 0, ifelse(y == 'virginica', 1, 2))
X = iris[, -5]
params_xgboost = list(params = list("objective" = "multi:softprob",
                                    "bst:eta" = 0.35,
                                    "subsample" = 0.65,
                                    "num_class" = 3,
                                    "max_depth" = 6,
                                    "colsample_bytree" = 0.65,
                                    "nthread" = 2),
                      nrounds = 50,
                      print.every.n = 50,
                      verbose = 0,
                      maximize = FALSE)

res = feature_selection(X,
                        multiclass_xgboost,
                        method = 'xgboost',
                        params_xgboost = params_xgboost,
                        CV_folds = 5)
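# variant (a sketch): sort the xgboost feature importance by 'Gain' instead of
# the default 'Frequency' via the xgb_sort argument
res_gain = feature_selection(X,
                             multiclass_xgboost,
                             method = 'xgboost',
                             params_xgboost = params_xgboost,
                             xgb_sort = 'Gain',
                             CV_folds = 5)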
## End(Not run)