View source: R/feature_selection.R
feature_selection: R Documentation
This function uses one of three methods (glmnet, xgboost, ranger) to select important features.
feature_selection(
  X,
  y,
  method = NULL,
  params_glmnet = NULL,
  params_xgboost = NULL,
  params_ranger = NULL,
  xgb_sort = NULL,
  CV_folds = 5,
  stratified_regr = FALSE,
  scale_coefs_glmnet = FALSE,
  cores_glmnet = NULL,
  verbose = FALSE
)
X: a sparse Matrix, a matrix or a data frame
y: a vector of length nrow(X) representing the response variable
method: one of 'glmnet-lasso', 'xgboost', 'ranger'
params_glmnet: a list of parameters for the glmnet model
params_xgboost: a list of parameters for the xgboost model
params_ranger: a list of parameters for the ranger model
xgb_sort: sort the xgboost features by "Gain", "Cover" or "Frequency" (defaults to "Frequency")
CV_folds: a number specifying the number of folds for cross-validation
stratified_regr: a boolean determining whether the folds in regression should be stratified
scale_coefs_glmnet: if TRUE, coefficients of less important features will be smaller in magnitude than those of more important ones, so that features can be ranked or plotted by magnitude
cores_glmnet: an integer determining the number of cores to register for glmnet
verbose: if TRUE, progress information is printed
This function returns the important features using one of the glmnet, xgboost or ranger algorithms. The glmnet algorithm accepts a sparse matrix, a matrix or a data frame and returns a data frame of the non-zero coefficients. The xgboost algorithm accepts a sparse matrix, a matrix or a data frame and returns the feature importances as a data frame; the features can additionally be sorted by one of the "Gain", "Cover" or "Frequency" measures. The ranger algorithm accepts a matrix or a data frame and returns the important features using either the 'impurity' or the 'permutation' measure.
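As a minimal sketch of inspecting the returned data frame (the package name FeatureSelection and the column layout of the result are assumptions and may differ by method and package version):

```r
library(FeatureSelection)   # assumed package name

data(iris)
y = iris[, 1]               # use Sepal.Length as the response
X = iris[, 2:4]             # remaining numeric columns as predictors

params_glmnet = list(alpha = 1, family = 'gaussian', nfolds = 3, parallel = FALSE)

res = feature_selection(X, y, method = 'glmnet-lasso',
                        params_glmnet = params_glmnet, CV_folds = 5,
                        scale_coefs_glmnet = TRUE)

# with scale_coefs_glmnet = TRUE the coefficients can be ranked by magnitude;
# treating the second column as the coefficient column is an assumption
res[order(abs(res[[2]]), decreasing = TRUE), ]
```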
a data frame with the most important features
Lampros Mouselimis
## Not run: 
#...........
# regression
#...........

data(iris)
X = iris[, -5]
y = X[, 1]
X = X[, -1]

params_glmnet = list(alpha = 1, family = 'gaussian', nfolds = 3, parallel = TRUE)

res = feature_selection(X, y, method = 'glmnet-lasso',
                        params_glmnet = params_glmnet,
                        CV_folds = 5, cores_glmnet = 5)

#......................
# binary classification
#......................

y = iris[, 5]
y = as.character(y)
y[y == 'setosa'] = 'virginica'
X = iris[, -5]

params_ranger = list(write.forest = TRUE, probability = TRUE, num.threads = 6,
                     num.trees = 50, verbose = FALSE, classification = TRUE,
                     mtry = 2, min.node.size = 5, importance = 'impurity')

res = feature_selection(X, y, method = 'ranger',
                        params_ranger = params_ranger, CV_folds = 5)

#..........................
# multiclass classification
#..........................

y = iris[, 5]
multiclass_xgboost = ifelse(y == 'setosa', 0, ifelse(y == 'virginica', 1, 2))
X = iris[, -5]

params_xgboost = list(params = list("objective" = "multi:softprob", "bst:eta" = 0.35,
                                    "subsample" = 0.65, "num_class" = 3,
                                    "max_depth" = 6, "colsample_bytree" = 0.65,
                                    "nthread" = 2),
                      nrounds = 50, print.every.n = 50, verbose = 0, maximize = FALSE)

res = feature_selection(X, multiclass_xgboost, method = 'xgboost',
                        params_xgboost = params_xgboost, CV_folds = 5)

## End(Not run)