View source: R/xgb.importance.R
xgb.importance | R Documentation |
Creates a data.table
of feature importances in a model.
xgb.importance(
feature_names = NULL,
model = NULL,
trees = NULL,
data = NULL,
label = NULL,
target = NULL
)
feature_names |
character vector of feature names. If the model already
contains feature names, those would be used when |
model |
object of class |
trees |
(only for the gbtree booster) an integer vector of tree indices that should be included
into the importance calculation. If set to |
data |
deprecated. |
label |
deprecated. |
target |
deprecated. |
This function works for both linear and tree models.
For linear models, the importance is the absolute magnitude of linear coefficients. For that reason, in order to obtain a meaningful ranking by importance for a linear model, the features need to be on the same scale (which you also would want to do when using either L1 or L2 regularization).
For a tree model, a data.table
with the following columns:
Features
names of the features used in the model;
Gain
represents fractional contribution of each feature to the model based on
the total gain of this feature's splits. Higher percentage means a more important
predictive feature.
Cover
metric of the number of observation related to this feature;
Frequency
percentage representing the relative number of times
a feature have been used in trees.
A linear model's importance data.table
has the following columns:
Features
names of the features used in the model;
Weight
the linear coefficient of this feature;
Class
(only for multiclass models) class label.
If feature_names
is not provided and model
doesn't have feature_names
,
index of the features will be used instead. Because the index is extracted from the model dump
(based on C++ code), it starts at 0 (as in C/C++ or Python) instead of 1 (usual in R).
# binomial classification using gbtree:
data(agaricus.train, package='xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
xgb.importance(model = bst)
# binomial classification using gblinear:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, booster = "gblinear",
eta = 0.3, nthread = 1, nrounds = 20, objective = "binary:logistic")
xgb.importance(model = bst)
# multiclass classification using gbtree:
nclass <- 3
nrounds <- 10
mbst <- xgboost(data = as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1,
max_depth = 3, eta = 0.2, nthread = 2, nrounds = nrounds,
objective = "multi:softprob", num_class = nclass)
# all classes clumped together:
xgb.importance(model = mbst)
# inspect importances separately for each class:
xgb.importance(model = mbst, trees = seq(from=0, by=nclass, length.out=nrounds))
xgb.importance(model = mbst, trees = seq(from=1, by=nclass, length.out=nrounds))
xgb.importance(model = mbst, trees = seq(from=2, by=nclass, length.out=nrounds))
# multiclass classification using gblinear:
mbst <- xgboost(data = scale(as.matrix(iris[, -5])), label = as.numeric(iris$Species) - 1,
booster = "gblinear", eta = 0.2, nthread = 1, nrounds = 15,
objective = "multi:softprob", num_class = nclass)
xgb.importance(model = mbst)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.