View source: R/feature_importance.R
Computes feature importance using random forest, lasso and LightGBM models, then calculates the mean importance across them. If the provided dataset has more than 60k observations, it is downsampled by random stratified sampling to a maximum of 60k observations; training and validation sets are then created. For simplicity, categorical features are converted to numeric by representing each category as a number.
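The two preprocessing steps described above (stratified downsampling and integer encoding of categories) can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper names `stratified_downsample` and `encode_categoricals` and the `max.obs` argument are hypothetical, and the sketch assumes a categorical target.

```r
# Hypothetical helper (not part of the package): downsample while keeping
# roughly the same class proportions as the original data.
stratified_downsample <- function(data, y, max.obs = 60000, seed = 1) {
  if (nrow(data) <= max.obs) return(data)
  set.seed(seed)
  frac <- max.obs / nrow(data)
  # Row indices grouped by target class; empty classes are dropped.
  groups <- split(seq_len(nrow(data)), data[[y]], drop = TRUE)
  idx <- unlist(lapply(groups, function(i) {
    # Keep at least one row per class.
    i[sample.int(length(i), size = max(1, floor(length(i) * frac)))]
  }), use.names = FALSE)
  data[sort(idx), , drop = FALSE]
}

# Hypothetical helper: replace each factor/character column with integer codes.
encode_categoricals <- function(data) {
  is.cat <- vapply(data, function(col) is.factor(col) || is.character(col),
                   logical(1))
  if (any(is.cat)) {
    data[is.cat] <- lapply(data[is.cat],
                           function(col) as.integer(as.factor(col)))
  }
  data
}

small <- stratified_downsample(iris, y = "Species", max.obs = 60)
encoded <- encode_categoricals(small)
table(small$Species)
str(encoded)
```

Integer codes impose an arbitrary order on the categories, which tree-based models tolerate reasonably well but which can mislead a linear model such as lasso; the description above presents this encoding as a simplification.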
feature.importance(data, x = NULL, y, valid.split = 0.2,
  max.class.levels = 100, cluster.shutdown = TRUE, seed = 1,
  verbose = TRUE)
data |
[required | data.frame] Dataset containing predictor and target features |
x |
[optional | character | default=NULL] A vector of feature names present in the dataset used to predict the target feature. If NULL then all columns in the dataset are used. |
y |
[required | character] The name of the target feature contained in the dataset |
valid.split |
[optional | numeric | default=0.2] The proportion of data assigned to the validation partition (0.2 means 20%) |
max.class.levels |
[optional | numeric | default=100] The maximum number of unique values in the target feature before it is considered a regression problem. |
cluster.shutdown |
[optional | logical | default=TRUE] Shut down the h2o cluster after completion. |
seed |
[optional | integer | default=1] The random number seed for reproducible results |
verbose |
[optional | logical | default=TRUE] Toggles whether the function prints progress messages |
List containing a data.frame with feature importance, a feature importance plot and a cumulative feature importance plot
Xander Horn
imp <- feature.importance(data = iris, x = names(iris)[1:4], y = "Species")