feature.importance: Random forest feature importance

View source: R/feature_importance.R

Description

Computes feature importance using random forest, lasso and LightGBM models, then calculates the mean importance across the three. If the provided dataset has more than 60k observations, it is downsampled by stratified random sampling to at most 60k observations. Training and validation sets are then created. For simplicity, categorical features are converted to numeric by representing each category as a number.
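The preprocessing described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: `stratified_downsample()` and `encode_categoricals()` are hypothetical helpers, the sketch stratifies on a categorical target only, and the package may stratify and encode differently.

```r
# Hypothetical helper (not part of the package): keep at most `max_rows`
# rows while sampling the same fraction from every target class.
stratified_downsample <- function(df, target, max_rows = 60000, seed = 1) {
  if (nrow(df) <= max_rows) return(df)
  set.seed(seed)
  frac <- max_rows / nrow(df)
  # Row indices grouped by target class; unused factor levels are dropped.
  idx_by_class <- split(seq_len(nrow(df)), df[[target]], drop = TRUE)
  keep <- unlist(lapply(idx_by_class, function(idx) {
    n_keep <- max(1, floor(length(idx) * frac))
    idx[sample.int(length(idx), n_keep)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

# Hypothetical helper (not part of the package): replace every factor or
# character column with integer codes, one code per category.
encode_categoricals <- function(df) {
  is_cat <- vapply(df, function(col) is.factor(col) || is.character(col),
                   logical(1))
  for (nm in names(df)[is_cat]) {
    df[[nm]] <- as.integer(as.factor(df[[nm]]))
  }
  df
}

# Use a small cap here so the effect is visible on a 150-row dataset.
small <- stratified_downsample(iris, target = "Species", max_rows = 60)
encoded <- encode_categoricals(iris)
```

Integer codes impose an arbitrary ordering on the categories, which is the simplification the description refers to.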

Usage

feature.importance(data, x = NULL, y, valid.split = 0.2,
  max.class.levels = 100, cluster.shutdown = TRUE, seed = 1,
  verbose = TRUE)

Arguments

data

[required | data.frame] Dataset containing predictor and target features

x

[optional | character | default=NULL] A vector of feature names present in the dataset that are used to predict the target feature. If NULL, all columns in the dataset are used.

y

[required | character] The name of the target feature contained in the dataset

valid.split

[optional | numeric | default=0.2] The fraction of the data assigned to the validation partition (0.2 means 20%)
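As a rough illustration of what valid.split controls, the sketch below performs a plain random split in base R. `split_train_valid()` is a hypothetical helper; the package may create its partitions differently, for example through h2o.

```r
# Hypothetical helper (not part of the package): random train/validation split.
split_train_valid <- function(df, valid.split = 0.2, seed = 1) {
  set.seed(seed)
  n_valid <- round(nrow(df) * valid.split)
  valid_idx <- sample.int(nrow(df), n_valid)
  # setdiff() stays correct even when n_valid is 0.
  train_idx <- setdiff(seq_len(nrow(df)), valid_idx)
  list(train = df[train_idx, , drop = FALSE],
       valid = df[valid_idx, , drop = FALSE])
}

parts <- split_train_valid(iris, valid.split = 0.2)
```

With the default of 0.2, the 150 rows of iris are divided into 120 training rows and 30 validation rows.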

max.class.levels

[optional | numeric | default=100] The maximum number of unique values in the target feature before it is considered a regression problem.
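One possible reading of this rule is sketched below. `infer_problem_type()` is a hypothetical helper, and the exact comparison the package uses (for example strict versus non-strict, or how non-numeric targets are handled) is an assumption.

```r
# Hypothetical helper (not part of the package): decide the problem type
# from the number of distinct target values.
infer_problem_type <- function(target, max.class.levels = 100) {
  if (length(unique(target)) > max.class.levels) {
    "regression"
  } else {
    "classification"
  }
}

type_species <- infer_problem_type(iris$Species)
type_many <- infer_problem_type(seq_len(500))
type_mpg <- infer_problem_type(mtcars$mpg, max.class.levels = 10)
```

Under this reading, a numeric target with few distinct values would be treated as classification unless max.class.levels is lowered.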

cluster.shutdown

[optional | logical | default=TRUE] Whether to shut down the h2o cluster after completion.

seed

[optional | integer | default=1] The random number seed for reproducible results

verbose

[optional | logical | default=TRUE] Whether the function prints progress messages while it runs

Value

List containing a data.frame with feature importance, a feature importance plot and a cumulative feature importance plot
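To make the averaged and cumulative importance concrete, the sketch below combines per-model scores in base R. The feature names, column names and scores are made up for illustration, and the structure of the data.frame actually returned by the package is not documented here, so every name in this block is an assumption.

```r
# Toy per-model scores, invented purely for illustration.
scores <- data.frame(
  feature       = c("f1", "f2", "f3", "f4"),
  random_forest = c(0.45, 0.40, 0.10, 0.05),
  lasso         = c(0.50, 0.30, 0.15, 0.05),
  lightgbm      = c(0.40, 0.45, 0.10, 0.05)
)

model_cols <- c("random_forest", "lasso", "lightgbm")

# Mean importance across the three models, one value per feature.
scores$mean_importance <- rowMeans(scores[model_cols])

# Rank features from most to least important.
scores <- scores[order(scores$mean_importance, decreasing = TRUE), ]

# Share of total importance covered by the top-k features.
scores$cumulative_importance <-
  cumsum(scores$mean_importance) / sum(scores$mean_importance)
```

A cumulative column like this is what a cumulative importance plot would typically show: how many top-ranked features are needed to cover a given share of the total importance.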

Author(s)

Xander Horn

Examples

imp <- feature.importance(data = iris, x = names(iris)[1:4], y = "Species")

XanderHorn/lazy documentation built on Jan. 16, 2021, 6:15 p.m.